Step 7
Data pipelines — retries · idempotency
25 min
Data pipelines — retries · idempotency
External APIs, batches, queues. Network and failures are daily life. Idempotent design makes retries safe.
1. The exactly-once myth
Exactly-once is effectively impossible. The real pattern is at-least-once + idempotent consumers.
2. Natural-key idempotency
CREATE TABLE payments (
id BIGSERIAL PRIMARY KEY,
idempotency_key TEXT NOT NULL UNIQUE,
amount DECIMAL NOT NULL,
status VARCHAR NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
INSERT INTO payments (idempotency_key, amount, status)
VALUES ($1, $2, 'pending')
ON CONFLICT (idempotency_key) DO NOTHING
RETURNING *;
UNIQUE + ON CONFLICT records once even if called twice.
3. State-transition idempotency
UPDATE orders SET status = 'paid', paid_at = now()
WHERE id = $1 AND status = 'pending';
Already paid → 0 rows changed. Safe to re-run.
4. Retry
async function withRetry<T>(fn: () => Promise<T>, max = 3): Promise<T> {
for (let i = 0; i < max; i++) {
try { return await fn(); }
catch (e) {
if (i === max - 1) throw e;
await new Promise(r => setTimeout(r, 2 ** i * 1000 + Math.random() * 500));
}
}
throw new Error("unreachable");
}
Exponential backoff + jitter.
5. Circuit breaker
let fails = 0, openedAt: number | null = null;
const THRESHOLD = 5, COOLDOWN = 30_000;
async function call<T>(fn: () => Promise<T>): Promise<T> {
if (openedAt && Date.now() - openedAt < COOLDOWN) throw new Error("circuit_open");
try {
const r = await fn(); fails = 0; openedAt = null; return r;
} catch (e) {
fails++; if (fails >= THRESHOLD) openedAt = Date.now(); throw e;
}
}
6. Outbox pattern
BEGIN;
UPDATE users SET verified = true WHERE id = $1;
INSERT INTO outbox_events (type, payload)
VALUES ('user.verified', jsonb_build_object('userId', $1));
COMMIT;
A worker polls outbox_events, publishes, then deletes. Guarantees write + publish atomicity.
7. Saga
Multi-service transactions with compensations.
order accepted → stock debit → payment → ship
↓ fail
stock restore (compensation)
Overkill for simple 2-step flows.
8. Dead-letter queue
After 3 retries, park in DLQ for manual inspection.
9. Idempotent batches
last = db.fetchval("SELECT MAX(id) FROM events WHERE processed")
rows = db.fetch("SELECT * FROM events WHERE id > $1 ORDER BY id LIMIT 1000", last)
for r in rows:
process(r)
db.execute("UPDATE events SET processed = true WHERE id = $1", r.id)
10. Gotchas
- No idempotency_key → double charges
- Retry without backoff → thundering herd
- Publishing outside the transaction → events after rollback
- Assuming at-most-once → dropped events
11. Observability
Dashboards for retry rate, success rate, DLQ size, p95/p99 latency.
Closing
"Retryable code = idempotent code". Most distributed-system headaches disappear under that rule.
Next
- 08-backup-restore