Overview
zombied reconcile processes stale outbox rows and recovers runs stuck in non-terminal states. It is designed to be run as a periodic cron job or scheduled task.
What it does
The reconciler scans for:
- Stale outbox rows: transactional outbox entries that were written to PostgreSQL but never picked up or acknowledged. These can occur when a worker crashes between writing the outbox row and completing the downstream action (webhook delivery, status update).
- Stuck runs: runs that remain in a non-terminal state (QUEUED, RUNNING) beyond the expected timeout window. These can occur when a worker crashes mid-execution and no other worker reclaims the work.
For each item it finds, the reconciler does the following:
- If the run’s timeout has elapsed, mark it as FAILED with error code UZ-EXEC-014 (lease expired).
- If the outbox row has a pending webhook delivery, retry the delivery.
- Record the reconciliation action in the run’s event log.
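The per-item steps above can be sketched as a small function. This is only an illustration under stated assumptions, not the reconciler's actual implementation: the `Run` dataclass, its field names, the `reconcile_run` helper, and the action labels are all hypothetical, and the pending webhook is modeled as a flag on the run rather than as a separate outbox row. Only the states and the UZ-EXEC-014 error code come from this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

NON_TERMINAL = {"QUEUED", "RUNNING"}
LEASE_EXPIRED = "UZ-EXEC-014"  # error code named in this page


@dataclass
class Run:
    # Hypothetical shape; the real schema is not described in this page.
    id: str
    state: str
    started_at: datetime
    timeout: timedelta
    webhook_pending: bool = False
    error_code: Optional[str] = None
    events: List[str] = field(default_factory=list)


def reconcile_run(run: Run, now: datetime) -> List[str]:
    """Apply the three steps to one run and return the actions taken."""
    actions: List[str] = []

    # Step 1: a non-terminal run whose timeout has elapsed is marked FAILED.
    if run.state in NON_TERMINAL and now - run.started_at > run.timeout:
        run.state = "FAILED"
        run.error_code = LEASE_EXPIRED
        actions.append("marked_failed")

    # Step 2: a pending webhook delivery is retried.
    # A real implementation would re-send the webhook; this sketch only clears the flag.
    if run.webhook_pending:
        run.webhook_pending = False
        actions.append("webhook_retried")

    # Step 3: record what was done in the run's event log, if anything.
    if actions:
        run.events.append("reconciled: " + ", ".join(actions))

    return actions
```

Calling `reconcile_run` a second time on the same run returns an empty list and adds no event, because the run is already terminal and the webhook flag is cleared.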
Usage
Scheduling
Run the reconciler on a regular schedule. Every 5 minutes is a reasonable default.
Idempotency
The reconciler is idempotent. Running it multiple times on the same stale data produces the same result. It uses database-level locks so that concurrent reconciler invocations do not conflict.
Monitoring
Watch the reconcile_runs_total and reconcile_errors_total metrics to track reconciliation activity. A sustained increase in reconciled runs may indicate worker instability.
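One way to make "sustained increase" concrete is to compare successive samples of the reconcile_runs_total counter. The sketch below is an assumption-laden illustration: the `sustained_increase` helper, its `min_delta` threshold, and the `windows` count are hypothetical, and how the samples are collected depends on your metrics system.

```python
from typing import Sequence


def sustained_increase(samples: Sequence[float], min_delta: float, windows: int) -> bool:
    """Return True if the counter grew by at least `min_delta`
    in each of the last `windows` sampling intervals."""
    if len(samples) < windows + 1:
        return False  # not enough history to judge
    recent = samples[-(windows + 1):]
    deltas = [later - earlier for earlier, later in zip(recent, recent[1:])]
    # A counter reset (for example after a restart) gives a negative delta,
    # which this check treats as "no increase" for that interval.
    return all(delta >= min_delta for delta in deltas)
```

For example, with one sample per scheduled run, `sustained_increase(samples, min_delta=5, windows=3)` would flag three consecutive intervals that each reconciled five or more runs. Suitable values for both thresholds depend on normal workload.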