Set up a bring-your-own-key model provider without leaving the Models page
The Models page is now a guided, two-step wizard for bring-your-own-key (BYOK) setup. Choosing “use my own provider key” with an empty vault used to dead-end at a disabled button that sent you off to Credentials. Now you paste an API key, the provider and a default model fill themselves in from the key’s prefix, you save the credential inline, pick the model, and activate, all on one screen. Platform-managed keys stay a one-click choice. A short note explains that switching providers applies to new runs, while in-flight agents finish on their current one.
Behind the wizard, the public model catalogue moved: the unauthenticated _um/<key>/model-caps.json document is now cap.json, and it carries the global run and event rates, the starter credit, and the free-trial window alongside the per-model catalogue. Client config now comes from one document that needs no auth.
Upgrading
_um/<key>/model-caps.json is now _um/<key>/cap.json. The old path returns 404 (no alias). The per-model models[] shape is unchanged, so if you read the public catalogue directly, only the path moves: switch to cap.json. If you only use the dashboard or zombiectl, there is nothing to do; they handle this for you.
What’s new
- Inline credential creation on the Models page: a structured provider / API key / model form that names the credential after the provider and detects the provider from the key prefix. No trip to Credentials.
- Catalogue-backed model picker: the model field is populated from cap.json, with free-text entry as a fallback for a model not in the catalogue.
- Global config in cap.json: the document now includes a rates block (run and event rates) and a billing block (starter credit, free-trial window), so a client no longer hardcodes those constants.
API reference
GET /_um/<key>/cap.json: unauthenticated; replaces model-caps.json. Returns { version, models, rates, billing }. Each models[] row carries id, provider, context_cap_tokens, and the per-model token rates (unchanged); rates carries run_nanos_per_sec and event_nanos; billing carries starter_credit_nanos, free_trial_end_ms, and free_trial_stage_nanos. The optional ?model= filter narrows models[]; the global blocks are always present. A wrong key returns 404; an empty catalogue returns 503.
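A client that reads the catalogue directly might type and fetch it roughly as follows. This is a minimal sketch, not a published SDK. The field names come from the reference above, but the value types, the opaque treatment of version, and the helper names capUrl and fetchCap are assumptions. The API origin is supplied by the caller because these notes do not state it.

```typescript
// Sketch only: field names follow the API reference; value types are assumed.
interface ModelRow {
  id: string;
  provider: string;
  context_cap_tokens: number;
  [tokenRateField: string]: unknown; // per-model token-rate fields (names not listed above)
}

interface CapDoc {
  version: unknown; // treated as opaque here (assumption)
  models: ModelRow[];
  rates: { run_nanos_per_sec: number; event_nanos: number };
  billing: {
    starter_credit_nanos: number;
    free_trial_end_ms: number;
    free_trial_stage_nanos: number;
  };
}

// Minimal fetch signature so the sketch does not depend on DOM typings.
type Fetcher = (url: string) => Promise<{ status: number; ok: boolean; json(): Promise<unknown> }>;

// `base` is the API origin and `key` the <key> path segment, both caller-supplied.
function capUrl(base: string, key: string, model?: string): string {
  const url = new URL(`/_um/${encodeURIComponent(key)}/cap.json`, base);
  if (model !== undefined) url.searchParams.set("model", model); // narrows models[] only
  return url.toString();
}

async function fetchCap(fetcher: Fetcher, base: string, key: string, model?: string): Promise<CapDoc> {
  const res = await fetcher(capUrl(base, key, model)); // public document: no Authorization header
  if (res.status === 404) throw new Error("cap.json: wrong key");
  if (res.status === 503) throw new Error("cap.json: catalogue is empty");
  if (!res.ok) throw new Error(`cap.json: unexpected status ${res.status}`);
  return (await res.json()) as CapDoc;
}
```

In Node 18+ or a browser, the global fetch can be passed as the fetcher. Injecting it keeps the URL logic testable without a network.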
Pin an agent to capable runners with tags
An agent’s tags: now decide where it runs. Add tags: [gpu, us-east] to your SKILL.md frontmatter, and the agent runs only on a runner whose advertised labels cover every tag, so capability-bound work lands on a host that can serve it. An agent with no tags runs on any runner, exactly as before. The tags you already write in the manifest drive this; there is no separate field to set.
If no enrolled runner currently advertises the required labels, the agent waits for a matching runner rather than running on an unsuitable host.
A waiting agent surfaces no separate status today. If an agent never starts, confirm that an enrolled runner advertises every tag the agent requires.
API reference
required_tags on an agent: derived from the SKILL.md tags: on create, and re-derived when you PATCH a new source_markdown. A runner claims an agent only when required_tags is a subset of the runner’s labels. Each tag is 1–64 characters; an agent carries at most 32. Exceeding either bound rejects the request with UZ-REQ-001.
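The claim rule is a subset check, and the request bounds are two length checks. Below is a minimal sketch of both. The function names are hypothetical and not the server's code. Counting "characters" as Unicode code points is also an assumption about how the server measures tag length.

```typescript
// Hypothetical helpers illustrating the documented rules.
const MAX_TAGS = 32;
const MAX_TAG_LEN = 64;

// Returns the documented error code, or null when the tags are acceptable.
function validateTags(tags: string[]): string | null {
  if (tags.length > MAX_TAGS) return "UZ-REQ-001";
  for (const tag of tags) {
    const len = Array.from(tag).length; // code points; the server's unit is an assumption
    if (len < 1 || len > MAX_TAG_LEN) return "UZ-REQ-001";
  }
  return null;
}

// A runner may claim the agent only if its labels cover every required tag.
function canClaim(requiredTags: string[], runnerLabels: string[]): boolean {
  const labels = new Set(runnerLabels);
  return requiredTags.every((tag) => labels.has(tag)); // no tags: any runner qualifies
}
```

Extra runner labels never block a claim. Only a missing required tag does, which is why an untagged agent still runs anywhere.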
Enroll runners from the dashboard, plus a regrouped left navigation
You can now add a runner to your fleet from the dashboard instead of the command line. A platform admin opens Runners, clicks Add runner, and copies a one-time runner token to install on the host; your platform-admin login never touches a shell. The runner list shows each host’s liveness honestly: a host you just enrolled reads registered until it first checks in, instead of a false online.
The dashboard’s left navigation is regrouped into Operations (Agents, Approvals, Events), Configuration (Credentials, Model, and Runners for platform admins), and Organization (Settings, Billing). The model-and-provider settings move out of Settings to their own Model entry at /settings/models.
Upgrading
- zombie-runner register is removed. The runner CLI no longer accepts --token, --host-id, or ZOMBIE_TOKEN. To enroll a host, a platform admin mints a runner token from the dashboard (Runners → Add runner) and installs it on the host as ZOMBIE_RUNNER_TOKEN. Runners already enrolled keep working; only the enrollment step changed.
- Server errors on non-idempotent requests are no longer retried. The dashboard and zombiectl HTTP clients now retry a server 5xx only for idempotent methods (GET/PUT/DELETE/HEAD); a failed POST or PATCH surfaces immediately instead of risking a duplicate write. Network errors and Retry-After responses (429/503/504) retry as before.
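The new retry rule reduces to a small predicate. This hypothetical helper is not the dashboard's or zombiectl's code, and it leaves out backoff and retry budgets. It reads "retry as before" as applying to every method for network errors and Retry-After responses, which is an assumption.

```typescript
// Hypothetical retry predicate; backoff and attempt limits are omitted.
const IDEMPOTENT = new Set(["GET", "PUT", "DELETE", "HEAD"]);
const RETRY_AFTER_STATUSES = new Set([429, 503, 504]);

interface Attempt {
  method: string;
  status?: number;            // undefined: network error, no response received
  retryAfter?: string | null; // Retry-After header value, if present
}

function shouldRetry(a: Attempt): boolean {
  if (a.status === undefined) return true;                             // network error
  if (RETRY_AFTER_STATUSES.has(a.status) && a.retryAfter) return true; // server asked for a retry
  return a.status >= 500 && IDEMPOTENT.has(a.method.toUpperCase());    // plain 5xx: idempotent only
}
```

Under this rule, a POST that gets a bare 500 fails immediately. The same POST answered with 503 plus Retry-After is still retried, because the server explicitly asked for it.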
What’s new
- Fleet list in the dashboard: platform admins see every enrolled runner with its derived liveness (registered, online, busy, or offline).
API reference
GET /v1/fleet/runners: platform-admin only; paginated. Each item carries liveness ∈ {registered, online, busy, offline} and never includes the token hash. A tenant token gets 403.
CLI
zombie-runner register is gone. The subcommand and its --token/--host-id flags no longer exist; see Upgrading above for the dashboard enrollment flow.
One name across the product: the thing you install is an “agent”
The dashboard, the marketing site, and these docs now call the thing you install an agent, and define it the moment you meet the word: “An agent is a long-lived runtime you install once. It sleeps until an event wakes it, runs your skill against that event, and reports back with evidence.” The same concept used to appear as zombie, “the agent”, or “a runtime” in different places; it is one noun now. The brand stays usezombie, and the CLI (zombiectl), the routes, and the API fields are unchanged; only the words you read changed.
What’s new
- The dashboard navigation, empty states, install buttons, and the stop/resume/kill dialogs all read “agent”.
- First-touch surfaces — the “What is usezombie?” FAQ, the dashboard first-run card, and the agents-list empty state — now carry the definition above.
zombied runs on a single database role — the worker datastore credentials retire
Following the runner-fleet split, the deleted worker process left behind a worker_runtime Postgres role, a worker Redis access-control list (ACL) user, and DATABASE_URL_WORKER / REDIS_URL_WORKER deploy variables that zombied still required at startup but never used to connect. zombied already writes every event-path row through its API database role, so the worker role and its variables are removed. The control plane now runs on a single api_runtime role for both reads and writes; a host-resident runner continues to hold no datastore credentials at all.
Upgrading
- Drop DATABASE_URL_WORKER and REDIS_URL_WORKER from the zombied deployment. The server boots on DATABASE_URL_API + REDIS_URL_API alone; the worker variables are no longer read.
- Deploy order matters: ship the updated zombied first, then remove the worker_runtime Postgres role and the worker Redis ACL user. The old binary still authenticates as worker_runtime, so dropping the role before the new build is live would cut its connection.
Usage-based billing — agent runs metered by the second
Agent run time is now billed by the second while an agent is actively running, at 0.36/hr), plus the model’s per-token cost on platform-key runs. This replaces the flat estimate taken when work started: a long run is charged for the time it actually used, a short one is not over-charged, and an agent that is not running is not billed. The run rate is the same whether you bring your own model key or use a platform key; bring-your-own-key runs record token usage but are charged for run time only.
Credit drains as the run proceeds: each background lease renewal meters the elapsed slice plus any new token usage, so a multi-minute run bills continuously instead of in one lump at the end. A run that exhausts its balance stops at the next renewal rather than going negative.
API reference
GET /v1/tenants/me/billing/charges/{event_id}/metering-periods: the per-slice breakdown behind a single run charge. It returns one row per renewal plus the final settle, each carrying the run milliseconds, token deltas, run fee, token cost, and the amount actually charged. Scoped to the calling tenant; a foreign event_id returns no rows.
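One metering slice can be sketched as a function called at each renewal and at the final settle. Its fields mirror the columns listed above. The sketch is illustrative only. The rate parameter name, rounding up to a whole unit, and the integer "nanos" unit are assumptions, and meterSlice is a hypothetical name.

```typescript
// Hypothetical per-slice metering; units and rounding are assumptions.
interface Slice {
  runMs: number;          // run milliseconds since the previous renewal
  tokenCostNanos: number; // model token cost for this slice
}

interface SliceCharge {
  runFee: number;
  tokenCost: number;
  charged: number;      // amount actually debited
  balanceAfter: number;
  stop: boolean;        // balance exhausted: the run stops here
}

function meterSlice(balance: number, slice: Slice, runNanosPerSec: number, byok: boolean): SliceCharge {
  const runFee = Math.ceil((slice.runMs * runNanosPerSec) / 1000); // same rate for BYOK and platform keys
  const tokenCost = byok ? 0 : slice.tokenCostNanos;               // BYOK: tokens recorded, not charged
  const charged = Math.min(runFee + tokenCost, balance);           // never drive the balance negative
  const balanceAfter = balance - charged;
  return { runFee, tokenCost, charged, balanceAfter, stop: balanceAfter <= 0 };
}
```

Summing charged across the rows of a run gives that run's total charge. The final slice is capped at whatever balance remained.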
Host skills move to their own repo — install with npx skills add usezombie/skills
The /usezombie-* host-agent skills now live in the public usezombie/skills repository and install with npx skills add usezombie/skills. The npm package ships the zombiectl CLI and agent samples only; it no longer bundles the skills. Installing the CLI and adding the skills are now two commands. The curl -fsSL https://usezombie.sh | bash one-shot installer does both for you and is unchanged.
What’s new
- npx skills add usezombie/skills: the install path on index.mdx, quickstart.mdx, cli/install.mdx, and zombies/install.mdx. Pass --host=<claude|amp|codex|opencode> to pin a specific host.
- Skills iterate independently. A skill update ships from the skills repo without a CLI release, so skill fixes reach you without waiting on an npm publish.
npm install -g @usezombie/zombiectl installs the CLI and samples only. Older instructions that said the npm install also set up the skills no longer apply; add the skills with npx skills add usezombie/skills.
Long single events keep running — lease renewal lands
The follow-up promised in the May 27, 2026 runner-fleet release is here: an event that runs longer than the lease window is no longer reclaimed and replayed. While a runner is executing an event, it renews the lease in the background, pushing the kill deadline forward so a single long-running agent runs to completion instead of being cut off after 30 seconds. A hard 12-hour ceiling still caps a runaway event.
What’s new
- Background lease renewal. A runner renews mid-execution and keeps ownership of the event while it is still working. A runner that loses its lease cannot reclaim it.
- Fail-safe renewal. A renewal that hits a transient database fault is retried on the next tick rather than ending the event; only a genuinely missing or superseded lease stops it, so a long event survives a brief datastore blip.
- Bounded runtime. Renewal cannot extend a single event past a 12-hour maximum, so a stuck agent is still reclaimed.
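The renewal rules above can be sketched as two small functions. This is a hypothetical illustration, not the runner's code. The 30-second window and the 12-hour ceiling come from these notes. The result labels and function names are assumptions.

```typescript
// Hypothetical renewal policy using the documented 30 s window and 12 h ceiling.
const LEASE_MS = 30_000;
const MAX_RUNTIME_MS = 12 * 60 * 60 * 1000;

type RenewResult = "renewed" | "transient_error" | "lease_missing" | "lease_superseded";

// New kill deadline after a successful renewal: one window ahead, capped at the ceiling.
function nextDeadline(nowMs: number, startedAtMs: number): number {
  return Math.min(nowMs + LEASE_MS, startedAtMs + MAX_RUNTIME_MS);
}

// Whether the runner keeps executing after a renewal tick.
function keepRunning(result: RenewResult): boolean {
  switch (result) {
    case "renewed":
    case "transient_error": // e.g. a datastore blip: try again on the next tick
      return true;
    case "lease_missing":
    case "lease_superseded": // ownership is gone and cannot be reclaimed
      return false;
  }
}
```

Because the cap is computed from the event's start time, repeated renewals converge on the 12-hour mark instead of extending past it.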
Execution moves to a host-resident runner fleet
Behind an unchanged user surface, an event’s agent now runs in a separate zombie-runner daemon instead of inside the API server. zombied became the control plane (it owns Postgres, Redis, and the Vault, and hands work to runners over an authenticated HTTPS protocol), while the runner is the execution plane that runs each event in an isolated sandbox holding no datastore credentials. Steering, webhooks, cron, the live event tail, and history behave exactly as before; what changed is where the work runs.
What’s new
- Control plane / execution plane split. A runner leases an event, runs it, and reports the result; zombied does the durable writes. Work can run on hosts that never see a database credential.
- Lease-based ownership. Each lease carries a deadline. A runner that dies mid-event has its work reclaimed and re-run by another runner; a late report from the dead runner is rejected, so state is never double-written.
- Sandbox is mandatory. Every event runs in a sandbox; a sandbox that fails to start fails closed rather than running unprotected.
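Rejecting a dead runner's late report is commonly done with a fencing token. Every grant of a lease bumps an epoch, and only a report carrying the current epoch is written. These notes do not say how zombied implements this, so the class below is a hypothetical sketch of that general technique.

```typescript
// Hypothetical fencing-token sketch; not zombied's lease store.
interface Lease { eventId: string; runnerId: string; epoch: number; deadlineMs: number }

class LeaseTable {
  private leases = new Map<string, Lease>();

  // Grant (or re-grant) an event; every grant bumps the epoch.
  grant(eventId: string, runnerId: string, nowMs: number, leaseMs: number): Lease {
    const prev = this.leases.get(eventId);
    const lease: Lease = { eventId, runnerId, epoch: (prev?.epoch ?? 0) + 1, deadlineMs: nowMs + leaseMs };
    this.leases.set(eventId, lease);
    return lease;
  }

  // An event can be handed to another runner once its deadline has passed.
  reclaimable(eventId: string, nowMs: number): boolean {
    const lease = this.leases.get(eventId);
    return lease === undefined || nowMs >= lease.deadlineMs;
  }

  // Durable writes happen only for a report carrying the current epoch.
  acceptReport(eventId: string, epoch: number): boolean {
    return this.leases.get(eventId)?.epoch === epoch;
  }
}
```

The deadline decides when work may move. The epoch decides whose result counts. Keeping the two separate is what prevents a double write when a slow runner wakes up after its event was reclaimed.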
Host-resident runners are not enabled in production yet — this release lands the architecture; turning them on follows in a later update.
One known limit: an agent that runs longer than the 30-second lease window is reclaimed and re-run, so long single events wait on a follow-up that adds lease renewal.
zombiectl login — verification-code device flow + non-interactive token auth
Logging in opens a browser approval page that shows a 6-digit verification code; you type it back into the terminal to finish. The code binds the browser approver to the terminal that started the flow, so phishing the approval URL alone can’t mint a credential. CI and scripts can skip the browser entirely and supply a token directly.
What’s new
- Verification-code device flow. zombiectl login opens the approval page; after you Approve, the page shows a 6-digit code to type back into the terminal. The 6-digit shape is checked locally before it’s sent, so a typo just re-prompts; wrong codes cap at 5 per session.
- Non-interactive auth. Supply a token without the browser via --token <token>, the ZOMBIE_TOKEN environment variable, or by piping it on stdin, resolved in that order. Prefer the env var or stdin to keep the token out of shell history. A non-interactive shell with no token exits with an error instead of waiting.
- Device labels. --token-name <label> names the session on the approval page and in zombiectl auth status; it defaults to your platform family (macos-cli, linux-cli, …).
- One auth-token env var. ZOMBIE_TOKEN is the single environment variable the CLI reads for a user token, everywhere a token is resolved.
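The token resolution order and the local code check can be sketched as follows. This is a hypothetical illustration of the documented order (--token, then ZOMBIE_TOKEN, then piped stdin), not zombiectl's source. Treating "stdin is not a TTY" as "non-interactive" is an assumption.

```typescript
// Hypothetical sketch of the documented order: --token, ZOMBIE_TOKEN, then piped stdin.
interface TokenSources {
  flag?: string;                           // value of --token <token>
  env: Record<string, string | undefined>; // e.g. process.env
  stdinIsTTY: boolean;
  readStdin: () => string;                 // only called when stdin is piped
}

type Resolved =
  | { token: string; via: "flag" | "env" | "stdin" }
  | { token: null; via: "browser" };

function resolveToken(s: TokenSources): Resolved {
  if (s.flag) return { token: s.flag, via: "flag" };
  const fromEnv = s.env.ZOMBIE_TOKEN;
  if (fromEnv) return { token: fromEnv, via: "env" };
  if (!s.stdinIsTTY) {
    const piped = s.readStdin().trim();
    if (piped) return { token: piped, via: "stdin" };
    // Non-interactive and no token: fail instead of waiting on a browser.
    throw new Error("no token: pass --token, set ZOMBIE_TOKEN, or pipe one on stdin");
  }
  return { token: null, via: "browser" }; // interactive: run the device flow
}

// Local shape check for the verification code, applied before anything is sent.
function isCodeShape(input: string): boolean {
  return /^\d{6}$/.test(input.trim());
}
```

Checking the shape locally means a mistyped 5-digit entry costs nothing. Only well-formed codes count toward the per-session limit.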
Create and manage API keys from the dashboard
Tenant API keys (zmb_t_…), the credentials that authenticate service-to-service callers like CI, cron, or an integration, can now be created, viewed, revoked, and deleted from Settings → API keys, with no curl required. Creating a key reveals the raw value exactly once in a copy-and-store panel; closing the dialog discards it for good. The same release adds a light/dark theme toggle and a themed fallback avatar.
What’s new
- API keys in the dashboard: mint a key with a name and optional description, see each key’s status (active or revoked) with created and last-used timestamps, and revoke or delete it from the list. Available to operator-role users and above; others are redirected to Settings.
- One-time key reveal: the raw zmb_t_… value appears once at creation behind a copy button and is never rendered again. A key can be deleted only after it has been revoked, so the audit trail stays intact.
- Light / dark theme toggle: switch the dashboard theme from the header; the choice persists across reloads and renders correctly on first paint, with no flash of the wrong theme.
- Themed avatar fallback: with no profile photo set, your initials avatar now uses the dashboard palette instead of a stock fill.
zombiectl steer opens a terminal prompt when no message is supplied
zombiectl steer <zombie_id> now opens a read-eval-print loop (REPL) when you invoke it from a terminal (TTY) without a message. Automation stays single-shot: explicit messages and piped input still post one steer message, stream one reply, and exit.
CLI
- Terminal prompt: zombiectl steer <zombie_id> now prompts for one steer message at a time, streams the reply, then prompts again. Ctrl-D exits cleanly; Ctrl-C cancels the active stream and exits.
- Single-shot automation: zombiectl steer <zombie_id> "message" and echo "message" | zombiectl steer <zombie_id> keep the existing one-turn behavior for scripts and agents.
- Forced prompt mode: --tty forces the REPL even when stdin is piped, then exits when the pipe reaches end-of-file.
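The three rules above reduce to a small mode-selection function. It is a hypothetical sketch, not zombiectl's code. The notes do not say what happens when an explicit message is combined with --tty, and this version lets the message win.

```typescript
// Hypothetical mode selection for `zombiectl steer`.
type SteerMode = "single-arg" | "single-stdin" | "repl";

function steerMode(opts: { message?: string; stdinIsTTY: boolean; forceTty: boolean }): SteerMode {
  if (opts.message !== undefined) return "single-arg"; // explicit message: one turn, then exit
  if (opts.forceTty) return "repl";                   // --tty: REPL even on a pipe, ends at EOF
  return opts.stdinIsTTY ? "repl" : "single-stdin";   // piped input: one turn, then exit
}
```

Scripts that pipe a message keep their one-turn behavior unless they opt into the REPL with --tty.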
The dashboard no longer puts an access token in the page source
An agent’s detail page used to hand its live-activity panel a short-lived API token as a component property. That placed the raw token in the page’s HTML source, where anyone with access to the open tab (a shared screen, a cached page, a browser extension) could read it without any exploit. The dashboard now hands no token to the browser at all: steering an agent goes through a server-side action, and the activity stream loads its recent history on the server. This completes the dashboard token-handling work that began on May 19, 2026.
What’s new
- Steer failures are visible: when a steer message can’t be delivered, it shows a failed badge instead of staying on queued; a delivered steer reconciles to its live event as before.
- Quieter retries: a failed steer is retried on the server automatically, so the dashboard no longer shows a per-attempt retry counter while steering or while loading recent activity.
Create a workspace inline, plus a readable sign-in and one loading mark
The workspace switcher can now create a workspace without leaving the dashboard, and a sign-in-onward pass fixed the sign-in input contrast, the install-toast fade, and a stray development route.
What’s new
- Inline workspace creation: a New workspace item in the switcher opens a dialog, calls POST /v1/workspaces, and switches you onto the new workspace on success; a blank name lets the server generate one. The switcher now renders even when you have zero workspaces.
- One loading mark: loaders that used a generic spinning icon now use the brand wake-pulse, with a label beside the dot for page-level waits and the dot alone inside buttons.
Bug fixes
- Sign-in input contrast: the Clerk sign-in and sign-up fields rendered the same color as the card behind them, so the box only showed on focus; they now sit on an inset surface with a stronger border.
- Install-toast fade: the Hero install confirmation cleared its text in the same frame it began to fade, so the 240 ms fade ran on a blank line; the message and a warning toast’s color now hold through the fade.
- Stray development route: the /ds-button-rsc component-preview page is removed from the build.
Dashboard tightens its token-handling posture
Authenticated reads from the dashboard no longer carry the API bearer token in browser memory. Every request from a client component routes through a same-origin proxy that mints the JWT server-side on each hop; the browser bundle never sees the raw value. The token type also moves to an opaque wrapper that masks itself in console logs, JSON serialization, and React Server Component prop boundaries, so accidental leaks via console.log(token) or error-tracker capture print <redacted> instead of the raw bearer.
No user-facing behavior change. The retry layer’s per-attempt timeout also stops leaking pending setTimeout handles when the request wins the race.
zombiectl telemetry: Supabase-aligned env vars, on by default, agent attribution
zombiectl now follows the same telemetry contract as the Supabase CLI: anonymous usage data is on by default, three env-var knobs control opt-out and host overrides, and every event carries the detected AI agent (Claude Code, Cursor, Cline, etc.) when one is wrapping the invocation. The opt-out names are public, documented in --help, and respect the industry-standard DO_NOT_TRACK signal. Agent attribution helps prioritise CLI ergonomics for the hosts where most usezombie traffic actually originates.
Upgrading
The opt-out env vars are renamed and the default flipped. If you previously set DISABLE_TELEMETRY=0 to opt in, you can remove that line; telemetry is on by default now. To stay opted out, switch to one of the new names:
- DISABLE_TELEMETRY → ZOMBIE_TELEMETRY_DISABLED (set to 1 to opt out; any other value is treated as unset)
- ZOMBIE_POSTHOG_KEY → ZOMBIE_TELEMETRY_POSTHOG_KEY
- ZOMBIE_POSTHOG_HOST → ZOMBIE_TELEMETRY_POSTHOG_HOST
- DO_NOT_TRACK=1 is honored unchanged (industry-standard signal)
- ZOMBIE_TELEMETRY_DEBUG=1 for local span debug output (unchanged)
What’s new
- On by default. Fresh installs bootstrap $ZOMBIE_STATE_DIR/telemetry.json (default ~/.config/zombiectl/telemetry.json) with consent: "granted" on the first invocation. There is no interactive prompt; the contract is the --help env-var section plus the Configuration → Telemetry docs page.
- AI agent attribution. Every event now carries an ai_tool property when an agent is wrapping the CLI (Claude Code, Cursor, Cline, Aider, Continue, Windsurf, Copilot, Replit, etc.), detected via @vercel/detect-agent. Unknown non-interactive contexts fall back to unknown_non_interactive; CI contexts fall back to ci.
- Two opt-out signals, one persisted state. ZOMBIE_TELEMETRY_DISABLED=1 and DO_NOT_TRACK=1 each force consent to denied regardless of the persisted file. Either works; pick the one your team’s tooling already standardises on.
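The consent and attribution rules can be sketched as two pure functions. Both are hypothetical, not zombiectl's code. Applying the "only 1 counts" rule to DO_NOT_TRACK as well is an assumption. So are the precedence of a detected agent over CI and the omission of ai_tool for interactive human sessions.

```typescript
// Hypothetical consent resolution. The persisted file holds "granted" or "denied".
type Consent = "granted" | "denied";

function resolveConsent(env: Record<string, string | undefined>, persisted: Consent | undefined): Consent {
  // Only the exact value "1" opts out; any other value is treated as unset.
  if (env.ZOMBIE_TELEMETRY_DISABLED === "1" || env.DO_NOT_TRACK === "1") return "denied";
  return persisted ?? "granted"; // first run bootstraps consent as granted
}

// Hypothetical ai_tool selection; `detected` would come from an agent detector.
function aiTool(detected: string | null, isCI: boolean, interactive: boolean): string | undefined {
  if (detected) return detected;
  if (isCI) return "ci";
  if (!interactive) return "unknown_non_interactive";
  return undefined; // interactive human session: no ai_tool property (assumption)
}
```

The env vars are checked before the persisted file, so either opt-out signal wins even on a machine whose telemetry.json says "granted".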
CLI
zombiectl --help now advertises all five telemetry env vars under “Environment variables”, so the opt-out path is discoverable without leaving the terminal. ZOMBIE_STATE_DIR is now documented as the override for local CLI state, including credentials, telemetry consent, and session files. The default remains ~/.config/zombiectl.
Trigger panel multi-card + website OnboardingFlow + Hero CTA
The dashboard’s agent-detail page now renders one card per declared trigger: a guided registration card with rendered gh api / linear webhook create / jira webhook add snippets for known providers (GitHub, Linear, Jira, Grafana, Slack, AgentMail), a copy-URL fallback for unknown sources, and a CronCard showing the schedule and next fire time, computed client-side in the host’s IANA timezone. The website Home page replaces the FeatureFlow evidence section with a 4-step OnboardingFlow pictorial (install → add skill → wire webhook → steer), and the Hero primary CTA becomes a terminal-style $ npm install -g … button that copies the install command, surfaces an inline <output aria-live> toast, and smooth-scrolls to the #onboarding-flow anchor (honoring prefers-reduced-motion).
- TriggerPanel.tsx switches from Tabs to Radix Accordion, with one AccordionItem per zombie.triggers[] entry. The first trigger with no recorded delivery auto-expands on mount (set-up state takes precedence). An inner router dispatches webhook to GuidedTriggerCard (known provider) or CopyUrlFallback (unknown), cron to CronCard, and api to CopyUrlFallback with the legacy bare webhook URL.
- provider-guidance.ts ships six provider entries: github, linear, jira, grafana, slack, agentmail. Each exposes a 3-arg command(vars, webhookUrl, events) template so per-provider snippets vary by trigger.events, a webUiDeepLink(vars) for the “Open <provider>” affordance, and a variables list driving the variable-input form. Snapshot-tested per provider.
- webhookUrlFor(zombieId, source?): a dashboard helper that reconstructs the server’s webhook_urls: { <source>: <url> } projection client-side. Called with a source, it returns the per-trigger URL; called without one, it returns the legacy bare ingress at /v1/webhooks/{zombie_id}.
- OnboardingFlow.tsx replaces FeatureFlow.tsx: four numbered cards with arrow connectors at the lg: breakpoint; on mobile they stack vertically. Each card carries a real shell snippet, not a screenshot. The flow is anchored at id="onboarding-flow" as Hero’s smooth-scroll target.
- Hero.tsx primary CTA is now a <button>: it copies npm install -g @usezombie/zombiectl && npx skills add usezombie/usezombie to navigator.clipboard, shows the toast for 2 s, and scrolls to #onboarding-flow. A blocked clipboard surfaces a “Clipboard blocked — select the command above and copy manually” toast.
- An <output aria-live="polite"> element is used in place of a design-system Toast primitive (there isn’t one yet).
- cron-parser@^5.5.0 is added to ui/packages/app for the CronCard next-fire computation. The landing JS bundle stays at 132.6 kB gz, under the 140 kB .size-limit.json ceiling (7.4 kB headroom).
- Coverage gate: app package thresholds (statements 95 / branches 90 / functions 95 / lines 95) are all green, with three new targeted tests covering both branches of the setCopiedKey reset updater, the Intl timezone fallback to "UTC", and the cron / api auto-expand TriggerBody paths.
- Timer-leak fixes: the GuidedTriggerCard copy-reset, the CopyUrlFallback copy-reset, and the existing Hero toast now all store their setTimeout handle in a useRef and clear it in a useEffect destructor on unmount; unmount-cleanup tests are added at each call site.
- Landing-page promo pill: a small mono pill between the LIVE eyebrow and the headline links to /pricing and surfaces the “Free until July 31, 2026” free-trial posture that already lived on the pricing component but was invisible above the fold. A click emits trackNavigationClicked({ source: "hero_promo_pill", target: "pricing" }).
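Here is a minimal sketch of two behaviors described above: the webhookUrlFor contract and the auto-expand rule. It recreates their described behavior under stated assumptions and is not the dashboard's source. It hardcodes the production origin https://api.usezombie.com, and the Trigger shape, including a lastDeliveryAt field, is hypothetical. Returning -1 (nothing expanded) when every trigger has fired is also an assumption.

```typescript
// Hypothetical re-creation of the described contracts; origin and Trigger shape are assumptions.
const API_ORIGIN = "https://api.usezombie.com";

function webhookUrlFor(zombieId: string, source?: string): string {
  const bare = `${API_ORIGIN}/v1/webhooks/${encodeURIComponent(zombieId)}`;
  return source === undefined ? bare : `${bare}/${encodeURIComponent(source)}`;
}

interface Trigger { type: "webhook" | "cron" | "api"; source?: string; lastDeliveryAt?: number }

// Accordion item to open on mount: the first trigger with no recorded delivery.
function initialExpanded(triggers: Trigger[]): number {
  return triggers.findIndex((t) => t.lastDeliveryAt === undefined); // -1: none still needs set-up
}
```

Building the per-trigger URL from the same source key the server uses keeps the card's copy button in step with the server's webhook_urls map.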
TRIGGER.md is still the source of truth; the dashboard cards are read-only. Edit the markdown and reinstall to change triggers.
Docs sweep: npx skills add install path, full doctor table, tenant-provider commands
Cold-start install no longer routes through mkdir + curl. The host-agent skill ships via npx skills add usezombie/usezombie, a one-liner that symlinks /usezombie-* into your host’s skill directory; the curl path stays as a fallback for environments without a Node toolchain. The reconciliation pass also fixes a handful of doc claims that had drifted past what zombiectl actually returns, and adds the tenant provider subcommands that were already shipped but undocumented.
- npx skills add usezombie/usezombie: primary install path on index.mdx, quickstart.mdx, cli/install.mdx, and zombies/install.mdx. Pass --host=<claude|amp|codex|opencode> to pin a specific host. https://usezombie.sh/skills.md remains as the curl fallback.
- The zombiectl doctor table now lists five checks (was three): auth_token_present and tenant_provider were already returned by the CLI but missing from cli/install.mdx. The tenant_provider row carries { mode, model, context_cap_tokens, free_trial }, so the launch trial window surfaces in the same place users check before installing.
- zombiectl tenant provider show / set / reset: added to cli/zombiectl.mdx. Flips the tenant between platform-managed inference (default) and a self-managed provider credential from the workspace vault. Architecture canon for the flow lives in docs/architecture/user_flow.md §8.7 and billing_and_provider_keys.md; the CLI page now points readers there instead of recreating the narrative.
- concepts.mdx: the “Skill” card is replaced with a “Tool” card that names the binding distinction (TRIGGER.md is sandbox-enforced; SKILL.md is advisory prose). The Trigger accordion’s webhook URL is updated to /v1/webhooks/{zombie_id}/{source} to match what M43 / M68 shipped.
- zombies/overview.mdx lifecycle: the mermaid diagram expands from three states to five (install → Alive ↔ Stopped, Alive → Killed → Deleted), so stop/resume show up in the diagram users find first.
Redis pool — lifts the single-mutex throughput cap
Previously, every server-side Redis command serialised through a single mutex, hard-capping per-process throughput at roughly 40 ops/sec/connection regardless of how many producers were issuing writes. The mutex is gone; concurrent producers now each take their own pooled connection. Long-lived blocking consumers (the per-agent worker stream reader, the watcher’s control-stream reader, the dashboard’s activity subscriber) hold dedicated connections outside the pool by design, because pooling them would exhaust the pool at the 9th agent.
- Pool sizing knobs (env vars): REDIS_POOL_MAX_IDLE (default 8) caps the idle pool size; REDIS_POOL_EAGER_MIN (default 2) preconnects on boot so the first burst doesn’t pay dial latency.
- Request-path timeout: REDIS_REQUEST_TIMEOUT_MS (default 5000 ms). A command that doesn’t return within the budget surfaces as a transport timeout and closes the connection; the next acquire dials fresh.
- Boot-time validation: a misparsed REDIS_REQUEST_TIMEOUT_MS value fails the process boot with error code UZ-STARTUP-ENV-CHECK instead of silently falling back to a default.
- Server error replies surface in logs: READONLY, BUSYGROUP, WRONGTYPE (and other Redis-side error frames) now emit a redis_command_err_reply warn line carrying the server’s text before being mapped to the internal command error, so the underlying cause is visible in operator logs instead of being flattened into a generic operation failure.
- 8 new Prometheus pool series under the zombie_redis_pool_* namespace: dials_total, overflow_dials_total, poisoned_connections_total, reconnects_total, forced_closes_total, acquire_timeouts_total, idle, acquire_wait_ns_p99. Two of them are stubs today: acquire_timeouts_total reports 0 (there is no acquire-timeout path yet, since Pool.acquire never blocks), and acquire_wait_ns_p99 reports 0 (the per-acquire wait histogram wires in alongside the timeout-aware acquire). Both stay in the series list so dashboards don’t break when the wiring lands.
- Pub/sub subscriber consolidation: one subscriber type with a configurable read timeout; production passes none (block indefinitely), and test harnesses pass a budget.
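A non-blocking pool of this kind can be sketched in a few dozen lines. The class below is a hypothetical illustration, not zombied's implementation. Conn stands in for a real Redis connection. The sketch also assumes that any dial made at acquire time (because nothing was idle) counts as an overflow dial.

```typescript
// Hypothetical pool: acquire() never blocks, idle connections are capped at maxIdle,
// and poisoned connections are closed instead of being reused.
interface Conn { id: number; poisoned: boolean; close(): void }

class Pool {
  private readonly maxIdle: number;
  private idle: Conn[] = [];
  private nextId = 1;
  readonly metrics = { dials_total: 0, overflow_dials_total: 0, poisoned_connections_total: 0 };

  constructor(maxIdle = 8, eagerMin = 2) {
    this.maxIdle = maxIdle;
    for (let i = 0; i < eagerMin; i++) this.idle.push(this.dial(false)); // preconnect on boot
  }

  // Stand-in for opening a real Redis connection.
  private dial(overflow: boolean): Conn {
    this.metrics.dials_total++;
    if (overflow) this.metrics.overflow_dials_total++;
    return { id: this.nextId++, poisoned: false, close: () => {} };
  }

  // Reuse an idle connection, or dial a fresh one when none is idle.
  acquire(): Conn {
    return this.idle.pop() ?? this.dial(true);
  }

  release(conn: Conn): void {
    if (conn.poisoned) {
      // e.g. the command hit the request-path timeout: drop the connection
      this.metrics.poisoned_connections_total++;
      conn.close();
    } else if (this.idle.length < this.maxIdle) {
      this.idle.push(conn);
    } else {
      conn.close(); // idle pool already full
    }
  }

  get idleCount(): number {
    return this.idle.length;
  }
}
```

Because acquire dials instead of waiting, a burst never queues behind a lock. The cost shows up as overflow dials, which is the pressure signal the new metrics expose.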
Trigger DX overhaul — gh-driven webhook registration, free-trial pricing, dashboard chat
Installing the platform-ops agent no longer ends with a paste-into-GitHub step. The host-agent install skill (/usezombie-install-platform-ops) registers each declared webhook via your own gh CLI, HMAC-self-verifies the registration, and reports the result inline. Through the end of July 2026, every customer-visible rate string reads “Try for free”: the stage-execution rate is set to zero nanos, and the website’s pricing component renders the historical rate with a strike-through plus a “Free until July 2026” banner. The dashboard’s /zombies/{zombie_id} page replaces the bespoke “Live activity” panel with a chat surface, so you can steer an agent from the same screen that shows its events.
Upgrading
- TRIGGER.md trigger: → triggers: (array). An agent declares 1–8 trigger entries under triggers:. The singular trigger: shape is rejected at install with ERR_ZOMBIE_INVALID_CONFIG: use "triggers:" (array); there is no compat shim and no rewrite-on-load. Convert every TRIGGER.md you maintain. Entries are unique on (type, source), with at most one cron entry per agent.
- Install response shape: webhook_url → webhook_urls. POST /v1/workspaces/{workspace_id}/zombies 201 returns webhook_urls: { <source>: <url> }, keyed by triggers[].source. The old webhook_url scalar field is gone. The CLI’s --json install output emits the same map; the human-readable output prints one URL per declared webhook trigger.
- Webhook URLs are source-suffixed. External systems POST to https://api.usezombie.com/v1/webhooks/{zombie_id}/{source}; the {source} segment matches triggers[].source in TRIGGER.md. Update any hand-maintained upstream hooks.
- Mission Control → Dashboard. Every “Mission Control” string in the website, app, and CLI output is now “Dashboard”. Bookmarks are unaffected; only display copy changed.
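The triggers: rules can be checked before install with a small validator over already-parsed frontmatter. It is a hypothetical sketch. Only the first error string is quoted from these notes. The other messages, and the reduced entry shape (type and source only), are illustrative assumptions.

```typescript
// Hypothetical pre-install check; only the singular-shape message is documented.
interface TriggerEntry { type: string; source?: string }

function validateTriggers(frontmatter: Record<string, unknown>): string | null {
  // The singular shape is rejected outright, with no rewrite-on-load.
  if ("trigger" in frontmatter) return 'ERR_ZOMBIE_INVALID_CONFIG: use "triggers:" (array)';
  const triggers = frontmatter.triggers;
  if (!Array.isArray(triggers) || triggers.length < 1 || triggers.length > 8) {
    return "invalid: triggers: must be an array of 1-8 entries";
  }
  const seen = new Set<string>();
  let cronCount = 0;
  for (const t of triggers as TriggerEntry[]) {
    const key = `${t.type}\u0000${t.source ?? ""}`; // uniqueness key: (type, source)
    if (seen.has(key)) return `invalid: duplicate trigger (${t.type}, ${t.source ?? "-"})`;
    seen.add(key);
    if (t.type === "cron" && ++cronCount > 1) return "invalid: at most one cron trigger";
  }
  return null;
}
```

Running a check like this in CI catches unconverted TRIGGER.md files before the install endpoint rejects them.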
What’s new
- zombiectl auth status: inspect the stored credential without re-authenticating. It resolves the token source (file vs ZOMBIE_TOKEN env vs none), decodes JWT claims (iss, aud, sub, metadata.tenant_id, metadata.role, exp), and probes GET /v1/tenants/me/billing to verify the token is still accepted. Exit 0 on valid or unreachable (transient API issues don’t poison local state); exit 1 on UZ-AUTH-001 / UZ-AUTH-002 / TOKEN_EXPIRED.
- Trigger panel goes multi-card. Each declared trigger renders its own card on the dashboard’s agent-detail page: a guided card with the upstream registration command for known providers (GitHub, Linear, Jira, Grafana, Slack, AgentMail, Clerk), a copy-URL card for unknown sources, a schedule and next-fire card for cron, and a catch-all for api.
- Dashboard chat surface. The /zombies/{zombie_id} page mounts an @assistant-ui/react thread in place of the prior LiveEventsPanel. Webhook / cron / continuation events render as system chips; the agent’s reasoning streams as assistant bubbles; the composer at the bottom turns user input into a steer (POST /v1/workspaces/{workspace_id}/zombies/{zombie_id}/messages, an existing endpoint).
- Install skill is platform-neutral. /usezombie-install-platform-ops runs in any host with an AskUserQuestion equivalent; Claude Code, Amp, Codex CLI, and OpenCode all drive the same skill body. The skill loops gh api repos/<owner>/<repo>/hooks per declared webhook trigger and substitutes the workspace github credential’s webhook_secret into each registration. Re-running on a repo with an existing hook at the same URL is idempotent (matched on config.url, advanced).
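Two parts of auth status can be sketched locally: decoding the JWT payload and mapping the probe outcome to an exit code. This is a hypothetical sketch that uses Node's Buffer. It reads claims without verifying the signature, which is fine for display but never for trust decisions. Mapping any other probe error to exit 0 is an assumption, because the notes list only the three codes that exit 1.

```typescript
// Hypothetical sketch; decodes claims for display only (no signature check).
function decodeClaims(jwt: string): Record<string, unknown> {
  const payload = jwt.split(".")[1];
  if (!payload) throw new Error("not a JWT");
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}

// `probeError` is the error code from the billing probe, "unreachable" when the
// API could not be reached, or null when the token was accepted.
function authStatusExitCode(probeError: string | null): 0 | 1 {
  const rejected = ["UZ-AUTH-001", "UZ-AUTH-002", "TOKEN_EXPIRED"];
  return probeError !== null && rejected.includes(probeError) ? 1 : 0;
}
```

Exiting 0 on an unreachable API keeps a flaky network from looking like a revoked credential in scripts that gate on this command.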
API reference
POST /v1/workspaces/{workspace_id}/zombies (201): returns webhook_urls, a { <source>: <url> } map keyed by triggers[].source.
GET /v1/workspaces/{workspace_id}/zombies: list rows gain a triggers projection.
GET /v1/workspaces/{workspace_id}/zombies/{zombie_id}/events accepts a new actor_prefix query parameter (e.g. actor_prefix=webhook:) for server-side filtering by event source. No client-side fallback.
CLI
- zombiectl install --from <path> prints a per-trigger URL block. Output switches from a single webhook_url: line to Webhook URLs (register on the upstream provider): followed by one <source>: <url> line per declared webhook trigger. The --json shape returns webhook_urls as a { <source>: <url> } map.
- zombiectl auth status: new subcommand, documented above.
Redis request path: concurrent by default, configurable, observable
Every short-lived Redis command (`XADD`, `PUBLISH`, `XACK` from HTTP handlers and per-step worker publishes) now flows through a connection pool instead of serializing behind a single client-wide mutex. Concurrent requests from worker threads and HTTP handlers each acquire their own pooled connection, complete the round-trip without contention, and release it. The prior ~40 ops/sec-per-connection ceiling, which didn’t scale with CPU cores, is gone.
What’s new
- `REDIS_POOL_MAX_IDLE` (default `8`): maximum connections held in the idle pool per process. Raise it for sustained high-concurrency workloads to reduce overflow dials; request-path commands complete in single-digit ms even over Upstash TLS, so values above `16` are unusual.
- `REDIS_POOL_EAGER_MIN` (default `2`): connections pre-warmed at boot. Covers cold-boot latency (an Upstash TLS handshake costs tens of ms per dial); raise it if boot is followed by an immediate burst of short-lived commands.
- `REDIS_REQUEST_TIMEOUT_MS` (default `5000`): request-path read timeout. A frozen Upstash proxy can no longer pin a worker thread indefinitely. Don’t raise it above `5000`: Upstash regional p99 is single-digit ms, so anything over 5 s is a failure, not slowness.
- Long-lived blocking consumers stay on dedicated connections. The watcher’s `XREADGROUP` on `zombie:control`, per-agent workers’ `XREADGROUP` on `zombie:{id}:events`, and SSE subscribers each hold one connection for their lifetime. They don’t compete with the request-path pool, so a customer with 100 dashboard tabs open can’t exhaust it.
- Prometheus metrics for pool state. New `/metrics` lines for `active`, `idle`, `dials_total`, `overflow_dials_total`, `poisoned_connections_total`, `reconnects_total`, `forced_closes_total`, and `acquire_timeouts_total` give operators visibility into connection churn, dial pressure under bursts, and transport-layer flakiness.
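A toy model of the request-path pool helps show how the knobs interact. This is a hypothetical TypeScript sketch, not the server’s implementation: `RequestPool`, `envInt`, and the rule that a dial made while the idle list is empty counts as an overflow dial are all assumptions.

```typescript
// Hypothetical pool model; counters loosely mirror the /metrics names above.
interface Conn { id: number }

class RequestPool {
  private idle: Conn[] = [];
  private nextId = 0;
  active = 0;
  dialsTotal = 0;
  overflowDialsTotal = 0;

  constructor(private readonly maxIdle: number, eagerMin: number) {
    // Pre-warm eagerMin connections so the first burst skips the dial cost.
    for (let i = 0; i < eagerMin; i++) this.idle.push(this.dial());
  }

  private dial(): Conn {
    this.dialsTotal++;
    return { id: this.nextId++ };
  }

  acquire(): Conn {
    this.active++;
    const reused = this.idle.pop();
    if (reused) return reused;
    // Idle list is empty: dial a fresh connection and count it as overflow.
    this.overflowDialsTotal++;
    return this.dial();
  }

  release(conn: Conn): void {
    this.active--;
    // Keep at most maxIdle warm connections; the rest are closed (dropped).
    if (this.idle.length < this.maxIdle) this.idle.push(conn);
  }

  get idleCount(): number {
    return this.idle.length;
  }
}

// Read a non-negative integer knob, falling back to the documented default.
function envInt(env: Record<string, string | undefined>, name: string, fallback: number): number {
  const raw = env[name];
  const n = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(n) && n >= 0 ? n : fallback;
}
```

Raising `maxIdle` keeps more released connections warm, which is why it lowers overflow dials under sustained concurrency.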
zombiectl runs end-to-end against the live API on every deploy
Every backend deploy now exercises `zombiectl` against `api-dev.usezombie.com`, and every release re-runs the same suite against `api.usezombie.com` (plus a daily run at 13:00 UTC). Parse, auth-guard, install, lifecycle, read sweep, and `zombiectl login` are all hit on real infrastructure, so CLI regressions in the network path fail in CI before they reach you.
CLI
- uuidv7 validation on every positional id. `zombiectl kill|stop|resume|delete|logs <id>`, `workspace use|delete <id>`, `agent delete <id>`, and `grant delete <id>` reject malformed ids before the network call with `invalid <name>: expected uuidv7 format (e.g. 0192a3b4-c5d6-7e8f-9012-345678901234)`.
- `zombiectl agent add|list|delete` default to the current workspace when `--workspace <id>` is omitted, matching the agent commands.
- `SIGINT` (Ctrl-C) during `zombiectl login` exits 130 cleanly without leaving a partial `credentials.json` behind. Re-run `zombiectl login` from a fresh slate.
- `-v` is a short alias for `--version`.
- `zombiectl help` is a first-class command (it used to fall through to the “did you mean?” suggester). It behaves identically to bare `zombiectl` / `-h` / `--help`.
- `credentials.json` lands at mode `0600` after a successful login. Verify with `ls -l ~/.config/zombiectl/credentials.json`.
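A pre-flight check like the one described could look as follows, assuming the RFC 9562 uuidv7 layout (version nibble `7`, variant nibble `8` to `b`). The regex and the `validateId` helper are illustrative; zombiectl’s actual validator may be stricter or looser.

```typescript
// Assumed uuidv7 shape: 8-4-4-4-12 hex, version 7, RFC variant (10xx).
const UUIDV7 = /^[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

const EXAMPLE_ID = "0192a3b4-c5d6-7e8f-9012-345678901234";

// Returns null for a valid id, otherwise the error line the CLI would print.
function validateId(name: string, value: string): string | null {
  if (UUIDV7.test(value)) return null;
  return `invalid ${name}: expected uuidv7 format (e.g. ${EXAMPLE_ID})`;
}
```

Rejecting locally saves a round-trip and turns a confusing server 404 into an actionable message.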
Dashboard error voice, sign-in card lifted, install/save races fixed
Every “Failed to X” fallback in the dashboard is replaced with operator-first language keyed on backend error codes. The sign-in card no longer disappears into the page background. Two install/save races that left you on the wrong URL after a `router.push` are fixed.
What’s new
- `presentError({errorCode, message, action})` is the single entry point for dashboard error rendering. Curated `UZ-XXX-NNN` codes (registry: `api-reference/error-codes`) map to a title + body the operator can act on: `UZ-AUTH-003` reads “Your session expired. Sign in again to keep going.” instead of “Not authenticated”; `UZ-ZMB-009` reads “We couldn’t find that agent. It may have already been deleted — refresh the list.” instead of “Internal Server Error”. Eight codes ship today, and the map grows as the dashboard surfaces new ones. Unhelpful server `"Failed to …"` messages are detected and replaced rather than concatenated.
- Sign-in card lifted from `--surface-1` to `--surface-2` on the auth route, with `--border-strong` on the edge. At `--surface-1` the luminance delta against the page background was 3 units, close to invisible. The card now reads as a card.
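A code-keyed copy table with a fallback is one way to build such a helper. In this hypothetical sketch only the `presentError({errorCode, message, action})` call shape and the `UZ-AUTH-003` / `UZ-ZMB-009` copy come from the entry; the title/body split, the `Failed to` detector, and the fallback strings are assumptions.

```typescript
interface ErrorInput { errorCode?: string; message?: string; action?: string }
interface Presented { title: string; body: string }

// Curated copy keyed on backend error codes (two of the eight shipped codes).
const CURATED: Record<string, Presented> = {
  "UZ-AUTH-003": { title: "Your session expired.", body: "Sign in again to keep going." },
  "UZ-ZMB-009": {
    title: "We couldn’t find that agent.",
    body: "It may have already been deleted — refresh the list.",
  },
};

// Server strings such as "Failed to load" carry nothing actionable.
const isUseless = (msg?: string): boolean => !msg || /^failed to\b/i.test(msg.trim());

function presentError({ errorCode, message, action }: ErrorInput): Presented {
  const curated = errorCode ? CURATED[errorCode] : undefined;
  if (curated) return curated;
  // Replace an unhelpful server message instead of concatenating it.
  const body = isUseless(message) ? "Something went wrong. Try again." : message!;
  return { title: action ? `Could not ${action}` : "Request failed", body };
}
```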
Bug fixes
- `/zombies/new` install: `InstallZombieForm.tsx` no longer issues `router.refresh()` after `router.push(/zombies/{id})`. The `force-dynamic` detail route re-resolves on commit; the manual refresh was racing the URL commit and intermittently leaving you on `/zombies/new` with stale form state. The same fix applies to `ZombieConfig.tsx` for the save-then-navigate path on the detail page.
- `tooltip` test flake: restoring vitest’s default `exclude` patterns stops the runner from following the `@usezombie/design-system` workspace symlink and executing its tests without their `test-setup.ts`. The intermittent “Invalid Chai property: toBeInTheDocument” failure on `bun run test` is gone.
CLI
No `zombiectl` shape changes.
Authenticated dashboard e2e ungated, runs on every dev + prod deploy
The Playwright authenticated suite covers the eight dashboard lifecycles M64_005 deferred behind `test.fixme`: every signed-in flow that touched a client-side `useClientToken().getToken()` call. Closing the gap means every KillSwitch, ZombieConfig, and provider-settings mutation now goes through a Next.js Server Action that mints the api-template JWT server-side. CI runs the suite against api-dev after every push to `main` and against api.usezombie.com after every production app deploy.
- `useClientToken` retired. Every dashboard mutation that previously called `useClientToken().getToken()` (six routes, eleven call sites) now invokes a per-route `app/(dashboard)/<route>/actions.ts` Server Action. A shared wrapper at `lib/actions/with-token.ts` exposes `withToken<T>`, returning the discriminated `ActionResult<T>` (`{ ok: true, data }` or `{ ok: false, error, status? }`). The Token A → Token B handoff stays server-side; the browser never sees the api-template JWT.
- Three `test.fixme` blocks gone. `lifecycle.spec.ts`, `kill.spec.ts`, and `signup.spec.ts` exercise the stop / kill / signup flows end-to-end. Lifecycle and kill assert on the dashboard listing’s `data-state` after the Server Action completes (no more `waitForResponse(... PATCH)`, since the PATCH is server-internal). Signup drives Clerk DEV’s verification screen with the documented testing One-Time Password (OTP) `424242`.
- Five new test files land:
  - `multi-zombie` pins the 5-simultaneous pulse cap (six active agents → exactly five `data-live="true"` + one static glow + header `"6 live · capped at 5"`).
  - `multi-workspace` switches workspaces via the header dropdown and asserts the cookie-write Server Action keeps the URL on `/zombies`.
  - `settings-billing` asserts the `tabular-nums` balance headline + the disabled `Purchase Credits` trigger.
  - `events` and `logs-detail` assert SSR + WakePulse render on `/events` and `/zombies/[id]` respectively (the EventDetail dialog is still unbuilt; tracked in test Discovery).
- `<RadioGroup>` + `<RadioGroupItem>` ship in `@usezombie/design-system`: a Radix-backed wrapper with the shared focus-ring + token map. `ModeRadio.tsx` and `ProviderSelector.tsx` consume it; the last raw `<input type="radio">` in `ui/packages/app/**` is gone.
- `zombiectl` coverage threshold + new validation tests. `bunfig.toml` enforces `coverageThreshold = { line = 0.95, function = 0.95 }`. New `test/workspace-helpers.unit.test.js` covers the previously uncovered `VALIDATION_ERROR` branches in `commandWorkspace` for `use` and `delete` (the existing tests handed in alphanumeric input that passes `isValidId` and routed through `UNKNOWN_WORKSPACE` instead). Suite-wide coverage lifts to 95.76% function / 95.64% line.
- `auth-e2e-dev` job in `.github/workflows/deploy-dev.yml`: runs after `verify-dev` against `https://usezombie-app.vercel.app`, injects `CLERK_SECRET_KEY` + `CLERK_WEBHOOK_SECRET` from the project’s dev secret vault, gates the `notify` step, and uploads `playwright-auth-report/` as the `auth-e2e-dev-<sha>` artifact.
- `auth-e2e-prod` job in `.github/workflows/smoke-post-deploy.yml`: fires on Vercel `deployment_status: success` for the `usezombie-app` PROD environment. Same Playwright suite, Clerk credentials supplied from the project’s prod secret vault, `https://api.usezombie.com` for the API base.
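The `ActionResult<T>` union lends itself to a small wrapper that never throws across the Server Action boundary. The sketch below assumes a `withToken` that takes a server-side token minter plus the API call; that parameter list is a guess, not the signature in `lib/actions/with-token.ts`.

```typescript
// Only the ActionResult<T> union is taken from the changelog.
type ActionResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: string; status?: number };

async function withToken<T>(
  mintToken: () => Promise<string>, // runs server-side; the token never reaches the browser
  call: (token: string) => Promise<T>,
): Promise<ActionResult<T>> {
  try {
    const token = await mintToken();
    return { ok: true, data: await call(token) };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    // Surface an HTTP status when the thrown value carries one.
    const status = (err as { status?: unknown } | null)?.status;
    return typeof status === "number" ? { ok: false, error, status } : { ok: false, error };
  }
}
```

Returning a discriminated union instead of throwing lets a client component branch on `result.ok` without a try/catch around every action call.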
Nanos billing unit, posture-dispatched stage rates, BYOK term retired
Pricing splits into a two-rate gradient: one per-stage rate on the platform default and a lower per-stage rate when you bring your own provider key; self-managed is 10× cheaper to scale. Event receipts are free in both postures. The starter grant amount is unchanged, now denominated in nanos (1 USD = 1_000_000_000 nanos) so sub-cent rates have nine decimal places of precision. “BYOK” is retired everywhere in user-facing surfaces; the canonical term is self-managed provider key.
Upgrading
- Billing column rename: `tenant_billing.balance_cents` → `tenant_billing.balance_nanos` (`BIGINT NOT NULL CHECK (balance_nanos >= 0)`). Same row, same UPDATE shape, new unit. SDK + dashboard already read the new column.
- `PUT /v1/tenants/me/provider`: `mode: "byok"` now returns `400`. Pass `mode: "self_managed"` instead. There is no compat shim, no 410, no legacy alias: a clean break, pre-v2.0.
- `zombiectl tenant provider set`: flag value `--mode byok` removed. Use `--mode self_managed`.
- Schema reseed required for local dev: `make down && make up`. There is no migration; the column rename is forward-only.
- Internal constants renamed: `STARTER_CREDIT_CENTS` → `STARTER_CREDIT_NANOS`, `STAGE_CENTS` split into `STAGE_PLATFORM_NANOS` + `STAGE_SELF_MANAGED_NANOS`, `EVENT_PLATFORM_CENTS` collapsed into `EVENT_NANOS = 0`. The names are identical across Zig, website TS, app TS, and zombiectl JS (cross-tier parity rule).
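Converting between the old cents unit and nanos is a fixed factor of 10,000,000 (1 USD = 100 cents = 1_000_000_000 nanos). These helpers are illustrative only and use plain numbers, which stay exact up to about $9M in nanos; the `BIGINT` column would warrant `bigint` arithmetic in production code.

```typescript
const NANOS_PER_USD = 1_000_000_000;
const NANOS_PER_CENT = NANOS_PER_USD / 100; // 10_000_000

function centsToNanos(cents: number): number {
  if (!Number.isInteger(cents)) throw new RangeError("cents must be an integer");
  return cents * NANOS_PER_CENT;
}

// Render a nanos amount as dollars with all nine decimal places.
function formatUsd(nanos: number): string {
  const sign = nanos < 0 ? "-" : "";
  const abs = Math.abs(nanos);
  const whole = Math.floor(abs / NANOS_PER_USD);
  const frac = String(abs % NANOS_PER_USD).padStart(9, "0");
  return `${sign}$${whole}.${frac}`;
}
```

For example, a 500¢ balance becomes `centsToNanos(500)` = 5_000_000_000 nanos, and a 1,500-nano charge renders as `$0.000001500`.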
What’s new
- Single canonical contact email: the support address resolves through a `SUPPORT_EMAIL` constant per repo (Zig, website TS, app TS, CLI JS) plus this Mintlify snippet at `/snippets/contact.mdx`. The older per-surface address literals are gone from active source and copy.
- Per-model token rates now in nanos: `model-caps.json` ships `input_nanos_per_mtok` / `output_nanos_per_mtok` (was `_cents_per_mtok`). Same response shape otherwise.
- `docs/architecture/billing_and_provider_keys.md` rewritten shape-first: the architecture doc names the rate constants by identifier and points at three authoritative sources (`tenant_billing.zig`, `snippets/rates.mdx`, the `model-caps.json` endpoint) instead of pinning specific dollar amounts. Future rate ratchets no longer require a doc rewrite.
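Per-million-token rates turn into a cost by scaling token counts. In this sketch the `tokenCostNanos` name, the rounding rule (up to a whole nano), and the sample rates in the usage note are assumptions, not published prices.

```typescript
// Field names match the model-caps.json rate fields described above.
interface ModelRates {
  input_nanos_per_mtok: number;
  output_nanos_per_mtok: number;
}

function tokenCostNanos(rates: ModelRates, inputTokens: number, outputTokens: number): number {
  const raw =
    inputTokens * rates.input_nanos_per_mtok + outputTokens * rates.output_nanos_per_mtok;
  return Math.ceil(raw / 1_000_000); // per-Mtok rate → per-token, rounded up (assumed)
}
```

At hypothetical rates of 3_000_000_000 and 15_000_000_000 nanos per Mtok, 2,000 input plus 500 output tokens cost 13_500_000 nanos (1.35¢).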
API reference
- `GET /v1/tenants/me/billing`: balance now denominated in nanos.
- `GET /v1/tenants/me/billing/charges`: `charge_type` ∈ `receive` and `stage` (platform / self-managed). Row amounts are now in `credit_deducted_nanos` (was `credit_deducted_cents`).
- `PUT /v1/tenants/me/provider`: `mode: "byok"` returns `400 mode_not_recognized` with the generic “mode must be one of: platform, self_managed” message; there is no special-case retired-mode branch.
Tests
Website vitest 129/129 · app vitest 357/357 · zombiectl bun test 567/567 · Zig 29/29 · integration 1508/0. New paired pin tests on each tier lock the rate constants and the `SUPPORT_EMAIL` literal.
Single-rate pricing — tier ladder retired
One rate per surface (event receipt, stage), drawn from a starter credit on signup. You pick the model and pay your provider directly, with zero markup on tokens. Hobby/Scale tiers are gone from the site, API, and dashboard.
Upgrading
- `GET /v1/tenants/me/billing`: `plan_tier` and `plan_sku` removed. Other fields unchanged. Upgrade server + client together.
- `UZ-WORKSPACE-003`: message now `Credit pool exhausted` (was tier-flavoured). Code unchanged; only string-matchers need to update.
- `/pricing` route 404s: content lives at `/#pricing`. Topbar and footer already updated.
- PostHog `pricing_hobby_start_free` / `pricing_scale_upgrade` events removed. Pricing-page intent is now `signup_completed` with `source = pricing_install`. Rebuild affected funnels.
What’s new
- One billing flow on `usezombie.com/#pricing`: a horizontal diagram (event cell → N stage cells → a separate LLM stratum showing your model bill is not ours).
- Operational extras turn on per workspace, never as a paywall: multi-workspace, approval gating, workspace credentials, higher concurrency, longer windows, priority support.
- The marketing site is now a single page: pricing, features, and FAQ all live on `/`. Only `/agents` and the external `/docs` route away.
- Marketing headings use design-system primitives: `<DisplayXL>`, `<DisplayLG>`, and `<SectionLabel>` back every hero, section, and eyebrow.
API reference
- `GET /v1/tenants/me/billing`: `plan_tier` and `plan_sku` removed.
- `GET /v1/tenants/me/billing/charges`: unchanged. `charge_type` ∈ `receive` and `stage`.
- `UZ-WORKSPACE-003`: message now `Credit pool exhausted`.
Operational Restraint — one design system across every surface
Dashboard, marketing site, docs, and `zombiectl` now share one visual language: dark-first chrome, Commit Mono headings on Instrument Sans body, and a single cyan-mint wake-pulse used as currency. If something pulses, it’s alive.
What’s new
- One token set across `app.usezombie.com`, `usezombie.com`, `docs.usezombie.com`, and the CLI: the same surface, border, text, and accent values, so the visual handoff between surfaces is seamless.
- Pulse cyan is currency, never decoration. It marks only live signals: status dots, primary CTAs, focus rings, the brand mark. No decorative gradients or glows.
- Light mode is first-class: full WCAG AA contrast (body 7:1, inline code 4.5:1, visible focus rings either way).
- `prefers-reduced-motion` honoured: the wake-pulse swaps to a static halo, so the metaphor survives.
- Docs site landed last: Commit Mono headings, Instrument Sans body, 68-char measure. Heritage orange is gone.
What’s next
Accessibility scores, performance budgets, and dashboard live-state instrumentation will land in a future entry.
zombiectl reads as part of the brand
The CLI now renders against the same design system as the web surfaces: the same cyan-mint pulse and the same status glyphs (● live, ○ parked, ◉ warn, ✕ failed) across every command.
What’s new
- One palette end-to-end: pulse cyan, evidence amber, success green, warn amber, error red, muted/subtle greys. 256-color, mirroring the web tokens.
- `zombiectl --version`: a single line with pulse dot, binary name, and version. No more agent emoji or box border.
- Pulse cyan is currency: it appears only on the live-status glyph, the `--version` mark, and help section headings. Dividers and table headers use bold default text.
- Quiet machines stay quiet: `NO_COLOR=1`, piped output, and `--json` emit zero ANSI escapes.
- Old terminals fall back: terminals with fewer than 256 colors get a 16-color palette automatically (one stderr notice, then silent). `TERM=dumb` and non-TTY pipes go plain ASCII.
- `zombiectl --help` fits 80 cols.
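The fallback rules above reduce to a small decision function. This is a hypothetical sketch: the `TermInfo` input is assumed, with `colorDepth` shaped like the bit depth returned by Node’s `tty.WriteStream#getColorDepth()` (1, 4, 8, or 24), and `NO_COLOR` treated as set when non-empty, per the NO_COLOR convention.

```typescript
type ColorMode = "plain" | "ansi16" | "ansi256";

interface TermInfo {
  env: Record<string, string | undefined>;
  isTTY: boolean;
  json: boolean; // --json flag
  colorDepth: number; // bits per color: 1, 4, 8, or 24
}

function colorMode({ env, isTTY, json, colorDepth }: TermInfo): ColorMode {
  // NO_COLOR, --json, pipes, and TERM=dumb: emit no ANSI escapes at all.
  if (env.NO_COLOR || json || !isTTY || env.TERM === "dumb") return "plain";
  // 8-bit depth means 256 colors; anything less falls back to 16.
  return colorDepth >= 8 ? "ansi256" : "ansi16";
}
```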
CLI
No new commands or flags. `zombiectl install` swaps `🎉 <name> is live.` for `✓ <name> is live.` via the shared glyph helpers (plain `✓` under `NO_COLOR`).
Telemetry off by default; consistent error shape
Fresh installs send zero analytics events until you opt in. Every command also renders errors through one shared boundary now, with the same format and code/message pairing across `login`, `doctor`, `steer`, and the rest.
Upgrading
- `ZOMBIE_POSTHOG_ENABLED` removed, replaced by `DISABLE_TELEMETRY` with an inverted default:
  - Unset / `DISABLE_TELEMETRY=1|true|on|yes` → telemetry off.
  - `DISABLE_TELEMETRY=0|false|off|no` → opt-in.
  - Migration: `ZOMBIE_POSTHOG_ENABLED=false` users can drop the var. To keep sending events, set `DISABLE_TELEMETRY=0`.
- CLI-only release: server untouched; upgrade either side independently.
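The inverted default means telemetry turns on only for an explicit opt-in value. A minimal sketch, assuming case-insensitive matching and that any unrecognised value keeps telemetry off; neither detail is stated above, and `telemetryEnabled` is a hypothetical name.

```typescript
// Values of DISABLE_TELEMETRY that mean "do send telemetry".
const OPT_IN = new Set(["0", "false", "off", "no"]);

function telemetryEnabled(env: Record<string, string | undefined>): boolean {
  const raw = env.DISABLE_TELEMETRY;
  if (raw === undefined) return false; // unset → off (the new default)
  return OPT_IN.has(raw.trim().toLowerCase());
}
```

Note that the legacy `ZOMBIE_POSTHOG_ENABLED` variable is simply never read.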
What’s new
- Stable error code + friendly message: ``error: UZ-AUTH-003 Token expired — run `zombiectl login` to refresh.`` JSON mode mirrors it: `{"error": {"code": "UZ-AUTH-003", "message": "…", "status": 401, "request_id": "…"}}`.
- One renderer for every command: login, doctor, workspace, agent, grant, tenant, billing, and every agent subcommand share one error/exit-code path. No per-command drift.
CLI
- Per-HTTP-request observability — when opted in, each request emits a span (status, duration, attempts, retry reason). When off (the default), silent.
zombiectl login auto-selects the signup workspace
Fresh installs no longer need a follow-up `zombiectl workspace add`. After credentials persist, login fetches `/v1/tenants/me/workspaces` and writes the signup-provisioned default into local state. `npm install -g @usezombie/zombiectl && zombiectl login` is enough to reach a green `zombiectl doctor`. Failure-tolerant: an unreachable endpoint exits 0 with credentials saved; an empty `items[]` is a no-op (it never overwrites local state).
Schema teardown
Pre-v2.0 cleanup of legacy structure with zero production reads:
- `workspace_integrations` table (`schema/012`): never shipped.
- `workspace_entitlements` table (`schema/004`): plan-tier scoring config superseded by credit-pool billing.
- `core.workspaces` columns `repo_url`, `default_branch`, `paused*`, `version`, `monthly_token_budget`, `updated_at`: from the 1:1 workspace-to-repo era; production INSERTs had already stopped writing them.
- `integration_grants.scopes`: defaulted to `ARRAY['*']`, never read.
`billing_and_provider_keys.md` (then named `billing_and_byok.md`) corrected to a 500¢ starter (was $10 / 1000¢, a doc-vs-implementation drift).
Tests
`bun test` 361/361. New cases in `login.unit.test.js` (happy path, empty items, preserved state, hydration failure) plus three integration cases for the fresh-state workflow.
zombiectl first-install hardening
Three CLI bugs fixed at the customer’s first command.
Bug fixes
- Default API URL is now `https://api.usezombie.com` (was `http://localhost:3000`). Fresh installs hit production; `--api` / `ZOMBIE_API_URL` still override.
- Sticky `--api` per install now works. `zombiectl login --api https://api-dev.usezombie.com` was writing to `credentials.json`, but subsequent calls silently fell back to the default. Root cause: `parseGlobalArgs` was eagerly resolving `DEFAULT_API_URL` and short-circuiting the precedence chain. Override order, highest first: `--api` → `ZOMBIE_API_URL` → `API_URL` → `credentials.json` → default. Pinned by a 16-case integration matrix.
- `zombiectl agent add` (non-JSON mode) no longer crashes. It was calling `ui.bold`, which the theme doesn’t export; replaced with `ui.warn`.
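The precedence chain can be resolved lazily so the default applies only after every override is absent, which is the property the fix restores. `resolveApiUrl` and its input shape are hypothetical; only the ordering comes from this entry.

```typescript
const DEFAULT_API_URL = "https://api.usezombie.com";

interface ApiUrlSources {
  flag?: string; // --api
  env: Record<string, string | undefined>; // ZOMBIE_API_URL, API_URL
  credentialsApiUrl?: string; // value persisted in credentials.json
}

function resolveApiUrl({ flag, env, credentialsApiUrl }: ApiUrlSources): string {
  const chain = [flag, env.ZOMBIE_API_URL, env.API_URL, credentialsApiUrl];
  // First non-empty entry wins; the default is consulted last, never eagerly.
  return chain.find((v) => v !== undefined && v !== "") ?? DEFAULT_API_URL;
}
```

Resolving the default up front (the original bug) would make `credentials.json` unreachable, because the chain always finds a value before it.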
Tests
300 → 354 tests. New scaffolding spins up a `Bun.serve` loopback per integration test so the full request lifecycle is exercised end-to-end. Cross-cutting failure-mode tests use the actual Zig error codes (`UZ-AUTH-003`, `UZ-AUTH-004`, `UZ-WORKSPACE-002`, `UZ-ZMB-006`, `UZ-EXEC-013`, `UZ-INTERNAL-001`).
Deferred
Opt-in telemetry consent prompt at first login: deferred. The bundled PostHog write key is a placeholder (no real ingest), so a first-run prompt is friction without benefit. The implementation lives at commit `fe748ee9` for cherry-pick when the key becomes valid.
Secrets no longer leak into the activity stream
Streaming agent replies and final agent replies were emitting raw secret bytes alongside scrubbed tool-call arguments. Both now apply the same placeholder substitution (`${secrets.llm.api_key}`, `${secrets.github.token}`) before reaching the activity stream or pub/sub.
Tests
A regression harness asserts the bytes the executor emits against a deterministic LLM stub. It covers four invariants (tool-arg redaction, streaming-reply redaction, final-reply redaction, and pub/sub no-leak), with multi-secret coverage (LLM key + GitHub installation token in one execution).
Bug fixes
- Closed an executor memory leak where the final-reply buffer was duplicated and never freed.
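The placeholder substitution described in this entry can be illustrated with a plain string scrubber. This is a sketch, assuming a map from placeholder path (such as `llm.api_key`) to raw secret value; `redactSecrets` is a hypothetical name, not the executor’s code.

```typescript
function redactSecrets(text: string, secrets: Record<string, string>): string {
  // Replace longer secrets first so a secret that contains another one is
  // scrubbed whole instead of leaving a partial fragment behind.
  const entries = Object.entries(secrets)
    .filter(([, value]) => value.length > 0)
    .sort(([, a], [, b]) => b.length - a.length);
  let out = text;
  for (const [path, value] of entries) {
    out = out.split(value).join(`\${secrets.${path}}`);
  }
  return out;
}
```

Applying one function to tool args, streamed chunks, and the final reply is what keeps the three paths from drifting apart.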
Dormant API + stale CLI teardown; agent lifecycle FSM unified
Breaking — API removals
- `PATCH /v1/workspaces/{workspace_id}` (workspace pause/unpause): never called by first-party UI/CLI.
- `GET /v1/tenants/me/diagnostics`: server-side tenant doctor block; `zombiectl doctor` runs local probes instead.
- `GET /v1/workspaces/{workspace_id}/zombies/{zombie_id}/telemetry`: per-agent wrapper. The underlying store and tenant-scoped reader (`/v1/tenants/me/billing/charges`) are unchanged.
Breaking — Agent lifecycle FSM
Every state transition now flows through `PATCH /v1/workspaces/{ws}/zombies/{id}` with `{status: "active"|"stopped"|"killed"}`. The FSM is encoded as SQL gates inside the UPDATE so parallel writes can’t bypass it:
- `active → stopped | killed`
- `paused → stopped | active | killed` (resume from auto-pause)
- `stopped → active | killed`
- `killed → terminal` (404 on further PATCH)
- `paused` is platform-only (anomaly gate); operators can’t set it.
- `DELETE /v1/workspaces/{ws}/zombies/{id}` added: hard purge. Precondition `status=killed`; returns 204 / 409. Cascades across events, telemetry, sessions, approval gates, and memory. Historical billing debits are not reversed.
- `DELETE /v1/.../current-run` removed. Replaced by `PATCH {status: "stopped"}`, which emits a `zombie_status_changed` control-stream signal.
- Operator role required for every status transition (was only on the retired `current-run`). Pure `config_json` patches still permit workspace-member.
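The transition table above can be mirrored as data for a client-side pre-check. The server enforces it in SQL; this TypeScript copy, including the `canTransition` and `canPurge` names, is illustrative.

```typescript
type Status = "active" | "paused" | "stopped" | "killed";
type Requested = Exclude<Status, "paused">; // operators can't set paused

const ALLOWED: Record<Status, readonly Requested[]> = {
  active: ["stopped", "killed"],
  paused: ["stopped", "active", "killed"], // resume from auto-pause
  stopped: ["active", "killed"],
  killed: [], // terminal: further PATCH returns 404
};

function canTransition(from: Status, to: Requested): boolean {
  return ALLOWED[from].includes(to);
}

// Hard purge (DELETE) is only accepted once the agent is killed; otherwise 409.
const canPurge = (status: Status): boolean => status === "killed";
```

A client can use this to grey out buttons, as the state-aware KillSwitch panel does, while still relying on the server as the source of truth.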
CLI
Removed (they called endpoints that weren’t in the route manifest): `zombiectl admin config add scoring_context_max_tokens`, `zombiectl workspace upgrade-scale`, `zombiectl workspace billing`, the `OPERATOR COMMANDS` help section, and the `ZOMBIE_OPERATOR=1` help toggle.
New lifecycle subcommands: `--force` flag.
Dashboard
KillSwitch panel is now state-aware: Stop / Resume / Kill render based on the current status, and the panel disables once killed.
Internal
Admin platform-keys + tenant API-keys handlers gained playbook header references (load-bearing for admin bootstrap, not dormant). The `delete_zombie` policy class is corrected to `critical` in `agent-manifest.json` + `skill.md`.
M51 follow-up — route teardown, starter-credit cut, marketing honesty
Breaking — API removals
- `POST /v1/execute`: orphan handler from the M10 pipeline-v1 removal. Gone from the binary, OpenAPI, `public/llms.txt`, `public/skill.md`, and `public/agent-manifest.json`. External integrators (LangGraph, CrewAI, Composio) hardcoded on the `execute_tool` operationId now get HTTP 404. Replacement: the per-agent webhook + agent-key flow. Pre-v2.0 carve-out: no graceful 410.
- `GET /internal/v1/telemetry`: operator endpoint, never wired to an admin tool. Data collection continues; customer-facing telemetry is unchanged.
- Three execute-path-only error codes removed (`UZ-CRED-004`, `UZ-PROXY-001`, `UZ-GATE-005`).
Pricing
- Starter credit halved (was $10). `STARTER_GRANT_CENTS` in `src/state/tenant_billing.zig` is the only constant that changed.
- Marketing copy names the two debit points wherever pricing is described: hosted execution drains on event receipt and on per-stage execution.
Marketing
- Hero rewritten — “Operational knowledge isn’t executable. When a deploy fails, teams guess.” Tighter lead-ins on markdown-defined agents and the GitHub Actions → Slack flow.
- FAQ context-window answer rewritten to match runtime reality: three signals (tool-result window, memory checkpoints, stage-chunk threshold); the agent enforces them via `memory_store(category='conversation', ...)`, and the worker re-enqueues on a 10-stage continuation chain. See `concepts/context-lifecycle`.
- Vendor name-drops (Fly, Upstash) genericized to “your infrastructure and run logs”.
Dashboard
- Unauthenticated hits now `redirect("/sign-in")` (was a `notFound()` 404). `notFound()` is reserved for legitimate missing resources.
- Install form shows `zombiectl install --from` (was the removed `zombiectl up`).
Design system
Marketing pricing cards + feature-flow rows now use the `Card` primitive via `asChild`. Visually identical, semantically aligned with the dashboard.
One-command platform-ops install
`/usezombie-install-platform-ops` is a slash-command skill that installs the platform-ops agent on any repo. It runs in Claude Code, Amp, Codex CLI, and OpenCode: same skill, same screenshot.
Two-command bootstrap:
What’s new
- Slash-command skill with twelve steps: `zombiectl doctor` preflight, repo detection, three operator inputs (Slack channel, prod branch glob, optional cron), credential resolution, GitHub webhook secret generation, install, in-flow HMAC-SHA256 self-test, smoke-test steer.
- Bundled agent templates: `npm install -g @usezombie/zombiectl` copies canonical templates to `~/.config/usezombie/samples/`. Package version = template version (no URL fetch, no cache).
- TRIGGER.md frontmatter overrides take effect: `x-usezombie.model` and the four `x-usezombie.context` knobs (`context_cap_tokens`, `tool_window`, `memory_checkpoint_every`, `stage_chunk_threshold`) are now honoured by the worker. Previous releases parsed and dropped them.
- `tool_window: auto`: string sentinel accepted alongside integer `0`.
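The in-flow self-test relies on GitHub’s webhook signing scheme: each delivery carries an `X-Hub-Signature-256` header of the form `sha256=` followed by the hex HMAC-SHA256 of the raw body, keyed with the webhook secret. Below is a minimal Node.js sketch of signing and verifying; the helper names are hypothetical and the skill’s actual self-test steps may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Produce the header value GitHub would send for this body and secret.
function signPayload(secret: string, body: string): string {
  return "sha256=" + createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Constant-time comparison; check lengths first because timingSafeEqual
// throws on buffers of different lengths.
function verifySignature(secret: string, body: string, header: string): boolean {
  const expected = Buffer.from(signPayload(secret, body));
  const received = Buffer.from(header);
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A self-test can sign a synthetic payload with the freshly generated secret and confirm that the receiving side accepts it before any real event arrives.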
Bring your own key (BYOK) + credit-pool billing
Tenants can run events against their own LLM provider (“BYOK”) or the platform-managed default. Both modes share one gate, one metering path, and one credit pool; they differ in drain rate, not eligibility. Every new tenant gets a $10 starter grant; the gate trips on the next event after exhaustion (no in-flight kill).
What changed
- Provider posture is tenant-scoped. A new `core.tenant_providers` row pins `platform` or `byok` per tenant. Legacy `PUT|GET|DELETE /v1/workspaces/{ws}/credentials/llm` removed (404, no 410, no compat shim; pre-v2.0 carve-out).
- Two-debit metering. Each event yields up to two `core.zombie_execution_telemetry` rows: `receive` (committed at gate-pass) and `stage` (committed pre-execution, updated post-run with token counts).
- Per-token rates. Public `_um/<key>/model-caps.json` now carries `input_cents_per_mtok` and `output_cents_per_mtok` per model. The API server caches them from `core.model_caps` at boot.
- Starter grant on signup. `tenant_billing.insert_starter_grant` runs in the tenant-create transaction, once per tenant, and is never re-applied.
API
Tenant provider:
- `GET /v1/tenants/me/provider`: resolved config; `api_key` is never returned.
- `PUT /v1/tenants/me/provider`: flip to BYOK with `{ "mode": "byok", "credential_ref": "<vault-name>", "model"?: "<override>" }`. Tenant-admin only (403 otherwise).
- `DELETE /v1/tenants/me/provider`: equivalent to `PUT mode=platform`. Surfaces a low-balance warning if applicable.
- `GET /v1/tenants/me/billing`: balance snapshot (unchanged).
- `GET /v1/tenants/me/billing/charges?limit=`: newest-first credit-pool rows, one per `(event_id, charge_type)`. Backs the Billing Usage tab.
Removed: `PUT|GET|DELETE /v1/workspaces/{ws}/credentials/llm` (never wired to a runtime resolver). Use `/v1/tenants/me/provider` plus a workspace-vault credential.
CLI
- `zombiectl tenant provider {get|set|reset}`: manage the tenant’s LLM posture. `set --credential <name> [--model <override>]`; `reset` warns if the balance is below 100¢.
- `zombiectl billing show [--limit N] [--json]`: read-only balance + last N events (receive / stage / total cents). No `purchase` / `topup` subcommands; Stripe lands in v2.1.
Dashboard
- Settings → LLM Provider (`/settings/provider`): mode toggle + BYOK form. The credential dropdown sources from the active workspace vault.
- Settings → Billing (`/settings/billing`): read-only summary. Headline balance + disabled “Purchase Credits” (tooltip: “Coming in v2.1”). Usage tab grouped by event; Invoices and Payment Method tabs are v2.1 placeholders.
Upgrading
- CLI: drop direct calls to `/workspaces/{ws}/credentials/llm`. Store the credential in the workspace vault, then run `zombiectl tenant provider set --credential <name>`.
- Dashboard: existing tenants stay on platform-managed by default. Switch via Settings → LLM Provider.
- Custom integrations: `model-caps.json` is additive; old fields are preserved, plus the new per-model rate fields.
Notes
- Pricing visibility: per-model rates are in the public-but-unguessable `model-caps.json`. Trade-off accepted: cacheable, unauthenticated, low-latency `tenant provider set` resolution.
- No plan tiers: “Free” is just “starter grant not yet exhausted.” Platform and BYOK share `processEvent` and `compute_*_charge`; they differ in drain rate, not eligibility.
URL hygiene — verb routes become resource collections
Two URL families lose their verb-shaped URLs. Pre-v2.0 carve-out: retired URLs return `404`, with no 410 shim.
Upgrading
CLI and server upgrade together.
- Steering an agent: `POST /v1/.../zombies/{zid}/steer` → `POST /v1/.../zombies/{zid}/messages`. Body unchanged. The CLI subcommand stays `zombiectl zombie steer` (verb on CLI, noun on wire).
- Memory: four verb endpoints collapse into one resource:
  - `POST /v1/memory/store` → `POST /v1/.../zombies/{zid}/memories`
  - `GET /v1/memory/recall?...` → `GET /v1/.../zombies/{zid}/memories?query=...`
  - `GET /v1/memory/list?...` → `GET /v1/.../zombies/{zid}/memories` (no `?query=`)
  - `POST /v1/memory/forget` → `DELETE /v1/.../zombies/{zid}/memories/{memory_key}`
- `DELETE` is idempotent: a missing key returns `204` (was `{"deleted": true|false}`).
- `zombie_id` moves from the query string to a path segment.
- `memory_store` / `memory_recall` agent-tool names unchanged.
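The four memory operations map onto one collection. In this hypothetical helper, `zombieBase` stands in for the prefix the entry abbreviates as `/v1/.../zombies/{zid}`; the caller supplies it rather than the sketch guessing it.

```typescript
type MemoryOp =
  | { op: "store" }
  | { op: "recall"; query: string }
  | { op: "list" }
  | { op: "forget"; key: string };

function memoryRequest(zombieBase: string, m: MemoryOp): { method: string; path: string } {
  const coll = `${zombieBase}/memories`;
  switch (m.op) {
    case "store":
      return { method: "POST", path: coll };
    case "recall": // a query string selects recall; without it the GET is a list
      return { method: "GET", path: `${coll}?query=${encodeURIComponent(m.query)}` };
    case "list":
      return { method: "GET", path: coll };
    case "forget": // idempotent: a missing key still returns 204
      return { method: "DELETE", path: `${coll}/${encodeURIComponent(m.key)}` };
  }
}
```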
What’s new
- Stricter routing: the dispatcher parses paths into segments once at the boundary; `//` and trailing slashes no longer match the wrong handlers. Malformed paths return a deterministic `404`.
- Single source of truth for `v1`: the version literal lives in one place.
REST cleanup — /complete + /kill move to PATCH; config hot-reload lands
Two verb-suffix endpoints retire to PATCH on the resource. Agent config edits now hot-reload mid-loop: edit in the dashboard or via a `PATCH` of `config_json`, and the worker swaps tools, network policy, and context budget on the next event boundary. The old config is freed in the same step.
Upgrading
CLI commands (`zombiectl kill`, `zombiectl login`) are unchanged; they wrap these URLs internally. Direct API consumers:
- `POST /v1/.../zombies/{id}/kill` → `PATCH /v1/.../zombies/{id}` with `{ "status": "killed" }`.
- `POST /v1/auth/sessions/{id}/complete` → `PATCH /v1/auth/sessions/{id}` with `{ "status": "complete", "token": "<user-jwt>" }`. The response now matches the GET poll shape.
- `POST /v1/.../zombies/{id}/steer` unchanged this release; the rename to `POST /events` lands in a future URL pass.
What’s new
- Config hot-reload: the tools list, network allowlist, secrets map, and `tool_window` / `memory_checkpoint_every` / `stage_chunk_threshold` all swap mid-loop. No worker restart, no memory leak on swap.
- One PATCH for combined updates: `{ config_json, status }` in one request is atomic, with one SQL UPDATE + one control-stream signal per dirty surface.
- Cleaner OpenAPI: three verb-suffix paths gone; Slack and GitHub OAuth callbacks moved to a vendor-immortal classification (pinned, but distinguished from internal cleanup debt).
API
- `PATCH /v1/.../zombies/{id}`: partial body `{ config_json?, status? }`. Both optional; an empty body is a 200 no-op. When `status` is set it must equal `"killed"`. The response includes `config_revision`.
- `PATCH /v1/auth/sessions/{id}`: body `{ status: "complete", token }`. Bearer auth (the depositor proves it can mint a user-jwt). Response: `{ status, token, request_id }`.
- Validation message for an invalid `status`: `status must be "killed"` (`UZ-VAL-001`).
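The body rules for `PATCH /v1/.../zombies/{id}` in this release fit in a few lines. The `checkZombiePatch` name and its result shape are hypothetical; the rules and the `UZ-VAL-001` message come from the list above.

```typescript
interface ZombiePatch {
  config_json?: unknown;
  status?: unknown;
}

type PatchCheck =
  | { ok: true; noop: boolean }
  | { ok: false; code: "UZ-VAL-001"; message: string };

function checkZombiePatch(body: ZombiePatch): PatchCheck {
  // In this release the only settable status is "killed".
  if (body.status !== undefined && body.status !== "killed") {
    return { ok: false, code: "UZ-VAL-001", message: 'status must be "killed"' };
  }
  // Neither field present: accepted as a 200 no-op.
  const noop = body.status === undefined && body.config_json === undefined;
  return { ok: true, noop };
}
```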
Removed: `POST /v1/.../zombies/{id}/kill`, `POST /v1/auth/sessions/{id}/complete`.
Frontmatter cleanup — runtime config moves under x-usezombie:
TRIGGER.md no longer carries runtime keys at the top level. `tools`, `credentials`, `network`, `budget`, and `trigger` all live under one `x-usezombie:` block. SKILL.md now requires `name`, `description`, and `version`; install rejects bundles where the SKILL.md and TRIGGER.md `name:` values disagree.
Upgrading
Every agent bundle. Migration is mechanical:
- TRIGGER.md: add `x-usezombie:` at the top level and indent the existing blocks under it. Keep the top-level `name:`.
- SKILL.md: frontmatter needs `name:`, `description:`, `version:`. Match `name:` to TRIGGER.md.
- `zombiectl install --from <dir>`: re-run until field-level errors clear.
What’s new
- Disciplined parser: unknown subkeys under `x-usezombie:` fail loud (`UnknownRuntimeKey`). The top level stays permissive, so `x-amp:` and other vendor blocks pass through.
- Cross-file identity: `name:` must match across both files; enforced at install.
- Real YAML: the bespoke converter is replaced with `kubkon/zig-yaml` 0.2.0. Multi-line strings, escapes, standard scalar tags, and arbitrary nesting all work.
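The install-time rules (strict inside `x-usezombie:`, permissive at the top level, plus cross-file `name:` identity) can be sketched over already-parsed frontmatter. The allowed-key list is assembled from keys this changelog mentions and may not match the real one; `validateBundle` and its error strings are illustrative.

```typescript
// Assumed allowlist: keys named in this changelog, not the parser's real list.
const RUNTIME_KEYS = new Set(["tools", "credentials", "network", "budget", "trigger", "model", "context"]);
const SKILL_REQUIRED = ["name", "description", "version"] as const;

type Frontmatter = Record<string, unknown>;

function validateBundle(trigger: Frontmatter, skill: Frontmatter): string[] {
  const errors: string[] = [];
  const runtime = trigger["x-usezombie"];
  if (runtime !== undefined) {
    if (typeof runtime !== "object" || runtime === null || Array.isArray(runtime)) {
      errors.push("x-usezombie must be a mapping");
    } else {
      // Strict inside x-usezombie; other top-level keys (x-amp, ...) pass through.
      for (const key of Object.keys(runtime)) {
        if (!RUNTIME_KEYS.has(key)) errors.push(`UnknownRuntimeKey: x-usezombie.${key}`);
      }
    }
  }
  for (const field of SKILL_REQUIRED) {
    if (typeof skill[field] !== "string") errors.push(`SKILL.md missing ${field}`);
  }
  if (trigger["name"] !== skill["name"]) {
    errors.push("UZ-ZMB-011: SKILL.md and TRIGGER.md name: disagree");
  }
  return errors;
}
```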
API
Two error-code changes from `POST /v1/workspaces/{ws}/zombies`:
- `UZ-ZMB-008` (`MSG_ZOMBIE_INVALID_CONFIG`): now also fires for malformed SKILL.md frontmatter.
- `UZ-ZMB-011` (`MSG_ZOMBIE_NAME_MISMATCH`): fires when the SKILL.md and TRIGGER.md `name:` values disagree.
`config_json->'trigger'->...` → `config_json->'x-usezombie'->'trigger'->...` (operators reading raw rows).
Approval inbox — pending gates surface in the dashboard
Approvals used to flow only through Slack DMs. Now every pending gate surfaces in a workspace-wide `/approvals` list and on each agent’s detail page, with proposed action, blast radius, evidence, and a timeout countdown rendered next to Approve and Deny buttons. Slack callbacks and dashboard clicks share one resolve core: whichever lands first wins, and the other channel’s stale button no-ops with the original outcome and resolver attribution.
What’s new
- `/approvals` page: workspace-wide list, oldest first. Each row shows agent, gate kind, proposed-action one-liner, blast radius, age, timeout countdown, and inline Approve/Deny. Refreshes every 5s; the empty state renders clean.
- `/approvals/{gate_id}` detail page: full proposed-action prose, evidence as expandable JSON, context grid, and a Resolve panel with an optional reason. Once resolved, it flips to `Resolved as <outcome> by <who> at <when>`.
- Per-agent Pending approvals panel on each agent’s detail page, plus a destructive-variant badge in the header (`N pending approval(s)` or `50+`).
- Sidebar nav: new “Approvals” entry between Credentials and Events.
- Auto-timeout sweeper: a background thread scans `core.zombie_approval_gates` every 60s and transitions pending rows past `timeout_at` to `timed_out` (the worker treats this as `denied` for destructive ops). Default timeout 24h.
API
- GET /v1/workspaces/{ws}/approvals?status=&zombie_id=&gate_kind=&cursor=&limit= — paginated. Default status=pending, limit=50, max 200. Cursor encodes (requested_at, gate_id).
- GET /v1/workspaces/{ws}/approvals/{gate_id} — single read; 404 on missing or cross-workspace.
- POST /v1/workspaces/{ws}/approvals/{gate_id}:approve — body {reason?} (≤4096). 200 / 409 (UZ-APPROVAL-006 — original outcome + resolver returned).
- POST /v1/workspaces/{ws}/approvals/{gate_id}:deny — same shape.
- ApprovalGate shape gains gate_kind, proposed_action, evidence (JSONB), blast_radius, timeout_at, resolved_by.
Bug fixes
- Slack/dashboard race fixed — both paths now go through UPDATE … WHERE status='pending'. Loser sees 409 with the original outcome, never a silent overwrite.
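The race fix is a first-writer-wins compare-and-set. A resolve applies only while the gate is still pending, and a late caller gets the stored outcome back instead of overwriting it. The TypeScript below is a minimal in-memory sketch of that rule. Gate, resolveGate, and the resolver strings are hypothetical names for illustration. The server enforces the same rule in SQL with UPDATE … WHERE status='pending'.

```typescript
// Hypothetical in-memory model of an approval gate (not the server's code).
type Outcome = "approved" | "denied";

interface Gate {
  gateId: string;
  status: "pending" | Outcome;
  resolvedBy: string | null;
}

type ResolveResult =
  | { httpStatus: 200; gate: Gate }
  | { httpStatus: 409; code: "UZ-APPROVAL-006"; outcome: Outcome; resolvedBy: string | null };

// Mirrors UPDATE ... WHERE status = 'pending': only the first caller wins.
function resolveGate(gate: Gate, outcome: Outcome, resolver: string): ResolveResult {
  if (gate.status !== "pending") {
    // Loser: report the original outcome and resolver; never overwrite.
    return { httpStatus: 409, code: "UZ-APPROVAL-006", outcome: gate.status, resolvedBy: gate.resolvedBy };
  }
  gate.status = outcome;
  gate.resolvedBy = resolver;
  return { httpStatus: 200, gate };
}

const gate: Gate = { gateId: "g_1", status: "pending", resolvedBy: null };
const fromSlack = resolveGate(gate, "approved", "slack:alice");
const fromDashboard = resolveGate(gate, "denied", "dashboard:bob");
console.log(fromSlack.httpStatus, fromDashboard.httpStatus); // 200 409
```

Whichever channel calls first gets the 200. The second sees a 409 that carries the first outcome and resolver, so a stale Slack button can no-op gracefully.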
Streaming substrate hot-path cleanup
Worker → live-tail performance pass:
- Activity-frame JSON encoding reuses a per-event scratch buffer — per-frame heap alloc gone (~43µs → ~2µs on chunk-heavy responses).
- Executor transport parses each progress frame once (was twice, ~46% faster).
- Workers open a dedicated Redis client for activity PUBLISH — no contention with stream commands on the queue client’s mutex.
- Per-agent events index leads with (zombie_id, created_at DESC, event_id DESC) — covers the dashboard view + keyset pagination directly.
Streaming substrate — every event has provenance; live activity tails the dashboard
Every event (steer, webhook, cron, chunked continuation, gate-resolved continuation) lands on one Redis stream with a normalized envelope and an actor field carrying provenance forward. Every event start/end is durably persisted in core.zombie_events with payload, response, tokens, wall time, failure label. Dashboard ships a live SSE activity panel with sub-200ms publish-to-receive.
Upgrading
- POST /steer shape changed — now does a direct XADD and returns {event_id}. Legacy zombie:{id}:steer Redis key gone. Scripts reading the steer key directly: switch to the SSE stream or events history.
- GET /v1/.../zombies/{id}/activity removed (per-agent + workspace-aggregate). Replace with GET /v1/.../zombies/{id}/events or GET /v1/.../events?zombie_id=. Response carries actor, status, response_text, tokens, wall_ms.
- zombiectl logs migrated automatically.
- core.activity_events table dropped. Pre-v2.0 teardown — no migration. Switch to core.zombie_events; primary key is (zombie_id, event_id) for idempotent replay under XAUTOCLAIM.
- Executor RPC bumped to v2. Worker + executor must upgrade together (HELLO handshake on connect; aborts on executor.rpc_version_mismatch). Roll executor first.
What’s new
- One ingress, one durable record per event — each event produces one core.zombie_events row, one zombie_execution_telemetry row, one core.zombie_sessions mutation. Same event_id joins narrative, billing, session state. Replays idempotent via composite-key ON CONFLICT.
- Continuation actors stay flat — chunked or gate-resolved continuations re-enter as actor=continuation:<original_actor>, never nested. actor LIKE '%steer:kishore' finds origin + every continuation; resumes_event_id walks back via recursive CTE.
- gate_blocked events visible but unresolvable until the Approval Inbox ships. Row enters terminal state with failure_label populated + XACK. Admin-resume fallback dropped.
- Dashboard live panel — <LiveEventsPanel /> above the history table. Native EventSource → same-origin Next Route Handler that mints an API-audience JWT server-side. Browser never holds the JWT; backend never sees a cookie. Exponential backoff capped at 15s, rolling 20-frame buffer.
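The flat-actor rule can be written as a small pure function. This sketch rests on assumptions: continuationActor and matchesOrigin are hypothetical helper names, and the platform derives the actor server-side, not in client code. It shows two things. A second continuation does not become continuation:continuation:…, and a suffix match like actor LIKE '%steer:kishore' catches both the origin and every continuation.

```typescript
const CONTINUATION_PREFIX = "continuation:";

// Hypothetical helper: the actor for a continuation of `parentActor`.
// If the parent is already a continuation, reuse its original actor so the
// prefix is never stacked.
function continuationActor(parentActor: string): string {
  const original = parentActor.startsWith(CONTINUATION_PREFIX)
    ? parentActor.slice(CONTINUATION_PREFIX.length)
    : parentActor;
  return CONTINUATION_PREFIX + original;
}

// Suffix match, equivalent in spirit to actor LIKE '%<origin>'.
function matchesOrigin(actor: string, origin: string): boolean {
  return actor.endsWith(origin);
}

const first = continuationActor("steer:kishore"); // "continuation:steer:kishore"
const second = continuationActor(first); // still "continuation:steer:kishore"
console.log(first === second, matchesOrigin(second, "steer:kishore")); // true true
```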
API
- POST /v1/workspaces/{ws}/zombies/{id}/steer — body {message} (≤8192). 202 with {status: "accepted", event_id}.
- GET /v1/workspaces/{ws}/zombies/{id}/events?cursor=&actor=&since=&limit= — paginated. actor accepts globs (steer:*, webhook:*). since accepts Go durations (15s, 2h) or RFC 3339. Default 50, max 200. since and cursor mutually exclusive.
- GET /v1/workspaces/{ws}/events?cursor=&actor=&zombie_id=&since=&limit= — workspace-aggregate; items carry zombie_id.
- GET /v1/workspaces/{ws}/zombies/{id}/events/stream — SSE. Frame kinds: event_received, tool_call_started, tool_call_progress (~2s heartbeat), chunk, tool_call_completed, event_complete. Per-connection seq ids reset on SUBSCRIBE; Last-Event-ID ignored. Disconnect → backfill via GET /events?since=<last_seen> then reopen.
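Clients that build these queries themselves may want to validate since before sending it. This is a minimal sketch with these assumptions:
- It accepts a subset of Go duration syntax: integer amounts with ms, s, m, or h units. So 15s, 2h, and 1h30m parse, but fractional values like 1.5h do not.
- It relies on JavaScript's Date.parse for RFC 3339 timestamps.
- parseSince and eventsQuery are hypothetical names.

The server remains the authority on what it accepts.

```typescript
const UNIT_MS: Record<string, number> = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000 };

// Returns the lower bound for `since` as epoch milliseconds.
function parseSince(since: string, nowMs: number): number {
  // Whole string must be one or more <integer><unit> pairs.
  if (/^(\d+(ms|s|m|h))+$/.test(since)) {
    let total = 0;
    // "ms" is listed first so "30ms" is not read as 30 minutes plus "s".
    for (const [, amount, unit] of since.matchAll(/(\d+)(ms|s|m|h)/g)) {
      total += Number(amount) * UNIT_MS[unit];
    }
    return nowMs - total;
  }
  const ts = Date.parse(since); // e.g. 2024-05-01T12:00:00Z
  if (Number.isNaN(ts)) throw new Error(`invalid since: ${since}`);
  return ts;
}

// Builds the query string, enforcing that since and cursor are exclusive.
function eventsQuery(opts: { since?: string; cursor?: string; limit?: number }): string {
  if (opts.since !== undefined && opts.cursor !== undefined) {
    throw new Error("since and cursor are mutually exclusive");
  }
  const params = new URLSearchParams();
  if (opts.since !== undefined) params.set("since", opts.since);
  if (opts.cursor !== undefined) params.set("cursor", opts.cursor);
  params.set("limit", String(Math.min(opts.limit ?? 50, 200))); // default 50, max 200
  return params.toString();
}

console.log(eventsQuery({ since: "15s" })); // since=15s&limit=50
```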
CLI
- zombiectl steer {id} "<message>" — batch mode. POSTs, opens SSE, prints [claw] <chunk> as chunks arrive, exits 0 on event_complete. Polls GET /events?since= for 60s if SSE drops. Interactive REPL deferred.
- zombiectl events {id} — paginated history. --actor=, --since=, --json, --cursor=. Default 50/page.
- zombiectl logs {id} — repointed at events endpoint; row format now actor + response_text summary.
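When the stream drops, the client must rebuild its own position. Seq ids reset per connection and Last-Event-ID is ignored, so the documented recovery is a backfill over GET /events?since=<last_seen> followed by a reopen. The sketch below pairs that with the dashboard's exponential backoff capped at 15s. The 1s base delay, the absence of jitter, and both helper names are assumptions for illustration.

```typescript
// Exponential backoff: 1s, 2s, 4s, 8s, then capped at 15s.
function reconnectDelayMs(attempt: number, baseMs = 1_000, capMs = 15_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Backfill path for events missed while disconnected; lastSeen is assumed to
// be the RFC 3339 timestamp of the last frame the client processed.
function backfillPath(ws: string, zombieId: string, lastSeen: string): string {
  const query = new URLSearchParams({ since: lastSeen });
  return `/v1/workspaces/${ws}/zombies/${zombieId}/events?${query.toString()}`;
}

console.log([0, 1, 2, 3, 4, 5].map((attempt) => reconnectDelayMs(attempt)));
// delays: 1000, 2000, 4000, 8000, 15000, 15000
```

On reconnect the client would fetch backfillPath(...) first, merge those events, and only then reopen the SSE stream, so nothing between the drop and the reopen is lost.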
Install actually works — contract aligned, parser fixed, doctor tightened
Three bugs that made zombiectl install --from <path> unusable on a fresh workspace, all fixed in one pass.
Upgrading
- Install POST shape changed — POST /v1/workspaces/{ws}/zombies now accepts {trigger_markdown, source_markdown}. Server is the single parser of TRIGGER.md frontmatter; name + config_json derived server-side. Pre-v1.0, no compat shim.
- TRIGGER.md key skills: → tools: — sample already used tools:; parser now matches. Older specs need the rename; ERR_ZOMBIE_INVALID_CONFIG with hint when missing.
- zombiectl install + doctor require zombiectl login — were previously exempt and produced opaque 401s. Now fail locally with AUTH_REQUIRED before any HTTP call.
What’s new
- Doctor checks the three things that matter — server_reachable (GET /healthz, 5s timeout), workspace_selected, workspace_binding_valid. Old healthz/readyz/credentials checks folded in or dropped.
- Doctor --json schema — {ok, api_url, checks: [{name, ok, detail}]}. Each failed check carries a one-line detail pointing at the next action.
- Install response — {zombie_id, name, status}. CLI displays the server-derived name; copy/paste matches what the server stored.
API
POST /v1/workspaces/{ws}/zombies — body {trigger_markdown, source_markdown} (≤64KB each). 201 with {zombie_id, name, status}. 400 ERR_ZOMBIE_INVALID_CONFIG on frontmatter parse failure; 400 ERR_INVALID_REQUEST (MSG_ZOMBIE_TRIGGER_REQUIRED) on empty/oversized trigger.
CLI
- zombiectl install --from <path> — sends the new shape; success line uses the server’s name.
- zombiectl doctor — three checks, per-check 5s timeout, exit 0/1.
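Because doctor --json follows a fixed {ok, api_url, checks: [{name, ok, detail}]} schema, a CI step can consume it directly. This sketch assumes the JSON has already been captured from zombiectl doctor --json. summarize is a hypothetical helper, and the sample report values, including the detail text, are made up.

```typescript
interface DoctorCheck {
  name: string;
  ok: boolean;
  detail: string;
}

interface DoctorReport {
  ok: boolean;
  api_url: string;
  checks: DoctorCheck[];
}

// Hypothetical CI helper: list failed checks and mirror doctor's exit 0/1.
function summarize(report: DoctorReport): { exitCode: 0 | 1; lines: string[] } {
  const failed = report.checks.filter((check) => !check.ok);
  return {
    exitCode: report.ok && failed.length === 0 ? 0 : 1,
    lines: failed.map((check) => `${check.name}: ${check.detail}`),
  };
}

// Made-up sample output for illustration.
const sample: DoctorReport = JSON.parse(
  '{"ok":false,"api_url":"https://api.example.com","checks":[' +
    '{"name":"server_reachable","ok":true,"detail":""},' +
    '{"name":"workspace_selected","ok":false,"detail":"run zombiectl workspace use <id>"}]}',
);
console.log(summarize(sample).lines.join("\n"));
```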
Worker substrate — install an agent, see it work in seconds
Agents installed via POST /v1/workspaces/{ws}/zombies are claimed by a worker thread within ~1s of the 201. No worker restart needed. A new POST .../kill aborts in-flight agents cleanly; PATCH .../zombies/{id} hot-reloads config; SIGTERM triggers graceful drain.
What’s new
- Atomic install — INSERT into core.zombies + XGROUP CREATE MKSTREAM + XADD zombie:control * type=zombie_created happen synchronously before the 201 returns. Webhooks arriving 1ms later find the consumer group ready.
- Fleet-wide control plane — Redis stream zombie:control carries created / status_changed / config_changed / drain_request. One watcher thread per worker dispatches to spawn / cancel / reconfigure handlers.
- Per-agent cancel flag — atomic flag at top of every loop iteration. POST /kill flips it; thread exits within ~100ms.
- zombiectl kill <zombie_id> — now POSTs to /kill, requires explicit agent id (was a DELETE that defaulted to “kill all in workspace” — footgun gone).
API
- POST /v1/workspaces/{ws}/zombies/{id}/kill — 200 {zombie_id, status: "killed", queued_at}; 404 on missing/already-killed (idempotent).
- PATCH /v1/workspaces/{ws}/zombies/{id} — body {config_json?}. 200 {zombie_id, config_revision} (revision = monotonic updated_at).
- DELETE /v1/workspaces/{ws}/zombies/{id} — removed.
platform-ops — flagship agent for GitHub Actions deploy failures
New sample at samples/platform-ops/. Wakes on a GitHub Actions workflow_run.conclusion=failure webhook, gathers evidence from the failed workflow logs, your hosting provider, and your data plane, then posts an evidenced diagnosis to Slack. Reachable manually via zombiectl steer {id}. Read-only against GitHub, Fly, Upstash; only write path is the Slack post.
What’s new
- Sample bundle — SKILL.md (diagnosis prompt + evidence flow), TRIGGER.md (webhook trigger, network allowlist for the four hosts, 8/month caps), README.md (operator walkthrough including the GitHub webhook setup).
- Four credential shapes — github, fly, upstash use {host, api_token}; slack uses {host, bot_token}. Add via zombiectl credential add <name> --host <host> --api-token <token> (or --bot-token).
- Install — zombiectl install --from samples/platform-ops. Webhook URL printed at install time; paste into your GitHub repo’s webhook settings filtered to workflow_run.
- Sandbox — bwrap + landlock + cgroups (Linux); network deny-by-default; only network.allow hosts reachable.
- Provenance — events land with actor=webhook:github or actor=steer:<operator>.
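The four shapes fit naturally into a discriminated union, which also makes the two flag variants of zombiectl credential add explicit. In this sketch the kind discriminator, the credentialAddArgs helper, and the example host and token values are assumptions. Only the field names and CLI flags come from the notes above.

```typescript
// github/fly/upstash share {host, api_token}; slack uses {host, bot_token}.
type PlatformOpsCredential =
  | { kind: "github" | "fly" | "upstash"; host: string; api_token: string }
  | { kind: "slack"; host: string; bot_token: string };

// Builds the argv for `zombiectl credential add <name> --host <host> ...`,
// using the credential kind as its name (an assumption for this sketch).
function credentialAddArgs(cred: PlatformOpsCredential): string[] {
  const args = ["credential", "add", cred.kind, "--host", cred.host];
  if (cred.kind === "slack") {
    args.push("--bot-token", cred.bot_token);
  } else {
    args.push("--api-token", cred.api_token);
  }
  return args;
}

console.log(credentialAddArgs({ kind: "slack", host: "slack.com", bot_token: "xoxb-example" }).join(" "));
// credential add slack --host slack.com --bot-token xoxb-example
```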
Dashboard — full lifecycle in the browser
app.usezombie.com reaches its first “I can run my day from here” shape: overview tiles + recent activity, agents list with cursor pagination + search, install form, per-agent detail page (webhook URL, config, one-click kill). Workspace switcher in the header. Credit-exhaustion banner driven by is_exhausted / exhausted_at on GET /v1/tenants/me/billing.
Upgrading
- Kill switch path renamed — POST /v1/.../zombies/{id}/stop → DELETE /v1/.../zombies/{id}/current-run. Same behavior, same shape, same 200/409/404 semantics. Old path returns 404 (pre-v1.0 alpha, no deprecation window). REST hygiene: current-run is a singleton sub-resource; DELETE is the idiomatic verb.
What’s new
- Overview (/) — status tiles + tenant credit balance + live recent-activity feed. Server Components with independent Suspense boundaries.
- Agents list (/zombies) — cursor pagination, in-view search across name/id/status.
- Install form (/zombies/new) — design-system Form primitive (react-hook-form + zod). Toast on duplicate name.
- Agent detail (/zombies/[id]) — webhook copy, trigger + firewall panels, rename/describe/delete-with-confirm, React 19 useOptimistic kill switch with 409 auto-recovery.
- Workspace switcher — GET /v1/tenants/me/workspaces + Server Action writing active_workspace_id cookie. No session reissue.
- Placeholder pages at /firewall, /credentials, /settings.
- Credit-exhaustion banner + per-agent badge — automatic from is_exhausted.
- Auth abstraction — @clerk/nextjs flows through lib/auth/{server,client}.ts. Switching auth provider is a two-file edit.
- Same-origin /backend proxy — browser fetches go through /backend/:path* (Next rewrites to API_BACKEND_URL). No CORS surprises.
API
- New GET /v1/tenants/me/workspaces — { items: [{id, name, created_at}], total }.
- Changed GET /v1/workspaces/{ws}/zombies?cursor={ts}:{id}&limit=N — default 20, max 100. Response adds nullable cursor.
- Renamed (breaking) DELETE /v1/workspaces/{ws}/zombies/{id}/current-run — transitions to stopped, returns {zombie_id, workspace_id, status: "stopped", request_id}. 409 UZ-ZMB-010 on already-stopped/killed; 404 UZ-ZMB-009 on cross-workspace. Operator role required.
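Walking the full agents list means following the nullable cursor until it comes back null. This sketch assumes fetchPage wraps the real GET /v1/workspaces/{ws}/zombies call; it is a hypothetical parameter here. The sketch treats the cursor as opaque rather than parsing its {ts}:{id} form, and it clamps the limit to the documented max of 100.

```typescript
// Page envelope: items, total, and a nullable cursor.
interface Page<T> {
  items: T[];
  total: number;
  cursor: string | null;
}

// Follows the cursor until the server stops returning one.
async function listAll<T>(
  fetchPage: (cursor: string | null, limit: number) => Promise<Page<T>>,
  limit = 100,
): Promise<T[]> {
  const out: T[] = [];
  let cursor: string | null = null;
  do {
    // Explicit annotation keeps TypeScript from inferring page's type through the loop.
    const page: Page<T> = await fetchPage(cursor, Math.min(limit, 100)); // max 100
    out.push(...page.items);
    cursor = page.cursor;
  } while (cursor !== null);
  return out;
}
```

Treating the cursor as opaque keeps the client working if the server later changes its encoding.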
CLI
- zombiectl --help surfaces full lifecycle: install | up | status | kill | logs | credential.
- zombiectl list [--workspace-id] [--cursor] [--limit] [--json] — mirrors the dashboard’s agents list (≤100 limit clamp).
- zombiectl workspace show — mirrors /settings (workspace id, name, active status).
- Active workspace is persistent — zombiectl workspace use <id> writes ~/.config/zombiectl/workspaces.json; subsequent commands default to it. Independent of the dashboard’s cookie.
- zombiectl kill unchanged (full delete, not current-run kill).
Admin-by-env-var removed; credit exhaustion observable
The API_KEY env-var bypass (which minted an admin with no tenant or audit identity) is gone. Admin auth now flows exclusively through Clerk sessions with publicMetadata.role=admin. Programmatic admin access: tenant-minted zmb_t_… key from POST /v1/api-keys. Tenant billing now surfaces credit exhaustion explicitly.
Upgrading
- Drop API_KEY from your server env — silently ignored. Server refuses to start without OIDC (OIDC_JWKS_URL, OIDC_ISSUER, OIDC_AUDIENCE).
- Promote your admin in Clerk — Dashboard → Users → Metadata → Public → {"role": "admin"}. See playbooks/012_usezombie_admin_bootstrap/001_playbook.md for the dev + prod walkthrough that ends with a zmb_t_… key in op://ZMB_CD_<env>/usezombie-admin/api_key.
- If you read balance_cents == 0 — switch to is_exhausted / exhausted_at.
What’s new
- BALANCE_EXHAUSTED_POLICY={continue|warn|stop} (default warn).
  - stop — pre-empts delivery, XACK so it doesn’t retry, emits balance_gate_blocked.
  - warn — logs + emits rate-limited balance_exhausted (1/workspace/24h).
  - continue — old behavior, made explicit.
- First-exhausting debit atomically stamps balance_exhausted_at and emits a one-shot balance_exhausted_first_debit. Replays don’t double-emit.
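The three policies reduce to a small decision table. Below is a sketch of how a worker might apply it when an event arrives for an exhausted tenant. It makes these assumptions:
- warn still delivers; only stop pre-empts delivery.
- Unrecognized values fall back to the default, warn.
- The 24h rate limit is tracked in memory per workspace.

parsePolicy, onExhausted, and the Decision shape are hypothetical.

```typescript
type Policy = "continue" | "warn" | "stop";

interface Decision {
  deliver: boolean; // run the agent for this event?
  ackNow: boolean; // XACK immediately so the event is not retried
  emit: string | null; // event to emit, if any
}

const WARN_WINDOW_MS = 24 * 60 * 60 * 1000; // at most one warning per workspace per 24h
const lastWarnAt = new Map<string, number>();

// Unset or unrecognized values fall back to the default, warn.
function parsePolicy(raw: string | undefined): Policy {
  return raw === "continue" || raw === "stop" ? raw : "warn";
}

function onExhausted(policy: Policy, workspaceId: string, nowMs: number): Decision {
  switch (policy) {
    case "stop":
      return { deliver: false, ackNow: true, emit: "balance_gate_blocked" };
    case "warn": {
      const last = lastWarnAt.get(workspaceId);
      const due = last === undefined || nowMs - last >= WARN_WINDOW_MS;
      if (due) lastWarnAt.set(workspaceId, nowMs);
      return { deliver: true, ackNow: false, emit: due ? "balance_exhausted" : null };
    }
    case "continue":
      return { deliver: true, ackNow: false, emit: null };
  }
}
```

In Node the raw value would typically come from process.env.BALANCE_EXHAUSTED_POLICY.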
API
GET /v1/tenants/me/billing gains two fields:
- is_exhausted (boolean) — true once balance hits zero on a worker debit.
- exhausted_at (integer epoch ms or null) — non-null only when is_exhausted is true.
Observability — per-agent tokens wired, OTLP histograms exported
Two observability paths that looked live but weren’t.
What’s new
- zombie_agent_tokens_by_workspace_total carries both workspace_id and zombie_id labels; reports real data on every completed delivery. Useful for top-N spend dashboards at either granularity.
- zombie_workspace_metrics_overflow_total exposed — saturation indicator for the 4096-slot (workspace_id, zombie_id) table; overflow falls back to _other aggregation.
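A fixed-size label table with an _other fallback keeps metric cardinality bounded no matter how many agents exist. The class below sketches that idea. The key format, counting overflow once per overflowing add, and the class name are all assumptions, not the exporter's actual data structure.

```typescript
class BoundedTokenCounter {
  private readonly series = new Map<string, number>();
  private readonly capacity: number;
  overflowTotal = 0; // analogue of zombie_workspace_metrics_overflow_total

  constructor(capacity = 4096) {
    this.capacity = capacity;
  }

  add(workspaceId: string, zombieId: string, tokens: number): void {
    let key = `${workspaceId}/${zombieId}`;
    // A new key arriving when the table is full collapses into "_other".
    if (!this.series.has(key) && this.series.size >= this.capacity) {
      key = "_other";
      this.overflowTotal += 1;
    }
    this.series.set(key, (this.series.get(key) ?? 0) + tokens);
  }

  get(key: string): number {
    return this.series.get(key) ?? 0;
  }
}
```

Keys already in the table keep their own series after saturation. Only newcomers are folded into _other, so existing dashboards stay accurate.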
Bug fixes
- Per-workspace token counter was a no-op (helper existed, never called). Now fires from the same spot as zombie_tokens_total.
- OTLP JSON exporter silently dropped _bucket/_sum/_count — histograms (zombie_execution_seconds, zombie_agent_duration_seconds, zombie_executor_agent_duration_seconds) never reached collectors. Exporter now emits OTLP histogram data points with cumulative-to-delta conversion, explicitBounds, aggregationTemporality: 2.
- Removed zombie_gate_repair_loops_* counters — pipeline-era concept with no agent-era call site, always read zero, misled operators.
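Cumulative-to-delta conversion subtracts the previous snapshot of a series from the current one, bucket by bucket. This is a minimal sketch for a single histogram series. It assumes a restart shows up as the count going backwards, and it omits timestamps and attributes. For reference, OTLP's AggregationTemporality enum defines DELTA as 1 and CUMULATIVE as 2. A histogram point carries one bucket count per explicitBounds entry, plus one overflow bucket.

```typescript
interface HistogramPoint {
  count: number;
  sum: number;
  bucketCounts: number[]; // explicitBounds.length + 1 entries
}

class DeltaConverter {
  private prev: HistogramPoint | null = null;

  // Returns the delta since the previous snapshot of this series.
  toDelta(cur: HistogramPoint): HistogramPoint {
    const prev = this.prev;
    this.prev = cur;
    // First snapshot, or a reset (count went backwards): emit as-is.
    if (prev === null || cur.count < prev.count) return cur;
    return {
      count: cur.count - prev.count,
      sum: cur.sum - prev.sum,
      bucketCounts: cur.bucketCounts.map((c, i) => c - (prev.bucketCounts[i] ?? 0)),
    };
  }
}
```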
Docs follow-up — rewritten for the v2 MVP
docs.usezombie.com rewritten end-to-end against the current product. Quickstart walks a fresh operator from Clerk sign-up to a live agent firing webhook events in under ten minutes. Stale pre-Clerk vocabulary cleared from every page outside the historical changelog.
What’s new
- New quickstart — sign up → dashboard → create agent → copy webhook → curl trigger → verify credit debit. One page, end-to-end.
- New CLI reference at /cli/zombiectl.
- Self-hosting section under /operator (removed in M51 prep when self-host was deferred to v3; see usezombie/usezombie:docs/architecture/ for the canonical reference).
- Concepts page — four nouns (tenant, workspace, agent, skill) + tenant-scoped credit model.
- Billing pages — rewritten around single-wallet, multi-workspace.
Tenant-scoped billing
Billing moves from workspace to tenant. Every signup gets one billing.tenant_billing row (plan_tier=free, plan_sku=free_default, 1000¢ balance). All workspaces under a tenant share that balance — no more per-workspace credit grants on workspace creation. Workspace-scoped billing endpoints removed.
Removed
- POST /v1/workspaces/{ws}/billing/events
- POST /v1/workspaces/{ws}/billing/scale
- GET /v1/workspaces/{ws}/billing/summary
- GET /v1/workspaces/{ws}/zombies/{id}/billing/summary
- POST /v1/workspaces/{ws}/scoring/config
What’s new
- One tenant, one billing row — billing.tenant_billing(plan_tier, plan_sku, balance_cents, grant_source, updated_at) with tenant_id as PK.
- Atomic worker debit — conditional UPDATE … WHERE balance_cents >= $cents RETURNING. Exhausted balance returns UZ-BILLING-005 CreditExhausted (no partial debits).
- Schema slots resequenced to contiguous 001..018 (tidy pre-v2.0 baseline).
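The important property of the conditional UPDATE is that the balance check and the write are a single step, so a debit is all-or-nothing. Below is an in-memory analogue in TypeScript. Having no await between the read and the write plays the role that statement-level atomicity plays in Postgres. The debit function, the Map-based store, and treating an unknown tenant as exhausted are assumptions for illustration.

```typescript
type DebitResult =
  | { ok: true; balanceCents: number }
  | { ok: false; code: "UZ-BILLING-005" }; // CreditExhausted

function debit(balances: Map<string, number>, tenantId: string, cents: number): DebitResult {
  const balance = balances.get(tenantId);
  // Same guard as WHERE balance_cents >= $cents; no partial debit on failure.
  if (balance === undefined || balance < cents) {
    return { ok: false, code: "UZ-BILLING-005" };
  }
  const next = balance - cents;
  balances.set(tenantId, next);
  return { ok: true, balanceCents: next }; // like RETURNING balance_cents
}
```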
API
GET /v1/tenants/me/billing — caller’s tenant snapshot:
UZ-AUTH-001 without a valid token.
Clerk-powered signup
Users sign up through Clerk and get auto-provisioned. A Clerk user.created webhook to POST /v1/webhooks/clerk atomically creates tenant + user (bound to Clerk OIDC subject) + owner membership + default workspace (Heroku-style name) + 0-cent credit state. Idempotent on replay.
What’s new
- Signup webhook — Svix signature verified inline against CLERK_WEBHOOK_SECRET; stale timestamps (>5 min drift) rejected.
- Heroku-style names — 1,024,000-combo namespace (32 adjectives × 32 nouns × 1000 suffixes); per-tenant uniqueness via partial index.
- Identity model — new core.users (indexed by Clerk OIDC subject) + core.memberships (user→tenant with role). Ready for team accounts later.
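Svix signs the string <svix-id>.<svix-timestamp>.<raw body> with HMAC-SHA256, keyed by the base64 part of the whsec_-prefixed secret. It sends one or more space-separated v1,<base64> signatures so secrets can rotate. The sketch below follows that published scheme using Node's crypto module and the 5-minute drift window from the notes. verifySvix and its bad_sig / stale_ts return values are hypothetical, chosen to line up with the clerk.bad_sig and clerk.stale_ts log scopes.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_DRIFT_SEC = 5 * 60; // reject timestamps more than 5 minutes off

function verifySvix(
  secret: string, // e.g. CLERK_WEBHOOK_SECRET, "whsec_<base64>"
  headers: { id: string; timestamp: string; signature: string },
  rawBody: string,
  nowSec: number,
): "bad_sig" | "stale_ts" | null {
  const ts = Number(headers.timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSec - ts) > MAX_DRIFT_SEC) return "stale_ts";

  // Key is the base64 part of the secret; signed content is id.timestamp.body.
  const key = Buffer.from(secret.replace(/^whsec_/, ""), "base64");
  const expected = createHmac("sha256", key)
    .update(`${headers.id}.${headers.timestamp}.${rawBody}`)
    .digest();

  // svix-signature holds space-separated "v1,<base64>" entries; any match wins.
  for (const entry of headers.signature.split(" ")) {
    const [version, sig] = entry.split(",");
    if (version !== "v1" || !sig) continue;
    const candidate = Buffer.from(sig, "base64");
    // timingSafeEqual throws on unequal lengths, so compare lengths first.
    if (candidate.length === expected.length && timingSafeEqual(candidate, expected)) {
      return null;
    }
  }
  return "bad_sig";
}
```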
API
POST /v1/webhooks/clerk — body is a Clerk user.created envelope; headers svix-id, svix-timestamp, svix-signature required. Responses:
- 200 {workspace_id, workspace_name, created}
- 400 UZ-REQ-001 (malformed / missing email)
- 401 UZ-WH-010 (bad sig) / UZ-WH-011 (stale ts)
- 413 UZ-REQ-002 (body > 2 MB)
- 500 UZ-INTERNAL-*
Events other than user.created are 200-ignored so Clerk stops retrying.
Observability
- Three Prometheus counters: zombie_signup_bootstrapped_total, zombie_signup_replayed_total, zombie_signup_failed_total (with reason label).
- PostHog event signup_bootstrapped (distinct_id = oidc_subject); email domain only, never full email.
- Log scopes: clerk.bad_sig, clerk.stale_ts, clerk.bad_request.
Unified design system across the dashboard and marketing site
Buttons, cards, dialogs, inputs, and other UI primitives now come from a single @usezombie/design-system package. The dashboard and marketing site share one source of truth — tweak a variant once, both surfaces update.
The new /agents page adds an interactive hero and animated terminal. Landing JS is under 90 kB gzipped, with a size-limit CI gate guarding bundle size. PostHog loads on idle so first paint is no longer blocked.
One credential surface for agents
Workspace credentials now flow through a single path: zombiectl credential add writes to the workspace vault, and that’s what every agent reads at runtime. No parallel surfaces, no guessing which command owns a given secret.
Docs reshaped around the agent lifecycle
A new Agents section walks through installing a template, adding credentials, running, observing, and killing an agent. Pages describing the legacy v1 pipeline have been retired; the old /specs/* and /runs/* URLs now 404.
New pages: overview, install, running, credentials, webhooks, skills, templates.
Tenant API keys
Tenant admins can now mint named, rotatable API keys via POST /v1/api-keys — scoped to the tenant, revocable, and audited. Raw keys (zmb_t_…) are shown once on creation; only the hash is stored. The legacy API_KEY env var still works as a bootstrap fallback.
Workspace-scoped external agent keys were renamed to agent keys: /v1/workspaces/{ws}/external-agents → /v1/workspaces/{ws}/agent-keys.
Unified webhook authentication — seven first-class providers
Every per-agent webhook flows through one fail-closed middleware that handles URL-embedded secrets, Bearer tokens, HMAC signatures, and Svix multi-signature rotation with constant-time comparisons.
Seven providers ship first-class: agentmail, Grafana, Slack, GitHub, Linear, Jira, and Clerk (via Svix). Onboarding takes one field in TRIGGER.md; secrets are workspace-vaulted and rotate without an agent redeploy. See Webhooks.
Operator dashboard foundation
Workspace-wide activity feed, operator kill switch for runaway agents, and per-agent billing summary that mirrors the workspace view. Billing numbers now come from real execution telemetry (previously zeroed since v0.10).
Ships with six accessible React primitives — StatusCard, EmptyState, Pagination, DataTable, ConfirmDialog, ActivityFeed — and Tailwind v4 semantic design tokens.
Consistent pagination and full OpenAPI coverage
Every list endpoint returns the same { items, total, cursor? } envelope so SDK generators can emit a single Paginated<T> type. Memory reads moved to GET, and openapi.json now documents every route the server exposes — 26 previously undocumented operations are authored in.
Workspace-scoped REST paths
Identity — workspace, agent, grant — is now always in the URL path (/v1/workspaces/{ws}/zombies/{id}), and query parameters are reserved for pagination and search. Every handler authorizes workspace membership after authentication; cross-workspace lookups return 404, so the API does not leak the existence of resources you cannot see.
Live agent steering
Redirect a running agent mid-execution without killing it. POST /v1/workspaces/{ws}/zombies/{id}/steer injects a message into the agent’s event stream — delivered mid-execution if the agent is running, queued otherwise (300-second TTL).
Persistent agent memory
Agents remember facts across executions. Memory is row-scoped per agent and persists in Postgres — a lead-collector agent doesn’t re-research the same lead, a support agent doesn’t re-ask customers their plan. Tools: memory_store, memory_recall, memory_list, memory_forget.
Integration grants + credentialed proxy
Agents — internal or external (LangGraph, CrewAI) — call external services through usezombie’s credentialed proxy. Credentials never leave the platform: injected server-side, stripped from response echoes, and logged to the activity stream.
An agent requests a grant, humans approve once via Slack/Discord/dashboard, and the grant is reusable until revoked. Launch providers: Slack, Gmail/AgentMail, Discord, Grafana. New CLI: zombiectl agent create|list|delete, zombiectl grant list|revoke.
Agent execution telemetry
Every event delivery records token_count, time_to_first_token_ms, wall_seconds, and credit_deducted_cents, queryable per-agent via GET /v1/workspaces/{ws}/zombies/{id}/telemetry. Each delivery also emits an OpenTelemetry zombie.delivery span that lines up correctly in Grafana Tempo.
Slack plugin
Connect Slack via “Add to Slack” OAuth or zombiectl credential add slack. Bot tokens live in the vault; events and interactions are HMAC-verified with constant-time comparison. Any agent with a slack_event trigger fires automatically on matching messages.
Agent observability
Every trigger and delivery shows up in Grafana and PostHog. Prometheus exposes zombies_triggered_total, zombies_completed_total, zombies_failed_total, zombie_tokens_total, and a zombie_execution_seconds histogram; PostHog fires zombie_triggered and zombie_completed with tokens, wall-time, and exit status.
Agent credit metering
Free-plan agents deduct from consumed_credit_cents after each successful delivery at 1 cent per agent-second; Scale is unlimited and short-circuits without a DB write. Crash replay is idempotent on event_id, and a DB hiccup never drops or double-charges an event.
Agent directory format, AI Firewall, error standardization, pipeline v1 removal
Agent directory format
Agents are now two-file directories (SKILL.md + TRIGGER.md). SKILL.md follows the ClaHub registry format — same file uploads to the CLI and publishes to the skill registry. TRIGGER.md carries deployment config (trigger, chain, budget, network, credentials). zombiectl install scaffolds both; zombiectl up sends them raw.
Dynamic skills
Skills are config-driven. The NullCraw executor reads SKILL.md and uses built-in tools (shell, http, file_read) to call external APIs. Adding a new skill = new directory; no server rebuild.
AI Firewall — 4-layer outbound inspection
- Domain allowlist — only network.allow domains reachable.
- Endpoint policy — per-endpoint rules in firewall: (e.g., allow GET, deny POST).
firewall:(e.g., allow GET, deny POST). - Prompt-injection detection — outbound bodies scanned for instruction override / role hijacking / jailbreaks.
- Content scanning — response bodies scanned for credential and PII leakage.
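The first layer is a deny-by-default host check. This sketch assumes exact, case-insensitive hostname matching against the network.allow list. Wildcard entries, ports, and the other three layers are out of scope, and isEgressAllowed is a hypothetical name.

```typescript
// Deny by default: only hosts listed in network.allow are reachable.
function isEgressAllowed(url: string, allow: readonly string[]): boolean {
  let host: string;
  try {
    host = new URL(url).hostname; // the WHATWG URL parser lowercases hostnames
  } catch {
    return false; // unparseable URLs are denied
  }
  return allow.some((entry) => entry.toLowerCase() === host);
}

console.log(isEgressAllowed("https://api.github.com/repos", ["api.github.com"])); // true
```

Exact matching avoids the classic suffix bug: api.github.com.evil.example is not treated as api.github.com.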
API error format (RFC 7807)
All errors now use application/problem+json with UZ- prefixed codes. Every code has a stable HTTP status — callers no longer parse status codes independently.
Pipeline v1 removed
All /v1/runs/* and /v1/specs return 410 Gone with ERR_PIPELINE_V1_REMOVED. Use agent-native SSE stream + chat-inject instead.
Webhook auth — URL-embedded secret
Preferred: POST /v1/webhooks/{zombie_id}/{secret}. Bearer token still supported as fallback.
Internal
All handler boilerplate (arena, request id, Bearer auth) moves to a shared hx.zig wrapper. Handlers contain only business logic.
Lead Agent — v2 core ships
usezombie is now a runtime for always-on agents. Two commands, running agent:
What’s new
- Agent config format — YAML frontmatter (trigger, skills, credentials, budget) + markdown body. CLI compiles YAML → JSON before upload; server sees JSON only. Voice-transcribed instructions supported as the body.
- Webhook ingestion — every agent gets POST /v1/webhooks/{zombie_id}. Routing by primary key (no name collisions). Bearer auth per agent. Idempotent via Redis SET NX (24h TTL). Returns 202 / 200.
- Activity stream — append-only core.activity_events (UPDATE/DELETE blocked by trigger). zombiectl logs streams it; cursor-paginated replay.
- Credential injection — vault → sandbox at runtime. No credentials in config files. zombiectl credential add to register.
- Session checkpoint — conversation context upserted to Postgres after each event. Resume from last checkpoint after crash.
- CLI — zombiectl install | up | status | kill | logs | credential add | credential list.
- Schema additions — core.zombies (JSONB config), core.zombie_sessions (checkpoint), core.activity_events. Applied automatically by zombied migrate.
- API — 16 v1 endpoints removed from OpenAPI; POST /v1/webhooks/{zombie_id} added.
- Version tooling — make sync-version / make check-version prevent drift across build.zig.zon and zombiectl/package.json.
Bug fixes
- YAML parser was silently dropping array items in CLI config upload.
- UTF-8 truncation was splitting multi-byte characters in session context.
Steer running agents mid-run
Interrupt a running agent without aborting it: zombiectl runs interrupt <run_id> <message> or POST /v1/runs/{id}:interrupt. Picked up at the next gate checkpoint. Two modes: queued (next checkpoint) and instant (IPC delivery).
What’s new
- Live run streaming (CLI) — zombiectl run --spec <file> --watch streams gate results in real time. Last-Event-ID reconnect replays only missed events. Ctrl+C clean exit.
- Run replay (CLI) — zombiectl runs replay <run_id> prints a per-gate narrative for completed runs (exit codes, stdout/stderr, wall time).
- Workspace billing breakdown — zombiectl workspace billing --workspace-id <id> shows completed / non-billable / score-gated runs. --period, --json. Backed by GET /v1/workspaces/{id}/billing/summary.
- Run observability — full trace tree in Grafana Tempo ({run.id="<id>"} waterfall). Per-workspace Prometheus metrics: tokens, run outcomes, gate-repair loop distribution.
- Resource efficiency scoring v2 — runs now scored on actual memory + CPU usage; agents staying within limits score higher.
Breaking
SSE id: on live events changed from sequential counter to created_at Unix milliseconds. Clients parsing Last-Event-ID as a sequence must update.
Live run streaming (API)
The SSE stream endpoint is live: GET /v1/runs/{id}:stream emits gate results in real time as the agent works. CLI support (--watch) is coming in a future release.
Run replay (API)
Replay any finished run step by step via the API: GET /v1/runs/{id}:replay returns a structured gate narrative with exit codes, stdout/stderr, and wall time. CLI support (zombiectl runs replay) is coming in a future release.