Source attribution. The narrative on this page follows docs/brainstormed/docs/homelab-zombie-launch.md. The executable Homelab Zombie ships at samples/homelab/ in the usezombie repo as one SKILL.md + one TRIGGER.md — no sub-skill directories, no YAML allowlists. Authored by M33.
Homelab Zombie is the flagship v2 sample. It is an AI agent that investigates incidents in your homelab — pods crash-looping, disks filling, containers OOM-killed — using real kubectl and Docker access, but in a way designed to be safe to run unsupervised at 11pm on a bad day.

What it does

You talk to it:
$ zombie
→ Homelab zombie ready. What's up?

> Jellyfin pods keep restarting
The zombie reasons in a loop:
[00:02] kubectl get pods -A | grep jellyfin
        → 2 pods found, both in CrashLoopBackOff
[00:05] kubectl describe pod jellyfin-0 -n media
        → Last state: OOMKilled, exit code 137
[00:07] kubectl logs jellyfin-0 -n media --previous --tail=200
[00:10] kubectl top pod jellyfin-0 -n media
        → memory: 512Mi (at limit)
[00:12] kubectl get deployment jellyfin -n media -o yaml
        → resources.limits.memory: 512Mi
[00:14] kubectl describe node homelab-02
        → MemoryPressure: False, 8Gi available
And produces a report:
Root cause (high confidence): Jellyfin is being OOM-killed. Memory limit
is 512Mi; usage spikes above this during library scans and transcoding.

Proposed fix:
    kubectl patch deployment jellyfin -n media -p '{...}'

Risk: low. Node has 8Gi free.
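For illustration, a patch of that shape might look like the following. The container name (jellyfin) and the new limit (1Gi) are assumptions for the example, not values the zombie emitted, and the block only prints the command rather than running it, so it is safe without a cluster:

```shell
# Illustrative sketch: raise the memory limit from 512Mi to 1Gi (values assumed).
PATCH='{"spec":{"template":{"spec":{"containers":[{"name":"jellyfin","resources":{"limits":{"memory":"1Gi"}}}]}}}}'

# Print the command instead of executing it; drop the echo to apply for real.
echo kubectl patch deployment jellyfin -n media -p "$PATCH"
```

In v0.1 the zombie would stop at proposing exactly this kind of command; running it is your call.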
You keep the conversation going in the same session:
> does immich have the same problem?
> now check if any node is under memory pressure
Context is maintained across turns. It’s a conversation, not a command tree. In v0.1 the zombie stops at the proposal — no writes, no restarts. It’s diagnostic, not remedial. That’s a feature: trust the thing before you hand it a hammer.

Three things that make it safe to run

1. The allowlist lives in prose, inside the SKILL.md prompt

The zombie’s SKILL.md is a markdown file the agent reads as its system prompt. It tells the agent, in plain English, which kubectl verbs to use and which to avoid:
Use only kubectl get, describe, logs, top, and events. Never run delete, apply, patch, or any write verb. Never read secrets — even a get on a secret object is forbidden. If you need to understand an object that you can only see through a write verb, stop and report instead of attempting it.
The agent reasons with that policy the same way it reasons with any other instruction. When it produces a command that breaks the rule, the next loop's self-reflection flags it; when nullclaw (the tool-enforcement layer) lands, the same prose becomes the contract the structural enforcer gates against. YAGNI says don't build the structural gate until a second zombie needs the same policy; today the homelab zombie is the only one, so prose is the whole spec. Secrets denial is called out explicitly because exfiltrating secrets into logs is the first thing a bad agent run would try. The prompt names it. That is the allowlist.
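When the structural gate does land, the prose above is simple enough to mechanize. A minimal sketch of such a gate, assuming a hypothetical check_cmd helper (this is not shipped code, just the policy restated as shell):

```shell
# Sketch of the prose policy as a structural gate (hypothetical helper names).
is_allowed() {
  # Only the read verbs named in SKILL.md pass.
  case "$1" in
    get|describe|logs|top|events) return 0 ;;
    *) return 1 ;;
  esac
}

check_cmd() {
  # Secrets are denied regardless of verb -- even a read on a secret object.
  case "$*" in *secret*) echo deny; return ;; esac
  if is_allowed "$2"; then echo allow; else echo deny; fi
}

check_cmd kubectl get pods -A        # prints: allow
check_cmd kubectl delete pod x       # prints: deny
check_cmd kubectl get secret tls     # prints: deny
```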

2. The agent never holds the kubeconfig

In most agent-plus-kubectl setups, the agent is a process with KUBECONFIG set in its environment, so a prompt injection can, in principle, coax the LLM into echoing the token. This has been demonstrated against Claude Code and various self-hosted agents; it's a real attack surface. In UseZombie, the agent process literally does not have the credential. What it has is a placeholder string: a random UUID that looks nothing like a token. When kubectl inside the sandbox makes an HTTPS call to the cluster API, a proxy at the network boundary catches it, swaps the placeholder for the real credential, and re-originates the request. The real token never enters the memory of the process the LLM is driving. Short-lived tokens in env vars still show up in prompt-injection exfiltration paths; placeholders don't. The model can repeat the placeholder all day, and on its own it does nothing.
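The swap itself is a one-line rewrite at the boundary. Here is a toy sketch of the idea, with the placeholder value, file name, and token all invented for the example; the real proxy terminates HTTPS and injects the header rather than sed-ing a stream:

```shell
# Toy illustration of the placeholder swap; every name and value is assumed.
PLACEHOLDER="c0ffee00-0000-4000-8000-000000000000"   # the only "credential" the agent ever sees
TOKEN_FILE="$(mktemp)"                               # stands in for a proxy-only secret store
echo "real-cluster-token" > "$TOKEN_FILE"

rewrite_auth() {
  # At the network boundary, replace the placeholder with the real bearer token.
  sed "s/$PLACEHOLDER/$(cat "$TOKEN_FILE")/"
}

echo "Authorization: Bearer $PLACEHOLDER" | rewrite_auth
# prints: Authorization: Bearer real-cluster-token
```

The point the sketch makes: nothing inside the sandbox ever reads TOKEN_FILE, so leaking the placeholder leaks nothing.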

3. The worker runs in your network, not ours

The control plane coordinates runs and stores audit logs. But the worker that actually executes tool calls runs on a box inside your homelab — a small Docker container on your k3s control plane node, a Pi behind your firewall, wherever you choose. The control plane never has a route to your k3s API. Your kubeconfig never leaves your network. If you pull the plug on the worker container, the whole thing stops.

Install

zombiectl install homelab
This scaffolds a homelab/ directory with exactly two files: SKILL.md (agent instructions + the read-only policy in prose) and TRIGGER.md (tool wiring, credential references, budget, network allowlist). No sub-skill directories. For the full end-to-end — worker placement, Tailscale integration, cred add — follow the Operator quickstart.

What’s next

  • v0.1 (current): read-only diagnosis. The output is a report with a proposed fix.
  • v0.2 (next): writes behind approval gates. The agent proposes a kubectl patch, pushes the proposal to your phone via Slack, and waits for a tap.
Feedback and issues welcome on GitHub. If you run k3s in a closet and want to be a design partner, open an issue.