From issue to merged PR, while I do something else
I built this around something I believe: you should live in production. You label a GitHub issue ready, an agent implements it and opens a pull request, and a second agent reviews and merges it. The work happens while I am doing something else. I have run it on a real app, an internal learning platform, for a while now: about 400 closed issues in, the site works.
It is not a product or a framework. It is a few bash scripts and two Claude Code agents on one always-on Linux box, coordinated entirely through GitHub labels. I am sharing it because it cost me very little and gave me a lot, and that ratio might be useful to you.
One property matters above the rest: it is opt-in. Nothing is touched unless you label an issue ready. You decide what is in scope, the machine does the typing.
Three pieces
GitHub itself is the bus between them.
A builder agent watches for ready issues, claims one, branches off fresh main, implements, and opens a pull request. It never merges.
A reviewer agent runs in a separate session, so it is not reviewing its own work. It watches for PRs marked ready-for-review, reviews them, keeps the docs and a design log in sync, and merges. It is the only thing that writes to main.
A supervisor is a plain bash loop under systemd. It reconciles desired state against reality every few minutes: clones missing, labels missing, sessions missing, all get recreated. It nudges an agent that has gone idle with work pending, and it pings my phone when something happens.
The two agents never talk to each other: the builder produces PRs, the reviewer consumes them, the hand-off is a label. That keeps the whole thing easy to reason about, and easy to pause at either end.
Labels are the control surface
There is no config UI and no queue: just labels on issues and PRs.
| label | meaning |
|---|---|
| ready | work this. The only trigger. No label means ignored. |
| wip | the builder has claimed it (assign + label), so nothing gets double-worked |
| ready-for-review | the builder is done; the reviewer may merge |
| epic | a parent issue whose sub-issues are batched as one unit |
| hold | never auto-touch, even if ready (an override) |
| auto:owned-by-<handle> | which operator runs this repo |
The lifecycle reads left to right: you add ready, the builder claims it with wip, opens a PR with ready-for-review, the reviewer merges. You stay in control by deciding what becomes ready.
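The routing rule behind that lifecycle is small enough to write down. The sketch below is illustrative only: `who_acts` is a hypothetical helper, not part of the recipe, and it encodes the eligibility rules as the role files state them (an issue needs ready, no hold, no wip, and no assignee; a PR needs ready-for-review). The role files do not say how hold applies to PRs, so the sketch checks it for issues only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide which agent, if any, may act on an item.
# Usage: who_acts <issue|pr> <assigned: yes|no> [label...]
who_acts() {
  local kind="$1" assigned="$2"; shift 2
  local l ready=0 wip=0 hold=0 rfr=0
  for l in "$@"; do
    case "$l" in
      ready) ready=1 ;;
      wip) wip=1 ;;
      hold) hold=1 ;;
      ready-for-review) rfr=1 ;;
    esac
  done
  if [ "$kind" = pr ]; then
    # PRs: the hand-off label is the only trigger.
    if [ "$rfr" = 1 ]; then echo reviewer; else echo nobody; fi
    return 0
  fi
  # Issues: ready is required, hold overrides it, and a claim (wip or an assignee) blocks it.
  if [ "$ready" = 1 ] && [ "$hold" = 0 ] && [ "$wip" = 0 ] && [ "$assigned" = no ]; then
    echo builder
  else
    echo nobody
  fi
}
```

For example, `who_acts issue no ready hold` prints `nobody`, because hold overrides ready.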
A few choices that did the real work
Opt-in, not opt-out. I started by processing everything by default. That breaks the moment you file a few related issues at once: the builder grabs them one at a time and you lose any chance of treating them as a set. Gating on an explicit ready label lets you release a whole batch together, and unfinished thoughts sit harmlessly in the backlog.
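Releasing a batch is just adding the label to each issue in the set. A minimal sketch, assuming the GitHub CLI is authenticated; `release_batch` and the `GH_BIN` override are hypothetical names introduced here so the loop can be dry-run, and the issue numbers are made up.

```shell
#!/usr/bin/env bash
# Sketch: label a set of related issues ready in one go.
# GH_BIN is a hypothetical override; set it to "echo" to print the commands instead of running them.
release_batch() {
  local repo="$1"; shift
  local n
  for n in "$@"; do
    "${GH_BIN:-gh}" issue edit "$n" -R "$repo" --add-label ready || return 1
  done
}

# Dry run (prints three "issue edit ..." lines, touches nothing):
#   GH_BIN=echo release_batch ORG/repo 101 102 103
```

Run the dry run first, then drop `GH_BIN=echo` once the list looks right.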
Two independent agents. It is tempting to have one agent build and review. Do not: a reviewer that shares the builder's context will rationalize the builder's work. A separate session, in a separate clone, reviewing only the diff, catches more.
A dumb supervisor. It holds no state in its head. It reads a folder of repo config files and makes the world match. Crash, reboot, killed session, deleted clone: the next pass rebuilds it. That is what makes it survivable.
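The shape of such a pass, reduced to its core, looks roughly like the sketch below. It is an illustration, not the supervisor itself: plain directories stand in for clones and sessions, and `reconcile` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of a stateless reconcile pass: for each desired item, check reality
# and create whatever is missing. Nothing is remembered between passes.
reconcile() {
  local root="$1"; shift
  local want
  for want in "$@"; do
    if [ -d "$root/$want" ]; then
      echo "ok $want"
    else
      mkdir -p "$root/$want" && echo "created $want"
    fi
  done
}
```

Running it twice in a row is harmless, and deleting an item between runs means only that item is rebuilt on the next pass.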
Wrapped builds. This one cost me the most to learn. Real builds on a small box, several at once, exhaust memory, and the kernel kills your agent session out from under you. The fix is a safe-build wrapper: one build at a time across the host, each inside a memory-capped cgroup. A runaway build now dies in its own cgroup instead of taking everything down.
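The serialization half of that wrapper can be tried in isolation with flock(1). A minimal sketch, with a `sleep` standing in for the build; the function name, lock path, and log file are illustrative. The memory cap needs systemd-run and is part of the safe-build script in the recipe.

```shell
#!/usr/bin/env bash
# Sketch: run jobs one at a time by taking an exclusive lock on a shared file.
# Usage: serialized <lockfile> <job-name> <logfile>
serialized() {
  local lock="$1" name="$2" log="$3"
  (
    flock 9                       # block until no other job holds the lock
    echo "start $name" >>"$log"
    sleep 0.2                     # stand-in for the real build
    echo "end $name" >>"$log"
  ) 9>"$lock"                     # the lock is released when fd 9 closes
}
```

Starting two of these in the background against the same lock file yields a log in which each job's start and end lines sit next to each other, never interleaved.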
How it has gone
Around 400 closed issues in, the platform works. What surprised me was the payoff for the effort: a handful of bash scripts and two agents, and it kept going. The best moments were live QA with the team: we sat together, used the app, and filed issues as we hit them. The agents picked them up and the fixes landed in production while we kept testing. Filing an issue and watching it come back a merged PR, without anyone breaking off to write it, is a good loop.
It is not magic and it is not hands-off. Four things mattered:
- You need to know what you are doing: this amplifies judgment, it does not replace it. Vague issues produce vague PRs.
- A well-structured project helps enormously: clear conventions and good existing patterns give the agents something to imitate.
- A framework helps for the same reason: we are on Nuxt with Nuxt UI, and the strong conventions mean the agents rarely invent their own way of doing things.
- Git is your friend: branch per issue, small PRs, everything reviewable and revertible. When something goes wrong, and it will, you want a clean history and an easy way back.
Caveats, please read
This works for me, but it is an experiment, not a hardened product. Weigh three things before pointing it at anything you care about.
The agents run with permissions bypassed and they merge to main on their own. That is the point, and it is the risk: they run commands and push code without asking. Only run this against repos and on a host where that is acceptable, and treat the ready label as the one place a human decides scope. I keep anything sensitive, security work or anything financial, off the ready list entirely.
It costs money and tokens: two agents running continuously consume real usage. Watch your limits.
Review what lands: an independent reviewer agent beats self-review, but it is still a model reviewing a model. Read main now and then. The design log the reviewer keeps makes that easier.
If any of that gives you pause, good. Start with one small repo, a couple of throwaway issues, and watch it before you trust it.
The full recipe
Everything you need to stand this up. Built for JavaScript and TypeScript repos on pnpm; swap the verify commands for other languages. Linux host, always-on (a small cloud VM is plenty; a laptop that sleeps will not do).
Prerequisites: git, tmux, curl, python3, flock, systemd-run, the GitHub CLI (gh auth login, with access to the target repos), the claude CLI authenticated (a Claude Code subscription works, or an API key), your build toolchain, and systemd user lingering (sudo loginctl enable-linger "$USER").
Everything lives under ~/ops/agent-supervisor/.
supervisor.sh:
#!/usr/bin/env bash
# Declarative reconciler + watchdog for all repos in repos.d/.
set -uo pipefail
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
POLL="${SUPERVISOR_POLL:-240}"
mkdir -p "$DIR/logs"; GLOG="$DIR/logs/supervisor.log"
glog(){ echo "$(date '+%F %T') $*" >>"$GLOG"; }
notify(){ "$DIR/notify.sh" "$1" >/dev/null 2>&1; }
declare -A idle prev owner_alerted
LABELS=( "ready:0E8A16:eligible for builder" "wip:FBCA04:claimed by builder" \
"ready-for-review:1D76DB:reviewer may merge" "epic:5319E7:epic parent" \
"hold:B60205:do NOT auto-pick (overrides ready)" )
BUILD_NUDGE="Continue your BUILDER role: take the next ready, unclaimed, non-hold issue (claim with assign + wip), implement on a branch off fresh origin/main, open a ready-for-review PR. Never merge."
REVIEW_NUDGE="Continue your REVIEWER role: review the next ready-for-review PR; if it passes, update docs/design-log, then merge to main."
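# owns REPO LABEL: true when the repo carries no auto:owned-by-* label other than LABEL.
# --limit 200 because gh label list returns only 30 labels by default.
owns(){ local o; o=$(timeout 30 gh label list -R "$1" --limit 200 2>/dev/null | awk '{print $1}' \
| grep '^auto:owned-by-' | grep -vxF -- "$2" || true); [ -z "$o" ]; }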
# Create any missing label. Names are matched whole: a prefix match on "ready" would also
# hit "ready-for-review" and leave "ready" uncreated.
ensure_labels(){ local r="$1" ow="$2" s n c d have
have=$(timeout 30 gh label list -R "$r" --limit 200 2>/dev/null | cut -f1)
for s in "${LABELS[@]}"; do IFS=: read -r n c d <<<"$s"
grep -qxF -- "$n" <<<"$have" || gh label create "$n" -R "$r" --color "$c" --description "$d" 2>/dev/null || true
done
grep -qxF -- "$ow" <<<"$have" || gh label create "$ow" -R "$r" --color 0E8A16 --description "automation owner" 2>/dev/null || true; }
ensure_clone(){ [ -d "$2/.git" ] && return 0; glog "clone $1 -> $2"
git clone "[email protected]:$1.git" "$2" >/dev/null 2>&1 || glog "clone FAILED $1"; }
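# Pane target "<session>:1.1" assumes tmux base-index 1 and pane-base-index 1; stock tmux
# numbers from 0, so use ":0.0" there (watch() below uses the same target).
ensure_session(){ local sess="$1" dir="$2" role="$3" nudge="$4" t="$1:1.1" p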
tmux has-session -t "$sess" 2>/dev/null && return 0
glog "session $sess missing -> launching"; notify "$sess was down, bringing it back up"
tmux new-session -d -s "$sess" -c "$dir"; sleep 2
tmux send-keys -t "$t" "claude --dangerously-skip-permissions" Enter; sleep 12
p=$(tmux capture-pane -p -t "$t" 2>/dev/null||echo "")
echo "$p"|grep -qi "trust this folder\|Yes, I trust" && { tmux send-keys -t "$t" Enter; sleep 4; p=$(tmux capture-pane -p -t "$t"||echo ""); }
echo "$p"|grep -qi "fullscreen renderer" && { tmux send-keys -t "$t" Down; sleep 0.3; tmux send-keys -t "$t" Enter; sleep 3; p=$(tmux capture-pane -p -t "$t"||echo ""); }
echo "$p"|grep -qi "Bypass Permissions mode" && { tmux send-keys -t "$t" Down; sleep 0.3; tmux send-keys -t "$t" Enter; sleep 4; }
sleep 3
tmux send-keys -t "$t" -l "Read your role file at $role and operate strictly per it. You were (re)started by the supervisor, resume your loop now. $nudge"
sleep 0.5; tmux send-keys -t "$t" Enter; }
watch(){ local sess="$1" short="$2" work="$3" nudge="$4" t="$1:1.1" busy=0 p
tmux capture-pane -p -t "$t" 2>/dev/null | grep -q "esc to interrupt" && busy=1
if [ "$work" = 0 ] || [ "$work" = "?" ]; then idle[$sess]=0
elif [ "$busy" = 1 ]; then idle[$sess]=0
else idle[$sess]=$(( ${idle[$sess]:-0} + 1 ))
if [ "${idle[$sess]}" -ge 2 ]; then tmux send-keys -t "$t" -l "$nudge"; sleep 0.4; tmux send-keys -t "$t" Enter
notify "[$short] $sess idle, $work pending, nudged"; idle[$sess]=0; fi
fi
p="${prev[$sess]:-}"; [ -n "$p" ] && [ "$work" != "$p" ] && [ "$work" != "?" ] && notify "[$short] $sess pending: $work (was $p)"; prev[$sess]="$work"; }
glog "=== supervisor started ==="; notify "supervisor up, reconciling repos.d/"
while true; do
for f in "$DIR"/repos.d/*.repo; do
[ -e "$f" ] || continue
REPO=""; OWNER_LABEL=""; CLONE_BUILD=""; CLONE_REVIEW=""; ENABLED=true
source "$f"; [ "${ENABLED:-true}" = true ] || continue; [ -n "$REPO" ] || continue
short="${REPO##*/}"
[ -z "$CLONE_BUILD" ] && CLONE_BUILD="$HOME/$short"
[ -z "$CLONE_REVIEW" ] && CLONE_REVIEW="$HOME/$short-review"
if ! owns "$REPO" "$OWNER_LABEL"; then [ -z "${owner_alerted[$REPO]:-}" ] && { notify "SKIP $REPO, a different owner label is present"; owner_alerted[$REPO]=1; }; continue; fi
ensure_labels "$REPO" "$OWNER_LABEL"
ensure_clone "$REPO" "$CLONE_BUILD"; ensure_clone "$REPO" "$CLONE_REVIEW"
ensure_session "$short-build" "$CLONE_BUILD" "$DIR/roles/builder.md" "$BUILD_NUDGE"
ensure_session "$short-review" "$CLONE_REVIEW" "$DIR/roles/reviewer.md" "$REVIEW_NUDGE"
bw=$(timeout 30 gh issue list -R "$REPO" --label ready --state open --json number,assignees,labels \
-q "[.[]|select((.assignees|length)==0 and ((.labels|map(.name)|index(\"hold\"))==null))]|length" 2>/dev/null||echo "?")
rw=$(timeout 30 gh pr list -R "$REPO" --label ready-for-review --state open --json number -q "length" 2>/dev/null||echo "?")
watch "$short-build" "$short" "$bw" "$BUILD_NUDGE"
watch "$short-review" "$short" "$rw" "$REVIEW_NUDGE"
done
sleep "$POLL"
done
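The nudge rule inside watch() is easier to see as a pure function. The sketch below restates it under the same conditions as the script: the counter resets when there is no pending work, when the count is unknown ("?"), or when the agent is busy; otherwise it increments, and the second consecutive idle pass triggers a nudge. `idle_step` is a hypothetical name used only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch of the idle-counter rule from watch(), without tmux or gh.
# Usage: idle_step <current idle count> <pending work: number or "?"> <busy: 0|1>
# Prints: "<new idle count> <nudge|wait>"
idle_step() {
  local count="$1" work="$2" busy="$3"
  if [ "$work" = 0 ] || [ "$work" = "?" ] || [ "$busy" = 1 ]; then
    echo "0 wait"; return 0       # nothing to do, unknown, or already working
  fi
  count=$(( count + 1 ))
  if [ "$count" -ge 2 ]; then
    echo "0 nudge"                # second idle pass in a row: nudge and reset
  else
    echo "$count wait"
  fi
}
```

With the default 240-second poll, an agent is therefore seen idle on two passes, at least one full interval apart, before it gets nudged.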
bin/safe-build:
#!/usr/bin/env bash
# Run a build/test command serialized host-wide and inside a memory-capped cgroup.
# Usage: safe-build <command...> e.g. safe-build pnpm build
set -uo pipefail
export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
LOCK="/tmp/agent-build.lock"
[ "$#" -eq 0 ] && { echo "usage: safe-build <command...>" >&2; exit 2; }
WD="$(pwd)"
echo "[safe-build] waiting for host build lock (one build at a time)..."
exec 9>"$LOCK"; flock 9
echo "[safe-build] running: $* (cap=${SAFE_BUILD_MEM_MAX:-6G}) in $WD"
systemd-run --user --scope --quiet \
-p MemoryMax="${SAFE_BUILD_MEM_MAX:-6G}" -p MemorySwapMax="${SAFE_BUILD_SWAP_MAX:-3G}" \
-E PATH="$PATH" -E HOME="$HOME" -- env -C "$WD" "$@"
rc=$?; echo "[safe-build] done (exit $rc)"; exit $rc
notify.sh and config.env:
#!/usr/bin/env bash
# Portable notifier. NOTIFY_CHANNEL in config.env: ntfy | slack.
set -uo pipefail
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
[ -f "$DIR/config.env" ] && source "$DIR/config.env"
MSG="$*"; [ -z "$MSG" ] && { echo "usage: notify.sh <message>" >&2; exit 1; }
case "${NOTIFY_CHANNEL:-ntfy}" in
ntfy) [ -n "${NTFY_TOPIC:-}" ] && curl -fsS -H "Title: ${NTFY_TITLE:-supervisor}" \
-d "$MSG" "https://ntfy.sh/$NTFY_TOPIC" >/dev/null 2>&1 && exit 0 ;;
slack) [ -n "${SLACK_WEBHOOK:-}" ] && curl -fsS -X POST -H "Content-type: application/json" \
--data "$(printf '%s' "$MSG" | python3 -c 'import json,sys;print(json.dumps({"text":sys.stdin.read()}))')" \
"$SLACK_WEBHOOK" >/dev/null 2>&1 && exit 0 ;;
esac
exit 1
# config.env
NOTIFY_CHANNEL="ntfy" # ntfy | slack
NTFY_TOPIC="change-me-to-a-private-random-string"
NTFY_TITLE="build-supervisor"
SLACK_WEBHOOK=""
roles/builder.md:
# Role: BUILDER, turn READY issues into reviewable PRs. NEVER merge.
Ownership guard: only operate if the repo carries its owner label (auto:owned-by-<handle>).
If a DIFFERENT owner label is present, stop and report.
Eligibility, work an issue ONLY if: open, labeled `ready`, NOT `hold`, unassigned, no `wip`,
no existing open PR/branch. `hold` overrides `ready`. Un-`ready` issues are ignored. If none
eligible, idle and re-check.
Loop:
1. Pick the next eligible issue.
2. Scope: standalone, solo branch+PR. Coupled epic (`epic` parent / shared label / shared-file
overlap), batch the whole epic as a frozen unit (one branch+PR if modest, else a stacked-PR
chain); only batch ready, non-hold sub-issues.
3. Claim everything you will work: assign + add `wip`. Never touch an already-claimed or `hold` issue.
4. Branch off FRESH main: `git fetch origin && git switch -c issue-N-slug origin/main`.
Read the repo's CLAUDE.md/AGENT.md/README + recent docs/design-log.md first.
5. Implement. Verify with `pnpm lint` + `npx tsc --noEmit` + `pnpm test` where present. Run the full
build ONLY via `safe-build pnpm build`, never `pnpm build` directly. (Other languages: swap these.)
6. If you changed documented behavior, update the affected docs in the SAME PR.
7. Before handing off, re-scan for ready siblings that belong to this batch; fold trivial ones in.
8. Open the PR (closes the issue(s)), remove `wip` (keep assigned), add `ready-for-review`. Do NOT merge.
roles/reviewer.md:
# Role: REVIEWER / MERGER / SCRIBE, independent of the builder. SOLE writer to main.
Eligibility, act ONLY on PRs labeled `ready-for-review`. If none, idle and re-check.
Loop:
1. Pick the next `ready-for-review` PR not already being processed.
2. Review adversarially: correctness, regressions, scope creep, security. Pull the branch and run
`pnpm lint` + `npx tsc --noEmit` + `pnpm test` (full build via `safe-build pnpm build`).
3. Doc-consistency gate: if the change alters documented behavior and the docs were not updated in
the PR, fix them before merging (accurate + minimal).
4. Changes needed, leave review comments, remove `ready-for-review` (builder re-adds when fixed),
skip. Otherwise continue.
5. Merge against latest main: `git fetch origin`; if behind, merge main in and resolve conflicts
(you are the sole writer). Merge an epic/stack as a unit, bottom-up.
6. Record (commit WITH the merge): append a terse `docs/design-log.md` entry (decision, why,
alternatives rejected, consequence); update conventions docs only if a convention changed.
7. Merge, delete branch, confirm issue(s) closed.
Never leave main broken. design-log is append-only + terse.
~/.config/systemd/user/agent-supervisor.service:
[Unit]
Description=Autonomous build/review supervisor
After=network-online.target
[Service]
Type=simple
ExecStart=%h/ops/agent-supervisor/supervisor.sh
Restart=always
RestartSec=10
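# Extend PATH if gh, tmux, claude, or your build toolchain are installed elsewhere (for example under ~/.local/bin).
Environment=PATH=/usr/local/bin:/usr/bin:/bin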
[Install]
WantedBy=default.target
Install and start:
cd ~/ops/agent-supervisor
chmod +x supervisor.sh notify.sh bin/safe-build
sudo ln -sf "$PWD/bin/safe-build" /usr/local/bin/safe-build
# edit config.env (set NTFY_TOPIC), then test the channel:
./notify.sh "hello from $(hostname)"
systemctl --user daemon-reload
systemctl --user enable --now agent-supervisor
Add a repo with one file, repos.d/<name>.repo:
REPO="ORG/repo"
OWNER_LABEL="auto:owned-by-<yourhandle>"
ENABLED=true
# optional, default to ~/<name> and ~/<name>-review:
# CLONE_BUILD="$HOME/<name>"
# CLONE_REVIEW="$HOME/<name>-review"
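Because the supervisor simply sources this file, a typo only shows up on the next pass. A small check like the sketch below resolves a repo file the same way the loop does (same variable names, same defaults) and prints what the supervisor would use. `check_repo_file` is a hypothetical helper, not part of the recipe.

```shell
#!/usr/bin/env bash
# Sketch: source a .repo file in a subshell and print the resolved settings.
# Usage: check_repo_file repos.d/<name>.repo
check_repo_file() {
  local f="$1"
  (
    REPO=""; OWNER_LABEL=""; CLONE_BUILD=""; CLONE_REVIEW=""; ENABLED=true
    # shellcheck disable=SC1090
    source "$f" || exit 1
    [ -n "$REPO" ] || { echo "REPO is empty in $f" >&2; exit 1; }
    short="${REPO##*/}"           # ORG/repo -> repo
    echo "repo=$REPO owner=$OWNER_LABEL enabled=$ENABLED"
    echo "build=${CLONE_BUILD:-$HOME/$short}"
    echo "review=${CLONE_REVIEW:-$HOME/$short-review}"
  )
}
```

The subshell keeps the sourced variables out of your interactive shell, so it is safe to run against several files in a row.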
Within a few minutes the supervisor creates the labels, clones the repo into a build tree and a separate review tree, and starts the two agent sessions. Then you label an issue ready and watch it flow.
That is the whole thing. If it saves you some typing, good.