[{"data":1,"prerenderedAt":1083},["ShallowReactive",2],{"work-\u002Fwork\u002Fanton":3,"writing-about-anton":51},{"id":4,"title":5,"body":6,"demo":30,"description":31,"extension":32,"featured":33,"image":34,"meta":35,"name":5,"navigation":33,"path":36,"seo":37,"stem":38,"summary":39,"tags":40,"url":47,"when":48,"where":49,"__hash__":50},"work\u002Fwork\u002Fanton.md","Anton",{"type":7,"value":8,"toc":26},"minimark",[9,18],[10,11,12,13,17],"p",{},"Anton is a personal agent OS, deployed on a DGX Spark on my home network. About ten specialized domain agents (home, media, coder, research, admin, knowledge, syndic, dev, quality) coordinate via typed delegates. Skills run in a sandboxed Deno runtime, hot-reloadable. Layered memory (context, history, facts), policy engine evaluating every action, thread-aware interruption with cancel cascade, nightly self-improvement loop driven by issues, evaluation infrastructure under ",[14,15,16],"code",{},"packages\u002Fagent-quality",". Local LLM inference via vLLM behind a LiteLLM gateway over Tailscale.",[10,19,20,21,25],{},"Anton is a decade-later realization of ",[22,23,24],"em",{},"Bots for Humanity",", a personal-agent \u002F digital-twin concept I worked on as my MBA capstone in 2015 and presented to contacts at Facebook. Design philosophy: ingest cheap, process lazy, enrich on demand. Explore agentic, build deterministic.",{"title":27,"searchDepth":28,"depth":28,"links":29},"",2,[],null,"Anton is a personal agent OS, deployed on a DGX Spark on my home network. About ten specialized domain agents (home, media, coder, research, admin, knowledge, syndic, dev, quality) coordinate via typed delegates. Skills run in a sandboxed Deno runtime, hot-reloadable. Layered memory (context, history, facts), policy engine evaluating every action, thread-aware interruption with cancel cascade, nightly self-improvement loop driven by issues, evaluation infrastructure under packages\u002Fagent-quality. 
Local LLM inference via vLLM behind a LiteLLM gateway over Tailscale.","md",true,"\u002Fwork\u002Fanton.png",{},"\u002Fwork\u002Fanton",{"description":31},"work\u002Fanton","Personal agent OS that runs my family in production. Custom TypeScript runtime, sandboxed Deno skills, layered memory, policy engine, nightly self-improvement loop.",[41,42,43,44,45,46],"agents","typescript","deno","ai","self-hosted","infrastructure","https:\u002F\u002Fgithub.com\u002Flucbocahut\u002Fanton","2024-12-01","Paris","2f-67YAX0igzpgv35vGGYxy1_1oMWhP04me-AzX3Tmc",[52,143,238,311,406,540,680,761,979],{"id":53,"title":54,"body":55,"canonical_url":30,"date":133,"description":59,"extension":32,"meta":134,"navigation":33,"path":135,"seo":136,"series":137,"stem":138,"summary":139,"tags":140,"work_slug":141,"__hash__":142},"writing\u002Fwriting\u002F2026-03-07-anton-01-genesis.md","Anton, chapter 1: Genesis",{"type":7,"value":56,"toc":127},[57,60,71,76,83,87,94,98,101,104,107,111,118,121,124],[10,58,59],{},"I have a simple idea: build an assistant more reliable and secure than OpenClaw, which I find frustrating and terrifying at the same time. What better use case than a family assistant that handles the complexities of running a family of 6. If I can reliably replace myself for some common chores then I've won. What's more, I have this DGX Spark sitting at home that seems like the perfect host for my new assistant. I call him Anton. In reference to the series Silicon Valley.",[10,61,62,63,66,67,70],{},"The day starts where I tend to start: with a contract, not code. By 09:25 the repo has a pnpm + Turborepo monorepo, TypeScript throughout, empty ",[14,64,65],{},"apps\u002F"," and ",[14,68,69],{},"packages\u002F",", an implementation plan, and a shared types package as the central contract surface. Nothing that runs. But the shape of what runs is written down first. I always do this. 
Agreeing with yourself on the interfaces upfront is dirt cheap; refactoring three weeks later is not.",[72,73,75],"h2",{"id":74},"the-household","The household",[10,77,78,79,82],{},"Then the household map. WhatsApp as the entry point because that's where the family already lives. The Spark over Tailscale as the host, because everything about Anton (his memory, his data, his voice) is going to be physically close to me, not in someone else's cloud. The TNAS, Plex, Transmission, the ",[14,80,81],{},"gws"," CLI for Google Workspace: all the things he needs to be useful. Docker Compose with three services: Postgres, WhatsApp, Worker. A deploy script that rsyncs the lot to the Spark. No staging, no laptop. I want to be in production from hour one. The real thing.",[72,84,86],{"id":85},"identity-as-data","Identity as data",[10,88,89,90,93],{},"Then a question I've been chewing on for weeks: identity. Is Anton a tool or a character? I don't want a system prompt baked into a string somewhere in the code. I want the personality to be a document (",[14,91,92],{},"identity.md",") loaded at runtime. Anton's personality and my context as data, not code. The reason is practical more than philosophical: if I change my mind about who Anton is, or my family does, or a kid moves out, I want to edit a file, not push code. Treating context as data feels like the kind of decision that compounds.",[72,95,97],{"id":96},"afternoon-capabilities","Afternoon capabilities",[10,99,100],{},"Afternoon is capabilities, three commits between 16:00 and 17:30. Media first: Plex search, Transmission control. Then calendar, Google Workspace, voice transcription. I almost cut voice. I'm not sure the family will actually use it, and the day's budget is tight. In the end I leave it in, on the principle that when you have room to integrate a maybe, you integrate it. The cost of finding out is the same as the cost of guessing wrong, and the upside is asymmetric.",[10,102,103],{},"I also add a memory layer. 
Deliberately simple: Postgres rows tagged by user. No embeddings, no fancy retrieval. The rule I have in mind is to build the simplest version that lets me see what the system actually wants, then design for that. Memory will need real retrieval semantics eventually, but I don't know what shape yet, and the worst thing I can do is guess.",[10,105,106],{},"For the orchestration question (how the parent agent should route between domains) I pick LangGraph. It's a great choice. Structure, an observable state machine, checkpointing, a clean way to express subgraphs per domain with classify-and-dispatch at the top. Reliable, traceable, well thought out. I'm glad to have a framework that already has answers for problems I haven't yet hit.",[72,108,110],{"id":109},"first-production-bug","First production bug",[10,112,113,114,117],{},"Then the first production bug, four hours in. WhatsApp is throwing \"bad encryption\" errors on incoming messages (a Baileys quirk). Five minutes reading the library, the fix is wiring ",[14,115,116],{},"getMessage"," for retry decryption. What I notice isn't the fix. It's that the system is real enough by mid-afternoon to throw production bugs in the first place. Stub systems don't fail like that.",[10,119,120],{},"Evening is orchestration. BullMQ + Redis schedule queue. And then a refactor I'm framing as a cleanup: a commit that makes WhatsApp a thin shim that enqueues jobs, and lets the worker own all the agent logic. It just feels right to keep transport-layer concerns out of the agent. Cleaner that way.",[10,122,123],{},"By midnight Anton can take a WhatsApp message, route it through Postgres-tracked history, dispatch to a domain, run scheduled jobs from the database, self-update via an API endpoint, and redeploy from one shell script. Usable end to end.",[10,125,126],{},"I go to bed satisfied. 
The system is real, deployed on the Spark, talking to the family group on WhatsApp, with seven domains' worth of subgraphs registered and a quiet schedule queue waiting for work. It has the shape I wanted going in: a personal assistant living close to my data, on my hardware, with a personality I can edit as a file.",{"title":27,"searchDepth":28,"depth":28,"links":128},[129,130,131,132],{"id":74,"depth":28,"text":75},{"id":85,"depth":28,"text":86},{"id":96,"depth":28,"text":97},{"id":109,"depth":28,"text":110},"2026-03-07",{},"\u002Fwriting\u002F2026-03-07-anton-01-genesis",{"title":54,"description":59},"anton-journey","writing\u002F2026-03-07-anton-01-genesis","Building Anton, a personal agent OS for my family on a DGX Spark, day one.",[141,41,44],"anton","joDG0mX9JDmKMkdmV0TLA8ToxC2iEqCEtvgDfVrzvpU",{"id":144,"title":145,"body":146,"canonical_url":30,"date":230,"description":150,"extension":32,"meta":231,"navigation":33,"path":232,"seo":233,"series":137,"stem":234,"summary":235,"tags":236,"work_slug":141,"__hash__":237},"writing\u002Fwriting\u002F2026-03-08-anton-02-first-weekend.md","Anton, chapter 2: The first weekend",{"type":7,"value":147,"toc":223},[148,151,155,158,162,168,171,175,190,194,197,200,203,207,214,217,220],[10,149,150],{},"I wake up Saturday morning and the first thing I want to fix is the parent. The classify-and-dispatch graph from yesterday does the job, but it asks the LLM to make routing decisions inside a state machine that's already trying to do the routing itself. Two layers fighting over the same job. I want one.",[72,152,154],{"id":153},"agent-with-tools","Agent with tools",[10,156,157],{},"The first commit of day two rips the parent out and rewrites it as an agent-with-tools: a single LLM with all the subgraphs exposed as tools, choosing what to call from natural language. The classify-then-route pattern stays useful for individual domain classifiers, but the parent is done with it. By breakfast the system feels lighter. 
The LLM at the top is doing what it's good at (picking the right tool for the job), and the framework underneath is doing what it's good at (holding everything else).",[72,159,161],{"id":160},"family-grade-polish","Family-grade polish",[10,163,164,165,167],{},"The next handful of commits are the small things that turn an assistant into something the family can actually use. Typing indicator and a startup message so people see acknowledgement before the LLM finishes thinking. Language matching: Anton replies in whatever language the current message is in, not the historical thread language. French and English freely mixed in this house. The ",[14,166,81],{}," Google Workspace auth gets an onboarding guide so someone other than me can wire it up. Schedules become natural language: \"remind me to X every morning\" is just an Anton command now, not a separate language. Group chat support, daily summaries, conversation browsing. The honesty guidelines land too: Anton should never make things up to sound competent.",[10,169,170],{},"The first real browser-driven domain lands the same day: Doctolib login with 2FA. The interactive input queue lets Anton ask the user mid-flow for the SMS code, then resume where he left off. It's a small mechanism. It feels right immediately, in the way something does when it solves a class of problem you didn't quite know how to name yet.",[72,172,174],{"id":173},"saturday-observability","Saturday observability",[10,176,177,178,181,182,185,186,189],{},"Saturday night is observability. Three commits between 22:30 and midnight. A debug→issue pipeline so error traces auto-draft GitHub issues, because I'd rather have the issue file itself than wake up Sunday to no record of what broke. A ",[14,179,180],{},"\u002Flogs"," endpoint backed by an in-memory ring buffer instead of ",[14,183,184],{},"docker logs",", because reaching into Docker every time I want to see what's happening is friction that adds up. 
And the trigger-file deploy protocol: instead of ",[14,187,188],{},"\u002Fupdate"," shelling out synchronously, a systemd watcher polls for a trigger file, runs the deploy, reports back. Decoupled, boring, reliable. The kind of plumbing that disappears the moment it works.",[72,191,193],{"id":192},"the-lcars-dashboard","The LCARS dashboard",[10,195,196],{},"Sunday morning is the LCARS dashboard. Star Trek themed Nuxt UI in one commit. Service probes, log viewer, conversation browser, mobile-first layout. The next nine commits are mostly the UI fixing itself: proxy semantics, env prefix, mobile layout, trace viewer, modal overlay parsing checkpoints, expandable drill-down. By lunch I can look at any conversation, drill into any agent run, and see what the LLM is thinking at each step. The trace viewer is what I'll lean on for the rest of the weekend. Without it, the next twelve hours don't happen.",[10,198,199],{},"Then knowledge surface expansion in two hours. Anton can grep his own source code now. He can do live web research with Grok. Research becomes its own subgraph with budget control, because I don't want it living inside other domain agents (research has its own concerns: budget, citation, fact-check). Web browsing and a document knowledge store. Google OAuth wired through the worker with a UI settings page. By Sunday afternoon Anton can read the web, read his own code, and remember what he reads.",[10,201,202],{},"Sunday evening: the quality test suite. 28 test cases across all domains. This is the inflection point where shipping changes stops being \"did the WhatsApp message look right\" and starts being \"did the regression battery still pass.\" I should have done this on day one. I didn't, because day one was already too full. 
The gap between not having a test suite and having one is the gap between hoping and knowing.",[72,204,206],{"id":205},"the-calendar-saga","The calendar saga",[10,208,209,210,213],{},"The next ten hours are the suite finding bugs and the bugs getting fixed. The big one is the calendar saga. Six commits chasing the same failure mode: the calendar agent keeps producing wrong answers when I ask it to do anything multi-step (\"delete the event titled X\"). Two root causes. First, the parent's full conversation history is being passed into the calendar agent, contaminating its context with everything else that's been said. Second, the LLM can't reliably chain a search-then-delete in one go: it does the search, returns the result, and stops. The fix on the first is a strict rule: don't pass conversationHistory to domain agents. The fix on the second is structural: don't rely on the LLM to chain multi-step tool calls; build composite skills (one ",[14,211,212],{},"findAndDeleteEvent"," instead of two separate tools). Both rules go into MEMORY.md the same evening. They're the kind you only learn by getting burned.",[10,215,216],{},"Sunday night through Monday morning, the domains broaden. A wine collection lands as a typed table, the first user-facing collection. School messages domain. Group images and audio when @mentioned, with an \"ingest cheap, process lazy\" pattern: store the raw blob, only invoke vision or transcription when someone actually @-asks for it. Smarter media download flow with release preferences and stop\u002Fresume. Torrent searches routed through Tor SOCKS proxy, the only network egress decision driven by operational caution rather than feature need.",[10,218,219],{},"By Monday morning Anton has 73 quality tests, seven domains, a UI that drills into trace checkpoints, and a dev loop I can actually trust: trace viewer plus quality suite plus auto-issue pipeline plus ring-buffer logs plus LCARS. 
I can change anything and see what breaks.",[10,221,222],{},"The weekend's lesson, the one I'm taking into next week: write the tests before the bugs do. The calendar saga cost me an evening of debugging that the suite would have caught in seconds. From now on, every domain ships with regression coverage. Not because I want to be disciplined. Because I've now experienced the cost of not being.",{"title":27,"searchDepth":28,"depth":28,"links":224},[225,226,227,228,229],{"id":153,"depth":28,"text":154},{"id":160,"depth":28,"text":161},{"id":173,"depth":28,"text":174},{"id":192,"depth":28,"text":193},{"id":205,"depth":28,"text":206},"2026-03-08",{},"\u002Fwriting\u002F2026-03-08-anton-02-first-weekend",{"title":145,"description":150},"writing\u002F2026-03-08-anton-02-first-weekend","A first full weekend of building turns Anton into something the family can actually use.",[141,41,44],"sUIp73ttugbZ0i1HqI2logHtd14nhBXW7ig0vP2e__g",{"id":239,"title":240,"body":241,"canonical_url":30,"date":303,"description":245,"extension":32,"meta":304,"navigation":33,"path":305,"seo":306,"series":137,"stem":307,"summary":308,"tags":309,"work_slug":141,"__hash__":310},"writing\u002Fwriting\u002F2026-03-13-anton-03-domains-widen.md","Anton, chapter 3: Domains widen, browser hardens",{"type":7,"value":242,"toc":296},[243,246,250,253,256,260,263,266,270,273,277,280,284,287,290,293],[10,244,245],{},"The week opens on a piece of unfinished business. Doctolib search works. Detail enrichment does not. I want to drill into the appointment detail page from the search results so Anton can tell the family what's actually available, not just that something exists. The first attempt clicks back into the result. The second switches to direct navigation to avoid stale handles. The third adds diagnostic logging, then a 60 second budget, then per-page timeouts. Each fix surfaces the next failure. 
By mid-morning the answer is unambiguous: Cloudflare is fingerprinting the browser as a bot and blocking the navigation entirely. I disable detail enrichment and file the issue. The cleverness has to move somewhere else.",[72,247,249],{"id":248},"persistent-profiles","Persistent profiles",[10,251,252],{},"The unlock comes from changing the question. Instead of asking \"how do I get into the detail page\", I ask \"what does a real browser look like that this one doesn't\". The answer is persistent profiles. A real Chromium, not the bundled headless one, running in a profile directory that keeps cookies, history, and the small thousand fingerprints that accumulate over time. Once the browser is allowed to act like a browser, Cloudflare stops fingerprinting it. Anti-detection flags help at the margins, but the real fix is identity continuity: a session the site recognizes as a returning human, not a fresh anonymous request from nowhere.",[10,254,255],{},"Then the second move, which I like more. I rewrite enrichment to skip the detail page entirely. The list page already contains most of what we want, in unstructured text. So I feed the list page text to a single LLM call and ask for the structured fields back. One call. Cheaper than per-card DOM traversal, faster, and it doesn't fight the site. The lesson worth keeping: when the deterministic path costs you a battle with the host, lift the work up one level and let the model read the raw text. The LLM is the cheapest unit of work I have. I should be using it where it pays.",[72,257,259],{"id":258},"lifting-work-to-the-llm","Lifting work to the LLM",[10,261,262],{},"The browser also needs to live somewhere. By midweek it lands in its own service container, dedicated, isolated, with the persistent profile mounted as a volume. CDP wiring takes a handful of commits to settle (a TCP proxy because Chromium binds where it likes, an ESM import detail, the WebSocket URL, stale lock cleanup). 
When it's done, every browser-touching domain (Doctolib, the syndic site, the consulate, generic web) shares the same hardened browser. One container, one profile per site, one place to fix things.",[10,264,265],{},"Alongside the browser saga, the media subsystem grows the features that turn it from a toy into something the family actually uses. The bag of media tools gets redesigned around an intent-driven request shape with four tools (status, library, watch, search). The watchlist gets follow and unfollow and a triage job that scans the catalog and tells you what's worth attention. The validator learns to respect the scope of a request so it stops retrying things that were never in scope. Show triage starts as \"latest season, 14 day lookback\" and then becomes \"the last episode Plex actually has\", which is a small but characteristic move: stop reasoning from arbitrary windows, reason from state.",[72,267,269],{"id":268},"movie-night","Movie night",[10,271,272],{},"The headline media feature is movie night. A scheduled job, Friday at six, that picks two or three movies for the family and posts them to the group. It's the first proactive message Anton sends. Not a reply, not an answer to a question, but an unsolicited suggestion at a fixed time. Six iterations of prompt refinement in two hours to land the tone. The early drafts close with \"want me to download any of these?\", which nobody asked for, and which makes the message read like a salesman. The scheduled-output rules go into the system prompt that same afternoon: no follow-up suggestions, no filler, follow the spec. A scheduled message arrives on its own terms or it doesn't arrive at all.",[72,274,276],{"id":275},"pluggable-domains","Pluggable domains",[10,278,279],{},"The 13th's evening commit is the one I'm proudest of structurally. Domain tools get refactored into pluggable modules. 
Each domain registers itself with a small definition shape; the parent agent's tool surface is built dynamically from whatever modules are present. The parent stops knowing about specific domains. It just knows it has tools. Two days later the syndic domain (the building management portal) gets registered as a new domain in three lines. Three. That's the kind of moment that tells you the abstraction was the right one. When the marginal cost of a new domain drops to nothing, you've found the seam.",[72,281,283],{"id":282},"collections-substrate","Collections substrate",[10,285,286],{},"Then the collections substrate. Wines landed earlier as a typed table, the first user-facing collection. By the 15th the pattern is going to repeat: contacts, books, restaurants, anything else the family wants to remember. Rather than write a typed table per collection forever, I land a generic collections table backed by JSONB items, with one set of tools (add, search, update) that works for everything. The trick that makes generic tools actually usable is putting the collection's field schema into the tool description itself. The LLM reads the description, knows what shape \"wine\" items take versus \"restaurant\" items, and adapts. Wines get migrated as the first use case. The typed table goes away. One substrate, many collections.",[10,288,289],{},"Skills v0 lands the same week. A skills table in the database, an admin command set, a small UI listing, seed data. At this point \"skill\" means saved prompt: a reusable command template like \"weekly review\" or \"write a Google Doc with this structure\". The point is to stop pasting the same long instructions into chat over and over and start treating them as named, reusable, edit-in-the-database artifacts. There's an unfortunate terminology overlap with the typed-function skills package, which I'll have to clean up. 
For now the value is real: a prompt I want to keep is a row I can edit, not a string I have to track down.",[10,291,292],{},"Smaller things that earn their place: a weather skill backed by Open-Meteo, because asking Anton about the weather should not require a detour through web search. Google Docs creation with auto-sharing, because the family already lives in Drive. Collection lifecycle tests added to the quality suite, because everything that ships now ships with regression coverage (the rule from last weekend has become a habit). And a Clara-specific tone in the system prompt: simpler responses, escalate to me when needed. The first user-aware customization. Anton talks differently to different people in the same household, which is what any decent assistant should do.",[10,294,295],{},"The week closes with the architecture lighter than it started. The browser is its own container with a real profile and anti-detection that actually works. Domains are pluggable, and adding one is a registration, not a fork. Collections are generic. Skills are data. The instinct underneath all of it is the same one I keep coming back to: when something is going to repeat, make it a substrate, not a special case. Every time I've made that choice this week, the system got smaller and the next feature got cheaper. 
That's the trade I want to keep making.",{"title":27,"searchDepth":28,"depth":28,"links":297},[298,299,300,301,302],{"id":248,"depth":28,"text":249},{"id":258,"depth":28,"text":259},{"id":268,"depth":28,"text":269},{"id":275,"depth":28,"text":276},{"id":282,"depth":28,"text":283},"2026-03-13",{},"\u002Fwriting\u002F2026-03-13-anton-03-domains-widen",{"title":240,"description":245},"writing\u002F2026-03-13-anton-03-domains-widen","Hardening the browser, lifting the work to the LLM, and turning domains into pluggable substrates.",[141,41,44],"ZQWYip4ASEQwXHKjurXg1AGncN25gYIpOJw4XKIY7nw",{"id":312,"title":313,"body":314,"canonical_url":30,"date":398,"description":318,"extension":32,"meta":399,"navigation":33,"path":400,"seo":401,"series":137,"stem":402,"summary":403,"tags":404,"work_slug":141,"__hash__":405},"writing\u002Fwriting\u002F2026-03-16-anton-04-plumbing-matures.md","Anton, chapter 4: Plumbing matures",{"type":7,"value":315,"toc":391},[316,319,323,326,330,333,336,340,354,358,365,372,375,378,381,385,388],[10,317,318],{},"The week opens with one line of config and a quiet decision: the local inference path moves off Ollama onto vLLM. One environment variable changes, and the serving stack underneath changes with it. Real concurrency, proper batching, something that can actually take production load. Ollama got me through the first two weeks; vLLM is what I want sitting under a system that several people are going to lean on every day. The kind of swap that looks trivial in a diff and reshapes everything that runs on top of it.",[72,320,322],{"id":321},"memory-rebuild","Memory rebuild",[10,324,325],{},"Then memory. The simple Postgres-rows-tagged-by-user store from week one is fine until it isn't, and around now it isn't. I rebuild it around proper retrieval semantics. pg_trgm scored matching instead of vector embeddings, because at personal-knowledge-base scale trigrams are enough and an embedding pipeline is a tax I don't want to pay. 
Provenance tags so every fact can be traced to its source. Domain-biased retrieval: a calendar query weights calendar facts, a media query weights media facts. An adaptive recall window, short for chit-chat and long for things that look like research. A working-memory scratchpad for the live conversation. An LLM pass that paraphrases the query before searching, because the way a question is asked is rarely the way the answer is stored. A few days later I add an episodic layer, summaries of past conversations, but only injected when the query has temporal intent. Always-on episodic memory is context bloat dressed up as helpfulness. The principle that crystallises out of all this is one I keep coming back to: filter at load time, not write time. Store generously, retrieve narrowly.",[72,327,329],{"id":328},"schedules-and-codenames","Schedules and codenames",[10,331,332],{},"Schedules go from a thin BullMQ wrapper to an actual domain. Timezone support, daily-duplicate merging, an execution log, on-demand run, silent mode, a UI that doesn't make me wince. Scheduling deserves to be a first-class citizen because half of what I want Anton to do is recurring: the morning briefing, the weekly review, the reminder to call my godfather. In the same commit LiteLLM lands as the gateway in front of every model provider, because the schedule plus agent combination needs one place to route LLM calls, not many. With LiteLLM comes the codename roster: sunny, haiku, oscar, gandalf, gizmo, gatsby, merlin, gustav. Names instead of provider IDs. It feels mildly silly the first time I type \"ask gandalf\" in a config and it sticks immediately. Models get swapped, deprecated, repriced; the codename is stable. The indirection costs nothing and pays back every time a provider changes something.",[10,334,335],{},"A naming sin from last week catches up with me. I'd called two different things \"skills\": a typed function in code and a reusable prompt template in the database. 
Two commits clean it up. First, \"skills\" become \"saved prompts\" everywhere. Then I drop the \"saved\" prefix because it adds nothing. Now skill means a typed function with a runtime contract, prompt means a template stored in the database. The cost of the cleanup is a couple of hours; the cost of letting the collision live another month would have been far worse. Naming is one of those things where the bill compounds.",[72,337,339],{"id":338},"skill-contracts","Skill contracts",[10,341,342,343,346,347,66,350,353],{},"The runtime contract for skills lands the same day as ",[14,344,345],{},"defineSkill()",". Every skill declares its inputs, outputs, scopes, and handler in one shape, and the skill-runner becomes its own service: a separate container hosting skills as individually deployable units, callable over HTTP. I split it out for four reasons. Hot reload, so a single skill updates without restarting the world. Per-skill metrics: invocation counts, error rates, p50 and p95. Scope isolation, because a leaf skill should not need the parent agent's whole context surface to do one job. And the option of sandboxing later, which is much cheaper to add to a service that's already separate than to one tangled into the worker. A day later every domain unifies onto ",[14,348,349],{},"callSkill()",[14,351,352],{},"createSkillTool()",". One contract, one entry point, one way to add a new capability.",[72,355,357],{"id":356},"traces-and-permissions","Traces and permissions",[10,359,360,361,364],{},"Traces become first-class. The trace viewer from week one was reading checkpoints out of Redis, which is fine for live debugging and useless for anything historical. I move execution traces into Postgres: every agent run is a row in ",[14,362,363],{},"agent_traces",", queryable from the UI, surviving failures and retries. The next day a small follow-up locks down the invariant: one trace per request, no matter what. Now I can ask questions of the system's own behaviour. 
What happened in this run, last night, last week. The dev loop tightens again.",[10,366,367,368,371],{},"Permission filtering moves into ",[14,369,370],{},"runAgent()"," itself, which closes a class of bugs I keep almost shipping: a scheduled job inheriting wildcard permissions because the filter lived in the wrong place. Every caller (parent agent, scheduled job, mesh probe, the \u002Finvoke endpoint) gets the same filter applied at the same point. The right place to enforce a rule is the place every path has to go through.",[10,373,374],{},"The Invoke tab grows a VueFlow graph that draws the live agent architecture from the runtime configs. It is the first time I can look at Anton and see his shape, not just his logs. Agents on the canvas, tools as edges, the whole thing redrawing as I add or remove a domain. The visualisation pushes a mental model into my head before it's anywhere in the code: agents are the organising principle. Everything else hangs off them.",[10,376,377],{},"Then a refactor I've been wanting for a while. The parent agent had 63 tools attached to it and had started making the kind of mistakes you make when there's too much choice on the table. I restructure it into 10 subsystem delegates, each a small agent of its own. The parent's job becomes routing and synthesis, not calling everything directly. Six times fewer tools at the top level, and the answers get sharper immediately. It's the same lesson as last weekend's calendar saga, scaled up: the LLM is good at picking the right thing from a small menu and bad at picking the right thing from a long one. Give it a small menu.",[10,379,380],{},"Output validation lands in the agent loop, with intermediate messages and deterministic tool confirmations. It kills an entire family of \"tool succeeded, response is empty\" bugs that used to surface as a silent Anton, which is the worst kind of failure mode in a chat interface. 
Then prime directives: a small set of immutable rules the agent loop enforces above any individual prompt. The first version is verbose; I rewrite each one to a single line. Directives sit at the top of the prompt-precedence stack: directives, agent prompt, prompt template, user message. The non-negotiable rules live in code; everything else is editable.",[72,382,384],{"id":383},"prompts-as-data","Prompts as data",[10,386,387],{},"Late in the week, the move that ties the rest together. All agent prompts go into the database. No hardcoded fallbacks. Editable from the UI, versioned, one row per agent. Anton's behaviour stops being something I deploy and starts being something I configure. Want a different tone for the calendar agent? Edit the row. Want to test a new prompt for research? Save a version, run the suite, keep it or roll back. The principle is the same one as memory: keep behaviour as data, not code, and you keep the option to change your mind cheaply.",[10,389,390],{},"By Sunday night Anton is a different shape than he was on Monday. Inference is on vLLM. Models route through LiteLLM under codenames I picked in an evening. Memory has retrieval that respects domain and intent. Schedules are a real domain. Skills are a typed contract running in their own service with hot reload and metrics. Traces are queryable rows. Permissions are enforced at one chokepoint. Prompts live in the database. The parent has ten delegates instead of sixty-three tools. The week's lesson, the one I'm taking forward: the work that pays the most is the work that turns implicit conventions into explicit contracts. Once a thing has a shape on disk and an entry point in code, you can change anything around it without fear. 
Plumbing is invisible until it isn't, and this week was almost entirely plumbing.",{"title":27,"searchDepth":28,"depth":28,"links":392},[393,394,395,396,397],{"id":321,"depth":28,"text":322},{"id":328,"depth":28,"text":329},{"id":338,"depth":28,"text":339},{"id":356,"depth":28,"text":357},{"id":383,"depth":28,"text":384},"2026-03-16",{},"\u002Fwriting\u002F2026-03-16-anton-04-plumbing-matures",{"title":313,"description":318},"writing\u002F2026-03-16-anton-04-plumbing-matures","A week of turning implicit conventions into explicit contracts: vLLM, LiteLLM, memory, traces, prompts as data.",[141,41,44,46],"8WcF6rDf6JZIZxtfBjGvTjohwAVtmLei6lhFHb6PkAI",{"id":407,"title":408,"body":409,"canonical_url":30,"date":531,"description":413,"extension":32,"meta":532,"navigation":33,"path":533,"seo":534,"series":137,"stem":535,"summary":536,"tags":537,"work_slug":141,"__hash__":539},"writing\u002Fwriting\u002F2026-03-23-anton-05-langgraph-excised.md","Anton, chapter 5: LangGraph excised, agents standardized",{"type":7,"value":410,"toc":525},[411,414,421,424,428,431,435,445,456,463,502,506,509,512,516,519,522],[10,412,413],{},"Monday morning I open the editor and the shape of the week is already clear in my head. Every agent in Anton is a LangGraph subgraph. The parent is a LangGraph state machine. Conversation history is a LangChain message array. The trace viewer parses LangGraph checkpoints. Skills are wrapped as LangChain tools. The framework is not on the side of the system. It is the system.",[10,415,416,417,420],{},"Two weeks ago I picked LangGraph and I thought it was a great choice. It was. Structure, observability, checkpointing, a clean way to express subgraphs per domain. It got me to a working assistant fast. What I notice now is that none of those things are pulling their weight anymore. Every time I add a delegate I am editing a ",[14,418,419],{},"StateGraph"," builder. 
The runtime keeps imposing a node-and-edge mental model on what is, conceptually, just an LLM looping over tools until it is done. LangChain's APIs evolve and break things on unrelated weeks. And there are now three separate representations of the same idea in the codebase: the LangGraph graph, the domain module registry, and the UI's architecture view. They drift. I reconcile them by hand.",[10,422,423],{},"The friction has crossed the value. That is the moment.",[72,425,427],{"id":426},"the-decision","The decision",[10,429,430],{},"The decision lands in one commit and it is austere: two primitives only. Skills and agents. No pseudo-agents. No special endpoints. No classify-then-dispatch pipelines registered as agents. If it does not need reasoning, it is a skill. If it does, it is an agent that loops on a real LLM. Anything that does not fit one of those two shapes does not get to exist.",[72,432,434],{"id":433},"excision","Excision",[10,436,437,438,440,441,444],{},"The next commit is the brutal one. LangGraph comes out. In its place I write ",[14,439,370],{}," in a new ",[14,442,443],{},"packages\u002Fagent\u002F",", a few hundred lines that do exactly what is actually needed: an LLM call, tool dispatch, a loop bound, trace emission, a permission filter, a validation pass. That is the whole runtime. Reading it back I am almost embarrassed at how small it is. Two weeks of framework, replaced by a function I can hold in my head.",[10,446,447,448,451,452,455],{},"Then the rename. ",[14,449,450],{},"graph"," becomes ",[14,453,454],{},"agent"," everywhere: package names, file names, doc copy, UI labels. The \"no graph terminology\" rule goes into MEMORY.md. Documents that still say \"subgraph\" are now misleading rather than out-of-date, which is a stronger reason to fix them. A sweep through docs consolidates the lot and resolves the inconsistencies the rename leaves behind.",[10,457,458,459,462],{},"What falls out of this is what I was actually after. 
Every agent now has the same one-line shape: a thin function that hands its input to ",[14,460,461],{},"runAgent"," with a config. Every delegate handler is one line: call the agent, return its text. The Invoke tab, the schedules system, anything that wants to talk to an agent, all see the same surface. Uniformity from the outside is what makes everything else easy from the inside.",[10,464,465,466,469,470,473,474,473,477,473,480,483,484,487,488,490,491,494,495,66,498,501],{},"The same week, skill naming gets standardized to ",[14,467,468],{},"verb_entity",": ",[14,471,472],{},"get_event",", ",[14,475,476],{},"create_event",[14,478,479],{},"update_event",[14,481,482],{},"list_events",". Aliases that grew over the past two weeks get folded back into the canonical name (",[14,485,486],{},"update_event_by_title"," becomes a path inside ",[14,489,479],{},"). The ",[14,492,493],{},"web"," domain dissolves: it was duplicating ",[14,496,497],{},"documents",[14,499,500],{},"research",", and once I look at it without the LangGraph frame there is no reason for it to be its own thing. Naming consistency is a small win on its own. Combined with the new runtime it means the LLM sees one coherent tool surface and the system prompt can describe what an agent does in a paragraph instead of enumerating thirty idiosyncratic commands.",[72,503,505],{"id":504},"replication","Replication",[10,507,508],{},"The 25th, with the runtime quiet and the renames done, I write the replication engine. The framing in the commit message is the Von Neumann probe: clone the entire Anton stack to a new server with one command. The mechanism is unromantic, rsync plus docker compose plus a seed orchestration, but the property it gives me is the one I want. Three reasons it matters now: every household should be able to run its own Anton, the Spark could die and I want a clone to come up cleanly, and I want to be able to spin up a copy to test invasive changes without holding my breath. 
The replication script does the first cut of all three.",[10,510,511],{},"It also surfaces the secret-management problem in a way I cannot ignore anymore. Vaultwarden auth does not survive cloning cleanly. A clone comes up missing the credentials it needs to be useful, and the only way to fix it is by hand on each machine. That defeats the point. I leave it open for now. It is the next problem.",[72,513,515],{"id":514},"a-second-transport","A second transport",[10,517,518],{},"The Telegram bridge lands the same week. Same agent backend, same skill surface, different transport. The fact that I can add a whole new way for users to talk to Anton without touching the agent loop is the validation that splitting transport from worker on day one was the right call. The new bridge is a small app that enqueues jobs the same way WhatsApp does. The agents do not know which one they are answering.",[10,520,521],{},"Two refinements to the retrieval layer round out the week. Calendar queries now weight calendar facts more heavily, media queries weight media facts: the provenance tags I added the previous week finally do something useful. And document-derived facts stop leaking across users. One person's PDFs cannot show up in another person's recall. The household has more than one human in it; the memory has to know that.",[10,523,524],{},"By Thursday night the codebase looks like what I wanted it to look like two weeks ago and could not have known to ask for. Two primitives. One runtime. One naming convention. A replication path. A second transport. The friction that was building all of last week is gone, and what is left is small enough to keep entirely in my head. 
Which is the only size I trust.",{"title":27,"searchDepth":28,"depth":28,"links":526},[527,528,529,530],{"id":426,"depth":28,"text":427},{"id":433,"depth":28,"text":434},{"id":504,"depth":28,"text":505},{"id":514,"depth":28,"text":515},"2026-03-23",{},"\u002Fwriting\u002F2026-03-23-anton-05-langgraph-excised",{"title":408,"description":413},"writing\u002F2026-03-23-anton-05-langgraph-excised","Replacing a framework with a few hundred lines of runtime, and reshaping the system around two primitives.",[141,41,44,538],"architecture","qep6uWMXExCBymYG3V6fsubXSZSoznjDioKbLYlPgAw",{"id":541,"title":542,"body":543,"canonical_url":30,"date":672,"description":547,"extension":32,"meta":673,"navigation":33,"path":674,"seo":675,"series":137,"stem":676,"summary":677,"tags":678,"work_slug":141,"__hash__":679},"writing\u002Fwriting\u002F2026-03-27-anton-06-mesh-sandbox.md","Anton, chapter 6: The mesh, the sandbox, and self-reflection",{"type":7,"value":544,"toc":665},[545,548,552,559,563,576,587,602,606,609,616,620,635,638,642,659,662],[10,546,547],{},"Four days, around eighty commits, the densest stretch of the project. By the end of it most of what I'd call \"the current architecture\" has been decided. I'm going to skip the small commits and write down the five things that mattered.",[72,549,551],{"id":550},"the-mesh","The mesh",[10,553,554,555,558],{},"The first is the mesh. I want Anton instances to find each other. Not share a database, not share skills, not share secrets, just find each other and forward calls. I call it SCUT, Symmetric Cluster Universal Transport, because every node is the same shape and the relationship between two nodes is what scopes access. Probes for discovery, heartbeat for liveness, an invocation forwarder on top. The contract is simple: the instance is the identity, and the relation between instances is what you can ask for. 
This means a clone running for a different household can ask my Anton to run a media query without ever seeing the wine collection, the family vault, or the Plex credentials. Federation as relationships, not as shared infrastructure. The whole thing dedupes into a single ",[14,556,557],{},"@anton\u002Fmesh"," package once the protocol settles.",[72,560,562],{"id":561},"the-sandbox","The sandbox",[10,564,565,566,473,569,473,572,575],{},"The second thing, and the one that takes the most work, is the sandbox. The Node skill-runner I built earlier has no runtime isolation. A skill can read any env var, exec anything, hit any URL. For a personal server this was tolerable. For a mesh of instances forwarding invocations to each other, it's not. So I rewrite the runner on Deno. Each skill runs in a Deno Worker with the minimum permissions it needs: ",[14,567,568],{},"--allow-env=K1,K2",[14,570,571],{},"--allow-net=specific.host",[14,573,574],{},"--allow-read=\u002Fspecific\u002Fdir",". Nothing more.",[10,577,578,579,582,583,586],{},"This rewrite spans roughly 25 commits over two days, and the reason it spans 25 commits is that Deno's strict execution model exposes every implicit assumption Node was letting me get away with. Bare specifier mappings, sloppy imports, npmrc handling, deno.json paths, Dockerfile fixes, transitive import map entries every existing skill quietly relied on. Then the env model: ",[14,580,581],{},"Deno.env.get\u002Fhas\u002FtoObject"," become permission-scoped, so I have to walk every skill, audit its ",[14,584,585],{},"secretKeys",", and turn previously-silent missing-key behavior into explicit errors. Each commit unblocks one more skill that didn't previously care about runtime isolation. By the end I have a two-boundary security model written down. Agent boundary, ReBAC, who can invoke which agent. Skill boundary, Deno permissions, what this code can do. 
Two boundaries, two questions, neither one swallows the other.",[10,588,589,590,593,594,597,598,601],{},"While I'm there I rip Vaultwarden out and replace it with an encrypted ",[14,591,592],{},"secrets"," table in Postgres. One thing to back up. Survives cloning. Decrypted only at the call site, listable and editable from the LCARS UI. The other half of secret hygiene is a thing I almost get wrong: secrets have to reach the Worker via ",[14,595,596],{},"postMessage",", never through the parent process env. If the parent's env is populated, a Worker that asked for ",[14,599,600],{},"--allow-env=null"," could still exfiltrate it. The Deno permission model is only as honest as the boundary you actually defend.",[72,603,605],{"id":604},"the-browser-agent","The browser agent",[10,607,608],{},"The third beat is the browser. Doctolib, the syndic site, the consulate appointment monitor: each of them is currently a hardcoded Playwright script, copy-pasted intent and brittle selectors. I replace the pattern with one generic browser agent. Navigate, click, type, screenshot, evaluate. The LLM drives. One agent, many sites. The three domains migrate in three commits, three hardcoded scripts deleted in the same afternoon. The principle that comes out: explore agentic, build deterministic. The LLM is fantastic at finding the right button on a page it's never seen. It's overkill, and expensive, for the same flow you run twice a day. Use it to scout, then write down what it found.",[10,610,611,612,615],{},"The browser work is also where the ",[14,613,614],{},"request_input"," tool finally lands cleanly. The Doctolib 2FA pattern from the first weekend has been evolving for weeks: ask the user mid-flow for a code, suspend, resume. As a generalist primitive it belongs to the browser agent first, but the shape generalizes. Any tool can pause, ask the human something, and continue with the answer. It's a small mechanism. 
It feels right, the way something does when it solves a class of problem you didn't quite know how to name.",[72,617,619],{"id":618},"the-family-vault","The family vault",[10,621,622,623,626,627,630,631,634],{},"The fourth beat is the family vault. A permission-aware document store on object storage in Frankfurt, with ",[14,624,625],{},"family"," versus ",[14,628,629],{},"personal"," visibility and explicit ",[14,632,633],{},"visibleTo"," overrides per file. The architecture deliberately avoids derived roles: the answer to \"who can see this\" is the document's own metadata, not a graph traversal. Vision-based extraction lands the same day, then batch vision extraction with a personal-visibility default, then LLM-based fact generation from the extracted documents with calendar expiry reminders for the things that expire. A rule I write down from the cost analysis: scout with the LLM, build deterministic extractors, don't brute-force vision on every file. Same lesson as the browser. The LLM is the scout, not the worker.",[10,636,637],{},"The Notion migration runs on the same vault. Family Notion workspace pulled into the new store. Three commits of Deno import friction before I just inline the Notion client to dodge an AWS SDK barrel that doesn't want to play. Characteristic moment of the week: rewriting one bare import is cheaper than letting the LLM brute-force around it.",[72,639,641],{"id":640},"self-reflection","Self-reflection",[10,643,644,645,648,649,648,652,648,655,658],{},"The fifth beat is the one I've been waiting to build for a while. Anton starts critiquing his own performance. A nightly review reads the trace history from the last day, classifies failures by type, and files a GitHub issue per cluster. The issue carries a label that drives a state machine: ",[14,646,647],{},"needs-triage"," to ",[14,650,651],{},"ready-to-fix",[14,653,654],{},"fixed-locally",[14,656,657],{},"deployed",". 
The review is only possible because of the full execution traces from a few weeks ago. Without the traces, Anton would be reviewing his own outputs. With them, he's reviewing what the LLM was actually thinking at each step, what tools it called, what came back. The reviewer's job becomes possible because the substrate is honest.",[10,660,661],{},"A handful of structural commits land in the same window and are worth a sentence each. The subAgents\u002Fdelegates duality from chapter 5 finally collapses: agents become the single entry point, everything goes through the same delegate registry, the parent stops knowing about graph types or command dispatch and just routes to delegates. The Invoke tab grows a permission filter, so you only see agents the selected user can actually call. Prompt injection gets a real trust model with content markers and a risk audit trail. Directives land as standing instructions for agent behavior, with a note to prune them periodically before they bloat. Mistral Large lands in the LiteLLM router as a third reasoning option. The US consulate appointment monitor becomes a scheduled job: scan six months, observe what comes back, retune to four. Observe first, tune second, the same rule that's been quietly threading through the rest of the week.",[10,663,664],{},"By the end of the four days Anton can call other Antons over an authenticated mesh. Skills run sandboxed with the minimum permissions they need. Secrets live in one encrypted table and never touch a subprocess env. Any website with a form is a browser-agent target. Documents have visibility metadata and the vault knows who can see what. And every night Anton reads his own day, decides what went wrong, and files the work to fix it. 
The two-boundary model, agent and skill, is the spine that holds the rest of it up.",{"title":27,"searchDepth":28,"depth":28,"links":666},[667,668,669,670,671],{"id":550,"depth":28,"text":551},{"id":561,"depth":28,"text":562},{"id":604,"depth":28,"text":605},{"id":618,"depth":28,"text":619},{"id":640,"depth":28,"text":641},"2026-03-27",{},"\u002Fwriting\u002F2026-03-27-anton-06-mesh-sandbox",{"title":542,"description":547},"writing\u002F2026-03-27-anton-06-mesh-sandbox","Federation, a Deno sandbox, an encrypted secrets table, and a nightly self-review loop.",[141,41,44],"OrWApIMakFxuqBa78Lv6HDqZ9_R1YcqeciNRD6czokw",{"id":681,"title":682,"body":683,"canonical_url":30,"date":753,"description":687,"extension":32,"meta":754,"navigation":33,"path":755,"seo":756,"series":137,"stem":757,"summary":758,"tags":759,"work_slug":141,"__hash__":760},"writing\u002Fwriting\u002F2026-03-31-anton-07-cost-syndic.md","Anton, chapter 7: Cost, fallbacks, syndic, heartbeat",{"type":7,"value":684,"toc":747},[685,688,692,695,698,701,705,708,718,721,725,728,731,734,738,741,744],[10,686,687],{},"Two weeks. The system is real enough now that the questions stop being about whether things work and start being about what they cost, what they leak, and what they do when I am not looking. Three threads run through the period. Cost discipline becomes a first-class concern. The syndic domain lands as the second real proof case. And a heartbeat starts ticking in the background, a survey loop that lets Anton observe himself between user requests.",[72,689,691],{"id":690},"cost-discipline","Cost discipline",[10,693,694],{},"The cost work begins with one consequential commit. Every provider call now carries a per-request token budget, enforced by trimming history before the call rather than letting the provider 400 on us. Every LiteLLM call carries attribution metadata: agent, domain, user, request ID. 
And a shadow-call mechanism duplicates select calls to a cheaper model, logs the deltas, never affects production. The principle is simple: Anton needs to know what he costs, both to budget and to detect regressions. None of this is glamorous. It is the metabolism, the thing you only think about when something goes wrong, and I want it built before something does.",[10,696,697],{},"Memory consolidation moves from \"every fact write\" to a nightly batch with importance scoring, archival, and a health dashboard. The previous shape was contributing real money to the per-request bill and nobody had asked for it to run that often. A few days later, per-request token usage becomes a metric I can chart. Then OpenAI billing hits a wall mid-day and the system needs to keep working, so an auto-fallback to Sonnet or Gemini lands as a last-mile patch. The lesson keeps repeating: if a provider is your single point of failure, your system is your provider's reliability, not yours.",[10,699,700],{},"One night I clear out every hardcoded prompt fallback in the codebase. All agent prompts live in the database now. If the row is missing, the system fails loudly rather than silently using stale text. Three commits, one cleanup pass. The rule is the rule: one source of truth, fail loud when it is missing. It is the kind of cleanup that pays back every time I want to change an agent's behavior without a deploy. Validation gets a related fix the same week: delegates that report partial completion (because they ran out of their tool round budget) used to be treated as final answers by the parent. Now the parent detects budget exhaustion and re-invokes. A whole class of \"Anton stopped halfway and didn't tell me\" disappears.",[72,702,704],{"id":703},"the-syndic-domain","The syndic domain",[10,706,707],{},"Then the syndic. 
The work that takes the most lines of code in the period is the condo management domain, and it ends up being the proof case for nearly every architectural rule from the previous chapters. Foundations first: schema, skills, a local email client, a file registry, a document ingestion pipeline. The principle that goes into MEMORY the same week: heavy off-Anton agents do the ingest, Anton runs the lightweight queries. Then doc extraction with Gemma 4 vision OCR over PDFs, classification, cleanup. Then Gmail ingestion with attachment download and thread-based organization, scanning email attachments alongside Drive docs.",[10,709,710,711,473,714,717],{},"The interesting part is what happens next. The first cut of doc extraction was vision over every PDF. It is slow and it is expensive. A second pass replaces it: pandoc for ",[14,712,713],{},".docx",[14,715,716],{},"pdftotext"," first for PDFs, vision reserved for the cases where text extraction returns garbage. Ten times faster on 90% of files. The lesson lands as a memory entry: scout with the LLM, build deterministic extractors, do not brute-force vision on every file. Same shape as the calendar saga from chapter 2 and the LangGraph excision from chapter 5, just at a different layer: figure out the cheap path, reserve the expensive path for what actually needs it.",[10,719,720],{},"Then the wiki builder, in the Karpathy LLM Wiki shape: documents become structured wiki sections so Anton can answer condo questions without re-reading every PDF. And then the swap I am most pleased with. SimplySyndic was being driven by the browser agent. A morning of reverse engineering reveals that every screen is just an HTTP call to a stable backend. The browser agent comes out, a direct HTTP sync goes in. No browser, no LLM in the loop. The rule that lands in MEMORY: explore agentic, build deterministic. The browser agent is the scout, not the worker. 
With the HTTP path in place, bank reconciliation auto-matches 99% of BRED CSV lines to SimplySyndic line items in one pass. Structured extractors for fund calls follow. The point is no longer \"extract text from PDF\" but \"extract structured rows the rest of the system can query.\"",[72,722,724],{"id":723},"scout-then-build","Scout, then build",[10,726,727],{},"The same rule gets stress-tested on April 7 with a four-hour spike on SNCF train departures via the browser agent. It works. Then I do the cost arithmetic and revert. A deterministic HTTP path exists for SNCF, and using the browser agent every morning is roughly 100× more expensive. The revert is itself the lesson: cost discipline beats cleverness. Scout agentic, build deterministic, applied to my own code two days after I wrote it down.",[10,729,730],{},"The self-improvement loop matures the same week. Smoke tests, deploy tracking, regression detection. Before this, the loop could file an issue but could not tell whether a fix had worked. Now deploys are tracked, smoke tests run on a deploy boundary, and regressions surfaced in the trace history get re-filed as issues linked to the deploy that introduced them. A loop that watches itself, with a memory long enough to notice when a fix did not stick.",[10,732,733],{},"Scheduled tasks get tightened. Three commits in two days close the notification bypass paths a scheduled task could use to send messages outside the normal gate. One gate-everything design, no exceptions. A \"scheduled mode\" prompt rule lands the same week: when the agent is running on a schedule rather than answering a user, output is stricter. No follow-up suggestions, no filler, follow the spec.",[72,735,737],{"id":736},"the-heartbeat","The heartbeat",[10,739,740],{},"Then the heartbeat. A survey loop runs in the background, checks operational state, and only notifies when there is something to act on. 
The rule is explicit and lives in the prompt: the heartbeat is survey-only, not domain-agent-invoking. It looks; it does not do. This is the quiet substrate I want for Anton having his own awareness of the system, separate from any user-initiated request.",[10,742,743],{},"The 14th adds a single outbound messaging gateway with an audit trail. Every outbound message, to WhatsApp, to Telegram, to a notification channel, goes through one path that logs sender, channel, recipient, content, and which agent or scheduled job emitted it. One chokepoint, one log. The same day I fix two small bugs that are themselves the signal of where the system is now: a heartbeat scratchpad serialization bug, and a year-extraction filer bug. Meta-bugs. The loop that watches for problems has its own problems. That is the kind of bug a small system never has.",[10,745,746],{},"What the two weeks teach me is that complexity has crossed a threshold. The system is now big enough that its operational concerns are first-class: cost, attribution, fallbacks, audit, self-observation. Syndic is the proof that the architectural rules from the earlier chapters hold up under the weight of a real second domain. Cost discipline is no longer a nice-to-have. 
And the heartbeat means Anton is, for the first time, doing something between requests, even if that something is just looking at himself.",{"title":27,"searchDepth":28,"depth":28,"links":748},[749,750,751,752],{"id":690,"depth":28,"text":691},{"id":703,"depth":28,"text":704},{"id":723,"depth":28,"text":724},{"id":736,"depth":28,"text":737},"2026-03-31",{},"\u002Fwriting\u002F2026-03-31-anton-07-cost-syndic",{"title":682,"description":687},"writing\u002F2026-03-31-anton-07-cost-syndic","Two weeks of operational discipline: cost attribution, fallbacks, the syndic domain, and a survey heartbeat.",[141,41,44],"iIvCKQptTK_WX4290bCeEE7ZTDKo2uPuQFOHVV1q7gg",{"id":762,"title":763,"body":764,"canonical_url":30,"date":969,"description":768,"extension":32,"meta":970,"navigation":33,"path":971,"seo":972,"series":137,"stem":973,"summary":974,"tags":975,"work_slug":141,"__hash__":978},"writing\u002Fwriting\u002F2026-04-15-anton-08-nvfp4.md","Anton, chapter 8: Local LLM optimization, NVFP4 Gemma on DGX Spark",{"type":7,"value":765,"toc":963},[766,769,772,776,779,789,812,815,819,827,842,846,849,875,882,889,892,947,950,954,957,960],[10,767,768],{},"The morning starts with everything broken. Anthropic returns 400 with \"credit balance is too low\" on every request, and because the sunny model group has no fallbacks configured, the error propagates straight back through LiteLLM as \"I couldn't reach the language model. Something may be misconfigured.\" The heartbeat stops, scheduled jobs fail silently, every interactive chain dies on the first token. The actual cause is billing (a card issue, fixable in two clicks), but the fact that one provider's billing hiccup takes the whole assistant down is the real bug. The local Gemma is sitting on the same box, idle, ready to serve. 
It just isn't wired in as a fallback.",[10,770,771],{},"The plan writes itself: chain every paid provider down to local for survivability, then fix whatever's wrong with the local path so the chain actually works, then use the disruption as cover to do the LLM upgrade I've had in the back of my mind for two weeks. Three things, in that order, because the survivability fix has to land before I touch the running container.",[72,773,775],{"id":774},"a-fallback-chain","A fallback chain",[10,777,778],{},"The first commit is the LiteLLM config. Eight model groups, zero fallback entries: a config shape that's been sitting there since the early days when there was no local model worth falling back to. I add explicit chains so every paid group degrades to gustav (the local Gemma), with the heavier groups going through an intermediate before the local stop. LiteLLM mounts the config as a volume, so a restart is needed for it to pick up. Five minutes of work. The latent gap I'd been carrying for months, closed.",[10,780,781,782,66,785,788],{},"First fallback test fails. Anthropic 400, LiteLLM tries gustav, vLLM responds with its own 400: \"auto\" tool choice requires ",[14,783,784],{},"--enable-auto-tool-choice",[14,786,787],{},"--tool-call-parser"," to be set. The container has never been launched with those flags. The local path was never exercised under tool-call traffic, so the missing flags were latent the whole time. This is the small lesson the morning hands me: a fallback that isn't routinely exercised isn't really a fallback. Schema drift hides in the paths nobody runs.",[10,790,791,792,66,795,798,799,473,801,473,804,807,808,811],{},"Fixing it is two flags and a parser name I don't know. Rather than hunt through release notes, I list the tool_parsers directory inside the running container. A ",[14,793,794],{},"gemma4_tool_parser.py",[14,796,797],{},"gemma4_reasoning_parser.py"," are sitting right there. Grep the container, not the docs: faster every time. 
I add ",[14,800,784],{},[14,802,803],{},"--tool-call-parser gemma4",[14,805,806],{},"--reasoning-parser gemma4",". Tool-call smoke test through LiteLLM returns a proper structured ",[14,809,810],{},"tool_calls"," object. Fallback chain is functional end to end.",[10,813,814],{},"Now the system is at parity with where it was supposed to be all along. This is the moment I want to upgrade. And this is the moment I almost skip the most important thing.",[72,816,818],{"id":817},"baseline-before-change","Baseline before change",[10,820,821,822,826],{},"I'm about to start changing flags when I catch myself: I have no baseline. No number to compare against. If I jump straight to the upgrade and it gets faster, I won't know by how much; if it gets slower, I might not even notice. I run the benchmark first. Fixed prompt, 200 words of Paris history, 512 max tokens, temperature zero, three runs. ",[823,824,825],"strong",{},"23.4 tok\u002Fs, dead steady across runs",". That's the number I'm trying to beat. Benchmark first, change second, every time. The temptation to skip this step is strong specifically because the change feels obvious. That's exactly when discipline matters.",[10,828,829,830,833,834,837,838,841],{},"Then I read before I touch. Three things I confirm via web sources before staging anything. First, runtime FP8 quantization of Gemma 4 MoE is broken upstream; passing ",[14,831,832],{},"--quantization fp8"," would crash the container on the fused MoE layer loader. There's an open issue tracking it. Off the table. Second, an NVFP4-quantized Gemma 4 checkpoint someone had published is up on HuggingFace, 16.5 GB across three shards, ready to pull. Third, the checkpoint requires a patched ",[14,835,836],{},"gemma4.py"," because of another open vLLM issue: the built-in ",[14,839,840],{},"expert_params_mapping"," doesn't handle NVFP4 scale key suffixes. The patch ships alongside the model weights as a sibling file. 
The upstream fix isn't merged yet, so the bind mount is necessary. There's also a published benchmark on the same hardware showing 52 tok\u002Fs as the achievable ceiling with the right flag set. That's my target.",[72,843,845],{"id":844},"staging-the-upgrade","Staging the upgrade",[10,847,848],{},"Staging happens with the BF16 container still serving. I pull the new vLLM image, snapshot-download the NVFP4 model (about three minutes), copy the patched gemma4.py out of the model directory to a host path I can bind-mount, and rewrite the launch script with the NVFP4 flags and a commented BF16 rollback block sitting right underneath. Stage everything before the disruption window: when the swap actually happens, it's just a container recreate, not a fifteen-minute scramble.",[10,850,851,852,855,856,859,860,863,864,473,867,870,871,874],{},"The flags that matter: ",[14,853,854],{},"--quantization modelopt"," to pick up the NVFP4 weights, ",[14,857,858],{},"--moe-backend marlin"," because the GB10's SM121 lacks native FP4 compute and MARLIN W4A16 is the software-emulated path that actually runs, ",[14,861,862],{},"--max-model-len 131072"," for the full 128K native context, ",[14,865,866],{},"--gpu-memory-utilization 0.85",[14,868,869],{},"--max-num-seqs 16"," sized to actual concurrency rather than a wishful default. The served model name stays ",[14,872,873],{},"google\u002Fgemma-4-26B-A4B-it"," so LiteLLM's config doesn't need a single edit. The bind mount overlays the patched model file onto the path vLLM loads.",[10,876,877,878,881],{},"Container swap takes about ninety seconds end to end with a warm disk cache. The log line I'm watching for arrives: ",[14,879,880],{},"Using 'MARLIN' NvFp4 MoE backend out of potential backends",". MARLIN is selected, the patched loader is in play, the model is up.",[10,883,884,885,888],{},"Same benchmark, three runs: ",[823,886,887],{},"43.5 tok\u002Fs"," (49.0, 37.5, 44.1). 1.86× over baseline. 
Weight memory drops from roughly 52 GB to 16.5 GB, a 68% reduction. With the freed memory the KV cache budget goes from ~53 GB to ~82 GB at the same 0.85 utilization, which is what lets the max context go from 32K to 128K, a clean 4× without changing anything else. Tool calling still works.",[10,890,891],{},"The variance is higher than BF16 (the runs spread from 37 to 49 tok\u002Fs) because MARLIN is software-emulated FP4 on this hardware, not native compute. The published benchmark target is 52 tok\u002Fs and I'm landing at 43.5; the gap is most likely torch.compile warmup and prefix cache state across cold runs, not the flags. Close enough. The hardware ceiling on this specific path is what it is until the silicon catches up or the backend changes.",[893,894,895,910],"table",{},[896,897,898],"thead",{},[899,900,901,904,907],"tr",{},[902,903],"th",{},[902,905,906],{},"Before",[902,908,909],{},"After",[911,912,913,925,936],"tbody",{},[899,914,915,919,922],{},[916,917,918],"td",{},"Single-request tok\u002Fs",[916,920,921],{},"23.4",[916,923,924],{},"43.5",[899,926,927,930,933],{},[916,928,929],{},"Weight memory",[916,931,932],{},"~52 GB",[916,934,935],{},"~16.5 GB",[899,937,938,941,944],{},[916,939,940],{},"Max context",[916,942,943],{},"32,768",[916,945,946],{},"131,072",[10,948,949],{},"I leave the BF16 rollback block sitting in the launch script, commented. Pasting it into a shell reverts the config in about ninety seconds. The NVFP4 model and the patched file stay on disk; rollback is a container recreate, not a data restore.",[72,951,953],{"id":952},"survivability-is-a-feature","Survivability is a feature",[10,955,956],{},"Sitting with the result, the morning's three lessons are the ones that compound. Fallback paths that aren't routinely exercised aren't really fallbacks: the missing tool-call-parser flag had been latent for months because nothing ever fell back. Baseline before change, every time, especially when the change feels obvious. 
And when you're about to mess with a running service, stage everything you can while it's still up; ninety seconds of downtime instead of fifteen minutes is the difference between a deploy and an incident.",[10,958,959],{},"There are open lines from here. Speculative decoding is on the table: the smaller Gemma 4 variants share the vocab with the 26B and are valid draft models, with a budget of two to four GB for another 1.5× on single-request latency that would compound with NVFP4. A second small-model container as a router (Qwen3-8B or similar) could move heartbeat and classification traffic off Gemma entirely; the freed GB handles it fine and the latency distribution wins more than further Gemma tuning would. And the upstream patch for the NVFP4 expert mapping is worth checking on periodically; when it lands in an image tag, the bind mount goes away.",[10,961,962],{},"For now the picture is clean. The local Gemma serves at 1.86× the throughput on a third of the weight memory, with a context window that can swallow whole documents instead of choking on them, and a fallback chain that means any provider going down (billing, rate limits, an API blip) routes traffic to the box on my desk instead of taking the assistant offline. The morning started with everything broken. 
The evening ends with a system that's harder to break than it was before any of this happened.",{"title":27,"searchDepth":28,"depth":28,"links":964},[965,966,967,968],{"id":774,"depth":28,"text":775},{"id":817,"depth":28,"text":818},{"id":844,"depth":28,"text":845},{"id":952,"depth":28,"text":953},"2026-04-15",{},"\u002Fwriting\u002F2026-04-15-anton-08-nvfp4",{"title":763,"description":768},"writing\u002F2026-04-15-anton-08-nvfp4","How a billing failure turned into a local-LLM upgrade and a real fallback chain.",[141,41,44,976,977],"llm","dgx","0rcqXpevlbStnOha7LCdLqbKqhywEN0aIxOO6mUk9qI",{"id":980,"title":981,"body":982,"canonical_url":30,"date":1075,"description":986,"extension":32,"meta":1076,"navigation":33,"path":1077,"seo":1078,"series":137,"stem":1079,"summary":1080,"tags":1081,"work_slug":141,"__hash__":1082},"writing\u002Fwriting\u002F2026-04-19-anton-09-threads.md","Anton, chapter 9: Threads, spawn, and the cast",{"type":7,"value":983,"toc":1068},[984,987,991,1001,1005,1032,1035,1039,1042,1046,1049,1053,1056,1059,1062,1065],[10,985,986],{},"The chapter opens with Clara. She's a non-technical co-owner of the system now, and the memory entry I write for her is small but it changes the shape of the next ten days: respond simply, no jargon, escalate to me when needed. The system prompts pick up simplification rules and a cleaner fallback path. Real second user on real Anton, and every rough edge becomes a real complaint. That's the human reason most of what follows happens.",[72,988,990],{"id":989},"threads","Threads",[10,992,993,994,996,997,1000],{},"The biggest single change of the chapter lands on April 20: threads. Anton learns to do several things at once. Until now a run is a run: one conversation, one in-flight loop, and anything else has to wait. That works for a personal assistant talking to one person at a time. 
It does not work for a household where media triage is happening in the background, the syndic agent is reconciling invoices, and a school message comes in from a different group all at the same time. The plan is a thread registry primitive backed by Redis, channel and group and parent and thread IDs threaded through every agent context, a ",[14,995,461],{}," that registers itself and drains injections and child events and honors cancel, and an ingress layer that knows whether an incoming user message belongs to an active thread or starts a new run. On top of that sits a ",[14,998,999],{},"spawn_thread"," runtime tool so the agent itself can fan out, a live SSE event stream so the UI can show what each running thread is doing, and the hardening that any concurrency primitive needs to actually be safe: atomic inject-if-running to close the injection-loss race, finalize-drain, cascade cancel so killing a parent kills its children with no orphans, fan-out budget and TTL caps. By the end of the day one Anton can run multiple long-lived threads in parallel and a user can @mention into any of them without restarting anything.",[72,1002,1004],{"id":1003},"skills-cleanup","Skills cleanup",[10,1006,1007,1008,1011,1012,1015,1016,1019,1020,1023,1024,1027,1028,1031],{},"Then a cleanup that's been waiting since chapter 6. Five commits over an afternoon delete ",[14,1009,1010],{},"@anton\u002Fskills"," entirely, the original skills package from chapter 1 that became a barrel after chapter 4's ",[14,1013,1014],{},"defineSkill"," rewrite and dead weight after the move to Deno. 
The new layout is three rules: skills live at ",[14,1017,1018],{},"skills\u002F\u003Cdomain>\u002F\u003Cname>.skill.ts"," as first-class hot-reloadable units, ",[14,1021,1022],{},"skills\u002F\u003Cdomain>\u002F_lib\u002F"," holds small stable helpers shared inside one domain, and a thin ",[14,1025,1026],{},"skills-shared"," facade exposes the narrow Node-side surface that worker and agent and transports need. The principle underneath is simple: libraries are boring and fixed, skills evolve. If a ",[14,1029,1030],{},"_lib"," helper needs editing to support a feature, that's the signal it wants to be a skill. The same instinct produces the storage decision tree on the same day: a documented rule for choosing between facts (free-form), collections (typed-shape), files (blobs), and the family vault. Last month's organic growth produced overlap between all four, and a decision tree is cheaper than a refactor.",[10,1033,1034],{},"Around the same window, the coder agent gets a three-tier write scope. Tier 1: prompts only. Tier 2: skills plus prompts. Tier 3: any code. The tier is set per invocation and the coder cannot escalate itself. That's what makes the self-improvement loop safe enough to leave running unattended: the loop fixing a prompt regression has no permission to rewrite agent infrastructure to do it.",[72,1036,1038],{"id":1037},"spawn-and-awakening","Spawn and awakening",[10,1040,1041],{},"Spawn and awakening land on the 22nd, and they finish the federation story that started with the replication engine in chapter 5 and the mesh in chapter 6. Spawn is parent-side: it provisions infrastructure for a new clone, copies prompts over, seeds an identity, and registers the clone in the mesh. Awakening is clone-side: the new instance learns what it's for from its operator through a guided onboarding conversation, runs self-diagnostics, and keeps a mentor channel open back to the parent for questions. A clone isn't a docker stack any more. 
It's an Anton that wakes up, finds out who it is, and joins its peer.",[72,1043,1045],{"id":1044},"the-cast","The cast",[10,1047,1048],{},"The same day flips Gustav (local Gemma) to primary inference for every agent. It's a one-line change because of the work in chapter 4 that made prompts a single source of truth. The savings are real. Quality regressions are caught by the self-improvement loop and resolved through the cast: intent-gated escalation goes in, so the agent only reaches for a strong model when the intent of the request needs it (research, complex reasoning), and the cast formalizes models as characters. Each model is a named specialist with a prompt-defined personality and area of strength. The agent picks who to ask the way a person picks who on their team to email. The LiteLLM codenames (sunny, gizmo, gandalf, gustav, william) have been hinting at this since chapter 4. The cast makes it explicit: ask specialists by name, and William gets smarter.",[72,1050,1052],{"id":1051},"a-usable-heartbeat","A usable heartbeat",[10,1054,1055],{},"The heartbeat from chapter 7 gets the operational pass that makes it usable continuously. Idempotent memory writes plus a topics collection so re-running the same observation doesn't duplicate facts. Thread-aware so it doesn't interrupt active conversations. A loop that remembers what it just notified about and stays quiet rather than re-touching the same topic every tick. The outbound gateway from chapter 7 makes silencing silent bookkeeping events a one-place fix. Hallucinated-notification retry actually sends now instead of just removing the claim, and the simplified-response layer catches LaTeX and other artifacts before they reach Clara. The heartbeat ends the period as a usable proactive layer: observing, deciding when to speak, staying silent the rest of the time.",[10,1057,1058],{},"The SimplySyndic writes finally land too: a Playwright script that closes the loop on the syndic domain. 
Reading and reconciling has been working since chapter 7. Writing closes the loop. The next step is replacing Playwright with a deterministic HTTP path now that the read side has shown what the API looks like, and that spec is on the list, not done. Vaultwarden gets a documentation purge in the same window, the rule being that docs describe the current state only, no historical mentions of what an earlier version did.",[10,1060,1061],{},"Where Anton is at the end of all this: one identity, ten domain agents, a cast of named specialists, thread-aware concurrency. Every prompt in the database, so flipping the default model is one row edit. Local Gemma 4 NVFP4 as primary inference, cloud as fallback. Encrypted DB secrets and scoped Deno Worker permissions per skill. Mesh, replication, spawn, awakening for clones. A heartbeat that observes operational state and mostly stays quiet. A self-improvement loop with deploy tracking and regression detection. Two transports (WhatsApp, Telegram), both routing through one worker and one outbound gateway. A hundred-plus quality tests, full execution traces in Postgres, an LCARS dashboard for everything.",[10,1063,1064],{},"The plumbing that took six chapters to build is what most people would call boring infrastructure. That's fine. The substrate is built. The interesting behavior happens on top of it now: the cast, the heartbeat, the self-improvement loop, the long-running threads talking to each other and to us.",[10,1066,1067],{},"There is open work on the list. Doctolib syncing to calendar as a scheduled job. Auto-importing WhatsApp group members as users and mapping their JIDs to roles. Permission flows for new users at scale. The SimplySyndic write path migrating from Playwright to deterministic HTTP. Making Gemma 4 the assumed default everywhere it isn't yet. 
Open questions for the next stretch, not promises.",{"title":27,"searchDepth":28,"depth":28,"links":1069},[1070,1071,1072,1073,1074],{"id":989,"depth":28,"text":990},{"id":1003,"depth":28,"text":1004},{"id":1037,"depth":28,"text":1038},{"id":1044,"depth":28,"text":1045},{"id":1051,"depth":28,"text":1052},"2026-04-19",{},"\u002Fwriting\u002F2026-04-19-anton-09-threads",{"title":981,"description":986},"writing\u002F2026-04-19-anton-09-threads","Concurrency, awakening clones, and a cast of named model specialists working as a team.",[141,41,44],"9NLLnlfyMQjr9t8MFAeFIIOu0wHDf2jS9c3L8X_9eEU",1780849289027]