For the last few months I've been building a thing called Shrimp. It's an open source agent harness. Basically the layer that sits between a model and the real world. I kept writing the same glue code around different models and eventually just pulled it out into its own project.
I wasn't planning to write about it. But three really good posts came out recently: Alex Ker's "Harnesses are everything," LangChain's "Can Someone Please Define a Harness?", and an anatomy piece that formalized twelve components of a production harness. I read all of them and agreed with most of what they said. But building Shrimp taught me two things none of them really covered, and I wanted to get them down before I forgot.
Fair warning, this is rough. I'll come back and clean it up. For now it's just notes.
1. a while loop that emits events is a different animal
Every harness post describes the agent loop the same way: while the model wants to act, call a tool and feed the result back. That's not wrong. Shrimp has a while loop too. It's right there in src/core/loop.ts, line 75. I didn't replace it. I wrapped it.
The loop is an async generator that yields events. Every iteration emits agent:thinking. Every tool call emits agent:tool-call and agent:tool-result. Every response emits agent:response. There's a central ShrimpEventBus that anything in the system can subscribe to.
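The real bus has more to it, but the core shape is roughly this. A minimal sketch, assuming a simple handler map; the class name and event names come from the post, everything else is my reconstruction:

```typescript
// Sketch of an event-emitting agent loop. ShrimpEventBus and the
// event names match the post; the loop body is a stand-in.
type Handler = (payload: unknown) => void;

class ShrimpEventBus {
  private handlers = new Map<string, Handler[]>();

  on(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  emit(event: string, payload: unknown): void {
    for (const handler of this.handlers.get(event) ?? []) {
      handler(payload);
    }
  }
}

// The while loop is still there; it just narrates what it does.
// `steps` stands in for "the model keeps asking for tools".
async function* agentLoop(bus: ShrimpEventBus, steps: string[]) {
  for (const tool of steps) {
    bus.emit('agent:thinking', { tool });
    bus.emit('agent:tool-call', { name: tool });
    const result = `ran ${tool}`; // stand-in for real tool execution
    bus.emit('agent:tool-result', { name: tool, result });
    yield { type: 'tool', name: tool, result };
  }
  bus.emit('agent:response', { text: 'done' });
}
```

The point is that the dashboard, the session store, and the cost tracker are all just calls to bus.on; the loop never learns their names.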
I thought this was going to be a cosmetic change. It was not.
When your loop is just a function, everything that wants to know what the agent is doing has to be wired into the loop itself. Logging. The dashboard. The approval gate. Cost tracking. Session persistence. My first version of loop.ts was about 400 lines and most of it was bookkeeping. Only a small slice was actual agent logic. I remember staring at it at some point and going, okay this isn't going to survive another feature.
Once the loop started emitting events, it stopped having to know who was listening. The dashboard subscribes to agent:tool-call and renders a live feed. The session store subscribes and writes to SQLite. The cost tracker subscribes and counts tokens. A feature I haven't even built yet can subscribe without me touching loop.ts.
One thing I had to learn the hard way: if any subscriber throws, the whole bus can take down the agent loop. An observable loop is only observable if its observers can fail in isolation. Shrimp wraps every handler call in a try/catch and catches rejected async handlers too, so a buggy dashboard subscriber can't crash the agent. This is small and mechanical but I think it's load-bearing for the whole idea.
The thing I didn't expect was how it changed how I felt about the loop. When the loop is observable, I stop being scared of it. I can drop in a new capability and not worry I'm going to break some invisible thing. The loop stayed small because the pressure to cram stuff into it just sort of went away.
2. the harness should notice itself
This is the part I'm least sure about but most excited about.
Every post I read talks about memory as something you put into the agent. CLAUDE.md files. AGENTS.md files. Facts the model stores about the user. That stuff is useful, and I built it too: Shrimp uses supermemory for long-term facts and has a working memory store for the session.
But there's a different kind of memory that nobody was really writing about. Memory the agent writes about itself.
Here's the code I keep thinking about. It lives at the bottom of the agent loop, runs after every task completes, and is maybe thirty lines long.
```typescript
// src/core/loop.ts
private maybeLearnProcedure(userText: string): void {
  // Only consider assistant turns that actually called tools.
  const recentAssistant = this.conversationHistory
    .filter(m => m.role === 'assistant' && m.toolCalls && m.toolCalls.length > 0);
  if (recentAssistant.length === 0) return;

  const lastTurn = recentAssistant[recentAssistant.length - 1];
  const toolNames = lastTurn.toolCalls?.map(tc => tc.name) ?? [];

  // One- or two-tool tasks aren't worth remembering.
  if (toolNames.length < 3) return;

  // Trigger text on one side, tool chain on the other.
  this.bus.emit('memory:fact-updated', {
    key: `procedure:${toolNames.join('→')}`,
    newValue: `When user says "${userText.slice(0, 50)}", call: ${toolNames.join(' → ')}`,
  });
}
```
When the agent finishes a task that required three or more tool calls, it writes down what it did. Trigger text on one side, tool chain on the other. Next time something similar comes in, it can check its own notes first and skip some of the flailing.
The matching is still embarrassing. It's a SQL LIKE '%query%' on the first 50 chars of user text. No embeddings, no clustering, no smart retrieval. If you squint it looks like substring matching with extra steps.
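In memory, the recall side is roughly equivalent to this. I'm sketching the LIKE query as a substring scan; the field names and the ranking by usedCount are my guesses at the schema, not Shrimp's actual tables:

```typescript
// Sketch of procedure recall: SQL `LIKE '%query%'` behaviour reproduced
// as a substring scan over saved procedures. Field names are illustrative.
interface Procedure {
  trigger: string;    // first 50 chars of the original user text
  toolChain: string;  // e.g. "read_file → edit_file → run_tests"
  usedCount: number;
}

function recallProcedures(procedures: Procedure[], userText: string): Procedure[] {
  const query = userText.slice(0, 50).toLowerCase();
  return procedures
    // Match in either direction, since both strings are truncated user text.
    .filter(p => p.trigger.toLowerCase().includes(query) ||
                 query.includes(p.trigger.toLowerCase()))
    .sort((a, b) => b.usedCount - a.usedCount); // most-used first
}
```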
But watching it work is a little weird. I asked my Shrimp instance to do something I'd asked it two days earlier and saw it pull up its own notes on how it did it last time. Not a cached answer. A cached method. "Last time you asked me this, I ran these three tools in this order."
That is a different thing from context memory. The model isn't recalling a fact I told it. The harness is recalling a pattern the agent generated itself.
The anatomy blog lists twelve components. Filesystem, tools, memory, context management, and so on. Memory is in there but it's framed as a place to store things you already know. What I'm describing is memory the system writes unprompted, from what worked. I don't really want to call it a thirteenth component because that sounds too tidy. It might just be a property of memory that nobody has pointed at yet.
One failure mode I hit early: procedural memory is a one-way ratchet unless you give the agent a way to un-learn. A procedure that misleads the agent will otherwise keep climbing the ranking on usedCount alone. So there's a memory.procedures.forget tool now that demerits a procedure, and past a threshold it drops out of recall. Same thing for tool renames: if a saved procedure references a tool the registry doesn't recognize anymore, recall just skips it. Both of those feel small but I think they're the difference between procedural memory getting better over time and procedural memory quietly poisoning itself.
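Both guards fit in a few lines. A sketch under assumed names: demerits, the threshold value, and the registry shape are all my inventions, only the forget/skip behaviour comes from the post:

```typescript
// Sketch of the two un-learning guards: demerit-based forgetting and
// skipping procedures that reference tools no longer in the registry.
// Field names and the threshold are illustrative.
interface SavedProcedure {
  toolChain: string[];  // e.g. ['read_file', 'edit_file', 'run_tests']
  usedCount: number;
  demerits: number;     // bumped by the memory.procedures.forget tool
}

const FORGET_THRESHOLD = 3; // past this, the procedure drops out of recall

function demerit(proc: SavedProcedure): void {
  proc.demerits += 1;
}

function isRecallable(proc: SavedProcedure, registry: Set<string>): boolean {
  if (proc.demerits >= FORGET_THRESHOLD) return false;      // forgotten
  return proc.toolChain.every(tool => registry.has(tool));  // stale tool name → skip
}
```

Note that usedCount never enters the check: a procedure that's been used nine times but references a renamed tool is still skipped, which is exactly the ratchet-breaking behaviour the post is after.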
I don't want to oversell this. The matching is still substring matching. Most of the procedures it saves are probably useless and I don't have great retrieval yet. But the direction feels right to me. The model isn't going to keep getting meaningfully smarter forever, and when it plateaus, the interesting question stops being "how smart is the brain" and starts being "how much can the body teach itself."
3. what this does to harness thickness
The anatomy post ends on a nice question. How much logic should live in the harness vs in the model? Anthropic's answer is basically "as little as possible, trust the model." Graph-based frameworks put a lot in the harness explicitly. The post calls this "harness thickness" and treats it as an up-front architectural choice.
There's a third shape that I don't see discussed much.
If the harness emits events and writes its own procedural memory, it can get thinner over time without you doing anything. The loop stays small because new features subscribe instead of wiring in. The agent gets faster at familiar things because the harness remembered, not because you added more logic. You're not picking thick or thin up front. You build a small observable core and let the system accrete.
I'll be honest, most of what I just described is held together with code from this week. But it feels like the shape I want to keep pushing on.
what I'd tell myself at the start
The model is the easy part. I spent weeks tweaking prompts before I realized most of my pain was coming from the harness being a tangle.
Adding an event bus early is worth more than you'd think. It doesn't just clean up the code, it changes what you're willing to build next. And while you're at it, make sure one bad subscriber can't take the whole bus down.
Let the harness learn from its own behavior even when the learning is dumb. Dumb procedural memory is still better than none, as long as you also give the system a way to forget the procedures that lied to it.
And don't trust posts that sound too clean, including this one. The reality of building this stuff is a lot messier than any blog post makes it look. These are notes from someone still in the middle of it.
Thanks for reading. More once I break something.
Anurag