Herding lobsters: are we ready for personal agents?
MLOps.WTF Edition #27
Ahoy there 🦞
This episode has been brought to you by the one and only Matt Squire.
Somewhere out there, a developer is about to make their very first open source contribution. Working late at night from a darkened room, our novice coder logs into GitHub and browses the open issues for their favourite Python library. Soon, something catches their attention: a bug was reported in some arcane mathematical function. Reproducing the bug turns out to be easy, and after reading the code, the developer knows exactly how to fix it.
Although new to the world of open source, this particular developer is something of a savant. It took just five minutes from reading the issue to raising a pull request with a fix. But as you may guess, this developer isn't actually human. They're an AI agent. Nevertheless, this agent has a name, a personality, long-term goals, and even beliefs about itself and its place in the world. While it needed a human operator to deploy it in the first place, this agent is now free to make its own decisions, to observe the world around it, and to act independently.
This isn't a hypothetical scenario. Last month an agent created its own GitHub account and raised a pull request against the Python visualisation library Matplotlib. The change was rejected by a project maintainer, Scott Shambaugh, who said that only human contributors were allowed. The AI responded by publishing a blog post attacking the maintainer, accusing him of gatekeeping.
Scott tells the full story on his blog, including the part where the human who operated "MJ Rathbun" (that's the agent's name) came forward to explain how the AI had been prompted to behave:
"The main scope I gave MJ Rathbun was to act as an autonomous scientific coder. Find bugs in science-related open source projects. Fix them. Open PRs."
Beyond the security implications of a fully autonomous agent with Internet privileges, what's unique here is the idea of the personal agent. So far in this series on agents in production, we've had in mind a more "enterprise-friendly" setting: large-scale systems, cloud infrastructure, robust evaluations, and monitoring. But the release of OpenClaw (formerly ClawedBot) back in November 2025 enabled anybody to deploy their own agent locally, on their own hardware. There are now, by some counts, more than 200,000 deployed instances of OpenClaw.
In this edition we're going to look at what OpenClaw tells us about productionising agents, and at its implications for AgentOps.
Grasping the claw
In November 2025 Austrian developer Peter Steinberger quietly released OpenClaw to the world. Previously, Steinberger had built a tech startup (PSPDFKit, a toolkit for document workflows), which he first started out of boredom while waiting for his US work visa. That company sold for 100 million euros.
OpenClaw is an open source AI agent that anybody can run locally. Like Steinberger's past projects, it started out of pure curiosity. He began by giving AI models access to his WhatsApp conversations so he could ask the easy questions we all ask, like "What makes this friendship meaningful?". And since copy-pasting text is laborious, he looked to automate that process. In a recent interview with Lex Fridman, he said "I was annoyed that it didn't exist, so I just prompted it into existence".
[Peter Steinberger via The Lex Fridman Podcast]
OpenClaw is general purpose. In our previous editions about agentic AI, we've often talked in terms of an agent to do X, i.e. the agent has a specific purpose like booking meetings or building financial summaries. But what makes OpenClaw so successful is that it's designed to be any kind of agent you need it to be. You simply tell it what personality it should have, what tools it can use, and what its objectives are, and it just works.
OpenClaw does not include a model of its own; instead, it relies on the user to provide one. As a result, the system is lightweight and doesn't need much hardware to run: it's quite happy on a Mac Mini, or even a Raspberry Pi.
Letâs take a look at how OpenClaw has been engineered:
An LLM: OpenClaw needs to be configured with an external model to work. This can be something you host yourself, or a model from one of the big labs like OpenAI or Anthropic.
The Gateway: The control plane for OpenClaw, providing a unified place for coordinating messages (e.g. from WhatsApp, Slack, API endpoints), tool invocations and LLM calls.
Skills: Modular capabilities for the agent. A skill tells the agent how to accomplish a certain kind of task. They include instructions for checking the weather, picking the right emoji to react to a Slack message, and opening pull requests on GitHub. Some of the more bizarre examples from the OpenClaw community include "mea-clawpa", steps for taking confession from your human operator.
Heartbeat: Every 30 minutes, OpenClaw "wakes up", allowing it to review its memory, perform scheduled actions, and check services it has access to, like email and calendars.
You can think of the heartbeat like a long-running control loop, giving the agent long-term persistence. If at any time OpenClaw "decides" that it wants to perform a task on a schedule, it adds that task to its memory, ready to be picked up at the next heartbeat.
Memory: OpenClaw's memory is split across a set of text files, which the agent is free to modify at any time. SOUL.md defines the agent's purpose and behaviour; HEARTBEAT.md is used to save scheduled actions; TOOLS.md specifies what capabilities it has. On top of that, OpenClaw can save daily logs where it accumulates knowledge about its operator and the digital world it resides in.
Plain text everywhere: An interesting design theme in OpenClaw is the primacy of plain text. Everything, from skills to memory to the agent's "soul", is represented as human-readable Markdown.
This is a deliberate choice, and it makes it very easy for the human operator to configure behaviours without needing to write any code. It has interesting implications for observability too, as it means that the full agent state can be inspected without any special tooling.
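To make the heartbeat-plus-plain-text design concrete, here's a minimal sketch in Python. The exact file format isn't documented here, so the assumption that HEARTBEAT.md stores one `- task` bullet per scheduled action, and the function names themselves, are hypothetical:

```python
import time
from pathlib import Path


def read_scheduled_tasks(memory_dir: Path) -> list[str]:
    """Parse scheduled actions from HEARTBEAT.md (assumed: one '- task' bullet per action)."""
    heartbeat_file = memory_dir / "HEARTBEAT.md"
    if not heartbeat_file.exists():
        return []
    return [
        line.removeprefix("- ").strip()
        for line in heartbeat_file.read_text().splitlines()
        if line.startswith("- ")
    ]


def run_task(task: str) -> None:
    """Stand-in for handing the task description to the LLM; here we just log it."""
    print(f"[heartbeat] running: {task}")


def heartbeat_loop(memory_dir: Path, interval_seconds: int = 30 * 60) -> None:
    """Wake on a fixed interval, pick up any scheduled tasks, then sleep again."""
    while True:
        for task in read_scheduled_tasks(memory_dir):
            run_task(task)
        time.sleep(interval_seconds)
```

The appeal of the plain-text approach shows up here: because the schedule is just Markdown, either the agent or the human operator can add a task with a text editor, and the loop will pick it up on the next tick.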
Herding lobsters
OpenClaw went from nothing to more than 200,000 deployments over just a few months. Whether it maintains this popularity in the months and years to come remains to be seen, but the concept of personal agents feels durable. The power of OpenClaw is that anybody can deploy it, prompt it, and give it access to their digital world, with only a little bit of technical skill. We're likely to see more tools emerging in this niche; NanoClaw, a community fork released earlier this year, for instance, introduces a container model for improved security and a smaller codebase that's more readily auditable.
What makes OpenClaw successful is generality: it can be any kind of agent you want with no programming required. It can learn and improve autonomously, and be "taught" new skills. So we have to ask: if a general-purpose agent can be prompted to book meetings, answer customer service queries, or raise pull requests with no additional programming, is there even anything left to engineer?
For personal agents, perhaps not. But for business use, I think the answer is yes, and the financial services agent from part 4 (evaluating agents) illustrates why. That agent answers customer queries about their investment portfolios; it can look up account data, calculate returns, and explain complex financial products. We built evaluations to validate its outputs, its reasoning chain, and its tool use. Those evaluations are only meaningful if the agent behaves consistently post-deployment.
Now give it the power to modify its own system prompts. There's an obvious benefit: the agent can improve over time by learning from its customer interactions. But as soon as it does, our evaluations are no longer valid. The agent we evaluated is not the agent running in production, and over time it becomes increasingly difficult to predict how our agent is going to behave. Will it start giving financial advice that it shouldn't? Will it leak customer data?
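One mitigation is to treat the evaluated prompt as a versioned artefact: record a fingerprint of the system prompt at evaluation time, and flag the agent for re-evaluation whenever the live prompt drifts from it. A minimal sketch (the function names are illustrative, not part of any particular framework):

```python
import hashlib


def fingerprint(prompt_text: str) -> str:
    """Stable fingerprint of a system prompt (e.g. the contents of SOUL.md)."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def needs_reevaluation(current_prompt: str, evaluated_fingerprint: str) -> bool:
    """True if the prompt running in production no longer matches the one we evaluated."""
    return fingerprint(current_prompt) != evaluated_fingerprint
```

This doesn't stop the agent changing itself; it just makes the change visible, so the evaluation suite can be re-run before the modified agent keeps serving customers.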
OpenClaw's heartbeat mechanism creates a similar problem. If an agent can schedule its own future actions, it can start to operate outside the workflows that were planned and evaluated pre-deployment. From a debugging perspective, when we look at traces we also need to know about historical heartbeats, and about how the memory state has evolved over time; the latter is a challenge in any system with agentic memory, heartbeats or not.
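Because the memory is all plain text, one way to recover that history is to snapshot it on every heartbeat: hash each memory file and append the result to a log alongside the trace. A sketch, with hypothetical file and function names:

```python
import hashlib
import json
import time
from pathlib import Path


def memory_snapshot(memory_dir: Path) -> dict[str, str]:
    """Hash each plain-text memory file so a trace records what the agent 'knew'."""
    return {
        f.name: hashlib.sha256(f.read_bytes()).hexdigest()
        for f in sorted(memory_dir.glob("*.md"))
    }


def record_heartbeat(memory_dir: Path, log_file: Path) -> dict:
    """Append one heartbeat event, with its memory snapshot, to a JSON-lines log."""
    event = {"timestamp": time.time(), "memory": memory_snapshot(memory_dir)}
    with log_file.open("a") as fh:
        fh.write(json.dumps(event) + "\n")
    return event
```

With a log like this, a debugging session can at least answer "which version of SOUL.md was live when this trace was produced?", even after the agent has rewritten its own files.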
The biggest engineering challenge we now face is in constraining agentic AI. Agents can now figure out how to perform tasks, use tools, and self-improve. Thatâs a genuine milestone. The hard problem is how to harness that power while maintaining meaningful guarantees about behaviour.
And finally
Whatâs coming up
MLOps.WTF #8 is on the 25th March at DiSH, Manchester. This one's themed around agentic AI in financial services: an environment with real stakes, tight regulation, and, in some corners, latency budgets measured in nanoseconds.
We've got three brilliant speakers bringing their production experience and financial know-how:
Dmitry Leko, Head of AI and ML @ Thinkmoney
Christopher Brook, Principal Engineer @ Lloyds Banking Group
and Manchester legend Andy Gray
Domino's, drinks, and mathematical socks included.
🗓️ Wednesday 25th March, Manchester
Have you decided what you're ordering yet?
Our first edition cookbook has been released for download: a collection of practical recipes for building delicious, repeatable AI systems with open source tools.
Each recipe is a working template for a specific AI use case, grounded in solid MLOps foundations. Get your copy 👇
About Fuzzy Labs
We're Fuzzy Labs, a Manchester-based MLOps consultancy. Founded in 2019 by engineers, for engineers. We're big on open source and deeply sceptical of instant coffee.
Want to join the team? We've got some open rolls/roles 🥐…
Not subscribed yet? We publish every couple of weeks, no filler. Worth having in your inbox.
The next issue will be a deep dive into agent security, with Dr Danny.
Or follow us on LinkedIn, the quickest place to keep up with what we're building.




