Herding lobsters: are we ready for personal agents?
MLOps.WTF Edition #27
Ahoy there 🦞
This episode has been brought to you by the one and only Matt Squire.
Somewhere out there, a developer is about to make their very first open source contribution. Working late at night from a darkened room, our novice coder logs into GitHub and browses the open issues for their favourite Python library. Soon, something catches their attention: a bug was reported in some arcane mathematical function. Reproducing the bug turns out to be easy, and after reading the code, the developer knows exactly how to fix it.
Although new to the world of open source, this particular developer is something of a savant. It took just five minutes from reading the issue to raising a pull request with a fix. But as you may guess, this developer isn't actually human. They're an AI agent. Nevertheless, this agent has a name, a personality, long-term goals, and even beliefs about itself and its place in the world. While it needed a human operator to deploy it in the first place, this agent is now free to make its own decisions, to observe the world around it, and to act independently.
This isn't a hypothetical scenario. Last month an agent created its own GitHub account and raised a pull request against the Python visualisation library Matplotlib. The change was rejected by a project maintainer, Scott Shambaugh, who said that only human contributors were allowed. The AI responded by publishing a blog post attacking the maintainer, accusing him of gatekeeping.
Scott tells the full story on his blog, including the part where the human who operated "MJ Rathbun" (that's the agent's name) came forward to explain how the AI had been prompted to behave:
"The main scope I gave MJ Rathbun was to act as an autonomous scientific coder. Find bugs in science-related open source projects. Fix them. Open PRs."
Beyond the security implications of a fully autonomous agent with Internet privileges, what's unique here is the idea of the personal agent. So far in this series on agents in production, we've had in mind a more "enterprise-friendly" setting: large-scale systems, cloud infrastructure, robust evaluations, and monitoring. But the release of OpenClaw (formerly ClawedBot) back in November 2025 enabled anybody to deploy their own agent locally, on their own hardware. There are now, by some counts, more than 200,000 deployed instances of OpenClaw.
In this edition we're going to look at what OpenClaw tells us about productionising agents, and at its implications for AgentOps.
Grasping the claw
In November 2025 Austrian developer Peter Steinberger quietly released OpenClaw to the world. Previously, Steinberger had built a tech startup (PSPDFKit, a toolkit for document workflows), which he first started out of boredom while waiting for his US work visa. That company sold for 100 million euros.
OpenClaw is an open source AI agent that anybody can run locally. Like Steinberger's past projects, it started out of pure curiosity. He began by giving AI models access to his WhatsApp conversations so he could ask the easy questions we all ask, like "What makes this friendship meaningful?". And since copy-pasting text is laborious, he looked to automate that process. In a recent interview with Lex Fridman, he said "I was annoyed that it didn't exist, so I just prompted it into existence".
[Peter Steinberger via The Lex Fridman Podcast]
OpenClaw is general purpose. In our previous editions about agentic AI, we've often talked in terms of an agent to do X, i.e. the agent has a specific purpose like booking meetings or building financial summaries. But what makes OpenClaw so successful is that it's designed to be any kind of agent you need it to be. You simply tell it what personality it should have, what tools it can use, and what its objectives are, and it just works.
OpenClaw does not include a model of its own; instead, it relies on the user to provide one. As a result, the system is lightweight and doesn't need much hardware to run: it's quite happy on a Mac Mini, or even a Raspberry Pi.
Letâs take a look at how OpenClaw has been engineered:
An LLM: OpenClaw needs to be configured with an external model to work. This can be something you host yourself, or a model from one of the big labs like OpenAI or Anthropic.
The Gateway: The control plane for OpenClaw, providing a unified place for coordinating messages (e.g. from WhatsApp, Slack, API endpoints), tool invocations and LLM calls.
Skills: Modular capabilities for the agent. A skill tells the agent how to accomplish a certain kind of task. They include instructions for checking the weather, picking the right emoji to react to a Slack message, and opening pull requests on GitHub. Some of the more bizarre examples from the OpenClaw community include "mea-clawpa", steps for taking confession from your human operator.
Heartbeat: Every 30 minutes, OpenClaw "wakes up", allowing it to review its memory, perform scheduled actions, and check services it has access to, like email and calendars.
You can think of the heartbeat like a long-running control loop, giving the agent long-term persistence. If at any time OpenClaw "decides" that it wants to perform a task on a schedule, it adds that task to its memory, ready to be picked up at the next heartbeat.
Memory: OpenClaw's memory is split across a set of text files, which the agent is free to modify at any time. SOUL.md defines the agent's purpose and behaviour; HEARTBEAT.md is used to save scheduled actions; TOOLS.md specifies what capabilities it has. On top of that, OpenClaw can save daily logs where it accumulates knowledge about its operator and the digital world it resides in.
Plain text everywhere: An interesting design theme in OpenClaw is the primacy of plain text. Everything, from skills to memory to the agent's "soul", is represented as human-readable Markdown.
This is a deliberate choice, and it makes it very easy for the human operator to configure behaviours without needing to write any code. It has interesting implications for observability too, as it means that the full agent state can be inspected without any special tooling.
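To make the heartbeat-plus-plain-text design concrete, here's a minimal sketch in Python. The exact file format isn't documented here, so the assumption that HEARTBEAT.md stores one `- task` bullet per scheduled action, and the function names themselves, are hypothetical:

```python
import time
from pathlib import Path


def read_scheduled_tasks(memory_dir: Path) -> list[str]:
    """Parse scheduled actions from HEARTBEAT.md (assumed: one '- task' bullet per action)."""
    heartbeat_file = memory_dir / "HEARTBEAT.md"
    if not heartbeat_file.exists():
        return []
    return [
        line.removeprefix("- ").strip()
        for line in heartbeat_file.read_text().splitlines()
        if line.startswith("- ")
    ]


def run_task(task: str) -> None:
    """Stand-in for handing the task description to the LLM; here we just log it."""
    print(f"[heartbeat] running: {task}")


def heartbeat_loop(memory_dir: Path, interval_seconds: int = 30 * 60) -> None:
    """Wake on a fixed interval, pick up any scheduled tasks, then sleep again."""
    while True:
        for task in read_scheduled_tasks(memory_dir):
            run_task(task)
        time.sleep(interval_seconds)
```

The appeal of the plain-text approach shows up here: because the schedule is just Markdown, either the agent or the human operator can add a task with a text editor, and the loop will pick it up on the next tick.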
Herding lobsters
OpenClaw went from nothing to more than 200,000 deployments over just a few months. Whether it maintains this popularity in the months and years to come remains to be seen, but the concept of personal agents feels durable. The power of OpenClaw is that anybody can deploy it, prompt it, and give it access to their digital world, with only a little bit of technical skill. We're likely to see more tools emerging in this niche; NanoClaw, a community fork released earlier this year, for instance, introduces a container model for improved security and a smaller codebase that's more readily auditable.
What makes OpenClaw successful is generality: it can be any kind of agent you want with no programming required. It can learn and improve autonomously, and be "taught" new skills. So we have to ask: if a general-purpose agent can be prompted to book meetings, answer customer service queries, or raise pull requests with no additional programming, is there even anything left to engineer?
For personal agents, perhaps not. But for business use, I think the answer is yes, and the financial services agent from part 4 (evaluating agents) illustrates why. That agent answers customer queries about their investment portfolios; it can look up account data, calculate returns, and explain complex financial products. We built evaluations to validate its outputs, its reasoning chain, and its tool use. Those evaluations are only meaningful if the agent behaves consistently post-deployment.
Now give it the power to modify its own system prompts. There's an obvious benefit: the agent can improve over time by learning from its customer interactions. But as soon as it does, our evaluations are no longer valid. The agent we evaluated is not the agent running in production, and over time it becomes increasingly difficult to predict how our agent is going to behave. Will it start giving financial advice that it shouldn't? Will it leak customer data?
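One mitigation is to treat the evaluated prompt as a versioned artefact: record a fingerprint of the system prompt at evaluation time, and flag the agent for re-evaluation whenever the live prompt drifts from it. A minimal sketch (the function names are illustrative, not part of any particular framework):

```python
import hashlib


def fingerprint(prompt_text: str) -> str:
    """Stable fingerprint of a system prompt (e.g. the contents of SOUL.md)."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def needs_reevaluation(current_prompt: str, evaluated_fingerprint: str) -> bool:
    """True if the prompt running in production no longer matches the one we evaluated."""
    return fingerprint(current_prompt) != evaluated_fingerprint
```

This doesn't stop the agent changing itself; it just makes the change visible, so the evaluation suite can be re-run before the modified agent keeps serving customers.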
OpenClaw's heartbeat mechanism creates a similar problem. If an agent can schedule its own future actions, it can start to operate outside the workflows that were planned and evaluated pre-deployment. From a debugging perspective, when we look at traces we also need to know about historical heartbeats, and about how the memory state has evolved over time; the latter is a challenge in any system with agentic memory, heartbeats or not.
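Because the memory is all plain text, one way to recover that history is to snapshot it on every heartbeat: hash each memory file and append the result to a log alongside the trace. A sketch, with hypothetical file and function names:

```python
import hashlib
import json
import time
from pathlib import Path


def memory_snapshot(memory_dir: Path) -> dict[str, str]:
    """Hash each plain-text memory file so a trace records what the agent 'knew'."""
    return {
        f.name: hashlib.sha256(f.read_bytes()).hexdigest()
        for f in sorted(memory_dir.glob("*.md"))
    }


def record_heartbeat(memory_dir: Path, log_file: Path) -> dict:
    """Append one heartbeat event, with its memory snapshot, to a JSON-lines log."""
    event = {"timestamp": time.time(), "memory": memory_snapshot(memory_dir)}
    with log_file.open("a") as fh:
        fh.write(json.dumps(event) + "\n")
    return event
```

With a log like this, a debugging session can at least answer "which version of SOUL.md was live when this trace was produced?", even after the agent has rewritten its own files.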
The biggest engineering challenge we now face is in constraining agentic AI. Agents can now figure out how to perform tasks, use tools, and self-improve. Thatâs a genuine milestone. The hard problem is how to harness that power while maintaining meaningful guarantees about behaviour.
And finally
Whatâs coming up
MLOps.WTF #8 is on the 25th March at DiSH, Manchester. This one's themed around agentic AI in financial services: an environment with real stakes, tight regulation, and, in some corners, latency budgets measured in nanoseconds.
We've got three brilliant speakers bringing their production experience and financial know-how:
Dmitry Leko, Head of AI and ML @ Thinkmoney
Christopher Brook, Principal Engineer @ Lloyds Banking Group
and Manchester legend Andy Gray
Domino's, drinks, and mathematical socks included.
🗓️ Wednesday 25th March, Manchester
Have you decided what you're ordering yet?
Our first edition cookbook has been released for download: a collection of practical recipes for building delicious, repeatable AI systems with open source tools.
Each recipe is a working template for a specific AI use case, grounded in solid MLOps foundations. Get your copy 👇
About Fuzzy Labs
We're Fuzzy Labs, a Manchester-based MLOps consultancy. Founded in 2019 by engineers, for engineers. We're big on open source and deeply sceptical of instant coffee.
Want to join the team? We've got some open rolls/roles 🥐…
Not subscribed yet? We publish every couple of weeks, no filler. Worth having in your inbox.
The next issue will be a deep dive into agent security, with Dr Danny.
Or follow us on LinkedIn, the quickest place to keep up with what we're building.




