Edge AI: shipping models into the real world

MLOps.WTF Edition #24

Jan 29, 2026

This edition recaps our MLOps.WTF meetup #7, where Arm’s new office proved (1) extremely nice, (2) “labyrinthine”, and (3) capable of producing the most professional safety briefing we’ve ever had.

There’s a point in most ML systems where the cloud stops feeling abstract and starts feeling expensive. Not just in money, but in time. In latency. In the number of things that have to go right before someone can act on an insight.

The MLOps landscape was built around cloud deployment. Elastic compute. Predictable networking. Logs you can reach. Rollbacks that are mostly just a button. In that world, you can afford to park certain questions for later:

How big will my models get?
How fast do I need inference to run?
What’s my pipeline for getting data back for future training runs?

The edge removes your ability to postpone them.

When the model sits next to the camera, the sensor, the machine, the crop, the doorbell, reality really is knocking at the door.

So the question hanging over MLOps.WTF #7 was simple enough: when does it stop making sense to send everything to the cloud?

Raj (Fotenix): From Cloud to Edge: The Next Phase of Scalable Crop Intelligence

Raj opened by grounding the room in the realities of food production:

In the UK, most fruit is imported, and a large proportion of vegetables too. Fresh produce, especially things like leafy greens, degrades as it moves through long supply chains. Nutritional value drops. Time and temperature matter. Many greenhouses are old. Labour is short. Margins are thin. On a £2.20 bag of apples, growers make about three pence. When intervention is late, it’s lost yield. By the time you’ve “fixed it later”, the opportunity has usually gone.

That’s the environment Fotenix operates in.

They build camera systems for greenhouses that help growers spot plant stress and decide when to intervene. You deploy them quickly and start getting useful signals fast. Today, the setup is cloud-first: images go up, pipelines run, metrics come out.

The problem shows up at scale. Each site generates huge volumes of data. In rural environments, with limited bandwidth, that turns insight into something that arrives too late to act on. As Raj put it, the issue isn’t compute. It’s bandwidth, and more specifically the assumption that every byte needs to travel before it becomes useful.
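
To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch. Every number in it (image count, image size, uplink speed, summary size) is a hypothetical placeholder chosen for illustration, not a figure from Fotenix or the talk.

```python
def upload_hours(images_per_day: int, mb_per_image: float, uplink_mbps: float) -> float:
    """Hours needed to upload one day's data over a link of the given speed."""
    total_megabits = images_per_day * mb_per_image * 8  # megabytes -> megabits
    return total_megabits / uplink_mbps / 3600


# Hypothetical site: 20,000 images/day at 5 MB each over a 10 Mbit/s rural uplink.
raw = upload_hours(20_000, 5.0, 10.0)

# Same hypothetical site sending a ~2 KB summary per image instead of the image.
summarised = upload_hours(20_000, 0.002, 10.0)

print(f"raw images: {raw:.1f} hours/day")
print(f"summaries:  {summarised * 3600:.0f} seconds/day")
```

With these made-up inputs the raw upload takes most of the day, while the summaries take about half a minute. The exact numbers will differ per site; the point is that the ratio is driven by what you choose to send, not by how much compute you have.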

So they’ve become selective when looking at what to move to the edge and what to keep in the cloud. Some things earn their place close to the data: basic quality checks, pulling out the parts of an image that matter, turning pictures into signals. Other things don’t: training, cross-site analysis, anything that needs global context or doesn’t benefit from immediacy.
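
As a minimal sketch of the kind of work that might sit next to the camera, the snippet below runs a basic quality check, crops a region of interest, and reduces it to a handful of numbers. The function names, the brightness thresholds, and the choice of summary statistics are all assumptions made for illustration; this is not Fotenix's pipeline.

```python
from statistics import mean
from typing import Optional

Image = list[list[int]]  # grayscale pixels, 0-255, row-major


def passes_quality_check(img: Image, lo: float = 30, hi: float = 225) -> bool:
    """Reject frames that are almost entirely dark or blown out (thresholds are hypothetical)."""
    brightness = mean(p for row in img for p in row)
    return lo <= brightness <= hi


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep only the part of the image that matters."""
    return [row[left:left + width] for row in img[top:top + height]]


def to_signal(img: Image) -> dict[str, float]:
    """Turn a picture into a few numbers that are cheap to send upstream."""
    pixels = [p for row in img for p in row]
    return {"mean": mean(pixels), "min": min(pixels), "max": max(pixels)}


def process_on_device(img: Image, roi: tuple[int, int, int, int]) -> Optional[dict[str, float]]:
    """Return a small signal for usable frames, or None for frames not worth sending."""
    if not passes_quality_check(img):
        return None
    return to_signal(crop(img, *roi))


frame = [[100] * 4 for _ in range(4)]
frame[1][1] = 200
print(process_on_device(frame, (1, 1, 2, 2)))
```

Training and cross-site analysis are deliberately absent here: they need data from many sites, so in this split they stay in the cloud.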

Hardware isn’t the limiting factor. The sites already run capable devices. The harder part is operating them. Once you have hundreds of devices in the field, reliability, observability, and fleet management become the real constraints.

Before moving more computation outward, Fotenix is putting effort into the fundamentals. Lightweight runtimes. Local observability. Remote fleet management. The goal is to make sure that when computation does move closer to the data, the system doesn’t become blind or fragile.

Summed up simply: edge is a consequence of maturity. If edge feels risky, that usually means the system isn’t ready yet.

Sam (Fuzzy Labs): Shipping ML to the Edge: A Practical Guide

Sam followed by zooming in on what all of this looks like from an MLOps point of view.

He opened by admitting that speaking into a microphone still makes him feel like he’s on X Factor. Thankfully, instead of “Edge of Glory”, he walked through where edge changes the MLOps lifecycle.

To keep things concrete, he used the tongue-in-cheek “Don’t Ring” doorbell as a case study, an anonymised version of a real Fuzzy Labs project. The problem: they had a facial recognition system that worked well, except it failed when people weren’t looking at the camera, or stood too close, or too far away. The solution: build a separate model that detects unusable images and filters them out before they hit the facial recognition system. The model is small and efficient, so it runs on the device itself without killing the battery.
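
The filtering pattern can be sketched in a few lines. This is an illustration of the idea only, not the project's code: `gated_recognise`, the 0.5 threshold, and both toy models are hypothetical stand-ins.

```python
from typing import Callable, Optional

Frame = bytes  # stand-in type for whatever the camera produces


def gated_recognise(
    frame: Frame,
    usability_score: Callable[[Frame], float],
    recognise: Callable[[Frame], str],
    threshold: float = 0.5,
) -> Optional[str]:
    """Run the expensive recogniser only on frames the small filter accepts."""
    if usability_score(frame) < threshold:
        return None  # unusable frame: skip recognition entirely
    return recognise(frame)


# Toy stand-ins so the sketch runs; real models would replace both.
def toy_score(frame: Frame) -> float:
    return 0.9 if frame else 0.0  # pretend empty frames are unusable


def toy_recognise(frame: Frame) -> str:
    return "visitor"


print(gated_recognise(b"pixels", toy_score, toy_recognise))
print(gated_recognise(b"", toy_score, toy_recognise))
```

The design choice is that the cheap model runs on every frame and the expensive one runs only when it is likely to succeed, which is what keeps the battery cost down.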

Models need to be small. Power matters. Privacy often means you don’t get access to real data.

In this case, there was no data at all. Open datasets can get you moving. Synthetic data helps too - Gemini generates realistic training images for about two pence each. But Sam was clear about the catch: synthetic data helps you train. It doesn’t tell you whether the model will behave in the real world. If your evaluation data doesn’t match what the device actually sees, you’ll find out later.

Experiment tracking came up next. He put up a slide most people saw themselves in immediately:

“We’ve probably all been there at some point with something like my_model_V3_best_USE_THIS_ONE_final.”

Deploy the wrong model to the cloud and you redeploy. Deploy it to a device you can’t easily reach and you are running around collecting devices or sending apology emails to customers. Versioning, lineage, knowing what’s running where. Things you can sometimes get away with being loose about in the cloud start to matter much sooner.
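
One lightweight way to stop relying on filenames is to identify a model by a hash of its contents and record which device runs which hash. The sketch below assumes that approach; the record fields, device IDs, and run names are hypothetical, and a real setup would more likely use an experiment tracker or model registry.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    name: str
    sha256: str        # content hash: the identity of the exact artefact
    training_run: str  # lineage pointer back to the run that produced it


def fingerprint(model_bytes: bytes) -> str:
    """Hash the model file contents so identical bytes always get the same ID."""
    return hashlib.sha256(model_bytes).hexdigest()


def register(name: str, model_bytes: bytes, training_run: str) -> ModelRecord:
    return ModelRecord(name, fingerprint(model_bytes), training_run)


def devices_running(fleet: dict[str, ModelRecord], sha256: str) -> list[str]:
    """Answer 'what is running where' for one specific model artefact."""
    return sorted(dev for dev, rec in fleet.items() if rec.sha256 == sha256)


v3 = register("filter", b"weights-v3", "run-041")
v4 = register("filter", b"weights-v4", "run-057")
fleet = {"doorbell-a": v3, "doorbell-b": v4, "doorbell-c": v3}

print(devices_running(fleet, v3.sha256))
```

Two files with the same name but different weights get different hashes, so "which V3 is this?" stops being a question anyone has to answer from memory.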

Model optimisation followed the same pattern. Every size reduction is a trade-off. You only really understand those trade-offs if you measure performance in the conditions the model will actually run in, not just against training metrics.
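
As a small illustration of one such trade-off, the sketch below applies symmetric int8 quantisation to a handful of weights and reports the size saving alongside the error it introduces. The talk did not name a specific technique, so treat the choice of int8 quantisation, the function names, and the example weights as assumptions.

```python
def quantise_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


def max_abs_error(a: list[float], b: list[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical weights
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)

print("bytes as float32:", 4 * len(weights))
print("bytes as int8:   ", 1 * len(weights))
print("max weight error:", max_abs_error(weights, restored))
```

The size drops to a quarter, and the smallest weight is rounded away to zero. Weight error is only a proxy, though: whether that loss matters can only be judged by measuring task accuracy on data that looks like what the device will see.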

Deployment doesn’t get easier either. Different devices, different architectures, different toolchains. Sam mentioned tools like PlatformIO as a way to avoid rebuilding everything from scratch each time.

The problems Sam walked through aren’t new. Data quality, experiment tracking, model optimisation, deployment complexity - they all exist in cloud systems too. What edge removes is the buffer that makes cloud mistakes recoverable. Deploy the wrong model to the cloud and you redeploy in minutes. Deploy it to a device you can’t reach and you need physical access or a complex remote rollback. You don’t need entirely new skills for edge ML. You need to be much more careful about the fundamentals you already know.
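
A remote rollback does not have to be elaborate to be useful. The sketch below assumes a simple scheme in which the device keeps the previous model until the new one passes a health check; the `Device` class, the version strings, and the health check are hypothetical, and real update systems also have to handle power loss, partial downloads, and signing.

```python
from typing import Callable, Optional


class Device:
    """Tracks the active model version and the one it can fall back to."""

    def __init__(self, active: str) -> None:
        self.active: str = active
        self.previous: Optional[str] = None


def update_with_rollback(
    device: Device, candidate: str, health_check: Callable[[str], bool]
) -> bool:
    """Switch to the candidate, then revert if the health check fails or crashes."""
    device.previous, device.active = device.active, candidate
    try:
        healthy = health_check(candidate)
    except Exception:
        healthy = False  # a crashing check counts as an unhealthy model
    if healthy:
        return True
    device.active, device.previous = device.previous, None
    return False


doorbell = Device(active="filter-v3")
ok = update_with_rollback(doorbell, "filter-v4", lambda version: False)
print(ok, doorbell.active)
```

The key property is that a bad update leaves the device running what it ran before, without anyone needing to visit it.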

Isabella Gottardi (Arm): From Edge to Everywhere: Arm Machine Learning Inference Advisor's journey from NPU to GPU and beyond.

Isabella from Arm focused on the hardware layer with a live demo of MLIA.

When Arm first became involved in machine learning, the assumption was cloud-centric: send an image to the cloud, run inference, return the result. Today, inference runs in lots of places. Cloud platforms. IoT devices. Phones. Automotive systems. Dedicated accelerators. The same model can technically run across all of them. Performance varies wildly.

“Portability of models does not mean portability of performance.”

That’s where MLIA, the Machine Learning Inference Advisor, comes in. A way of checking compatibility and performance before you’ve committed to architectural decisions that are expensive to reverse.

The usual flow is familiar: gather data, train, optimise, deploy. Performance issues tend to show up late, when changing course is painful. MLIA shifts that discovery earlier, while trade-offs are still cheap.

Isabella showed this live. Pick a target environment. Check whether the model will run. Look at how it behaves. Try a variant. Compare the results. The tool gives you information before you’ve committed to an approach that won’t work.
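
For readers who want to try the same loop, the sketch below builds MLIA command lines for two targets. The `check` subcommand, the `--target-profile` and `--performance` flags, and the profile names are written from our recollection of the MLIA README and should be confirmed with `mlia --help`; `model.tflite` is a placeholder path, and the helper function itself is hypothetical.

```python
import shlex


def mlia_check_command(
    model_path: str, target_profile: str, performance: bool = False
) -> list[str]:
    """Build (but do not run) an MLIA check command for one target profile."""
    cmd = ["mlia", "check", model_path, "--target-profile", target_profile]
    if performance:
        cmd.append("--performance")  # assumed flag; verify against `mlia --help`
    return cmd


# Assumed profile names; list the ones your MLIA version supports before relying on them.
for profile in ("ethos-u55-256", "ethos-u65-512"):
    print(shlex.join(mlia_check_command("model.tflite", profile)))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, check=True)` on a machine where MLIA is installed. Running the same model against several profiles is the quickest way to see that portability of models does not imply portability of performance.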

With Arm’s recently announced neural technology, that problem only gets more interesting. Models will behave differently on different GPUs, and especially on specialised edge device hardware. The future is heterogeneous systems, where performance depends on memory layout, data movement, and how work is split across processors.

The takeaway: inference performance isn’t something you discover at the end. It’s a design input.

Sources:

https://github.com/arm/mlia

PyPI package:

https://pypi.org/project/mlia/

Rounding up

What we’re taking away from MLOps.WTF #7:

Edge forces immediate answers to questions you could defer in the cloud. Energy use, model size, and bandwidth constraints aren’t problems for later. They’re design constraints from day one.
Operational maturity matters more than hardware capability. The limiting factor isn’t compute power. It’s whether you can observe, update, and recover when systems are in the field.
Be selective about what moves. Not everything belongs at the edge. Move what benefits from immediacy and local processing. Keep what needs global context or doesn’t gain from faster turnaround.
Design for hardware reality early. The same model performs differently across different chips. Check compatibility and performance before committing, while changing course is still cheap.

As Raj put it, edge is something you arrive at when your infrastructure can support it.

About Fuzzy Labs

We’re Fuzzy Labs. Manchester-rooted open-source MLOps consultancy, founded in 2019. Helping organisations build and productionise AI systems they genuinely own.

Liked this? Forward it to someone wrestling with edge deployments.
