Ahoy there 🚢⚓,
Matt Squire here, CTO and co-founder of Fuzzy Labs, and this is the 10th edition of MLOps.WTF, a newsletter where I discuss topics in Machine Learning, AI, and MLOps.
How do we define MLOps? Is it just “DevOps for Machine Learning”, or is there more to it?
A couple of weeks ago we hosted another MLOps.WTF meetup here in Manchester, and I opened the event with this question.
“Lifecycle management of machine learning applications” was one suggestion. While that sounds an awful lot like DevOps, in reality training and running ML models comes with challenges that aren’t seen in the DevOps world. Take monitoring, for example: an “ordinary” application, once deployed, stays put until we release a new version. But a model’s correctness can change if the world around it changes.
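To make that concrete, here’s a minimal sketch of the kind of check a model-monitoring system might run: a two-sample Kolmogorov–Smirnov test comparing a feature’s live distribution against its training-time distribution. The function name and threshold are my own illustration, not any particular tool’s API:

```python
from scipy.stats import ks_2samp

def feature_has_drifted(training_values, live_values, alpha=0.05):
    """Flag drift when live data no longer resembles training data."""
    statistic, p_value = ks_2samp(training_values, live_values)
    # A small p-value means the two samples are unlikely to come from
    # the same distribution, so the model may need retraining
    return p_value < alpha
```

No ordinary application needs a test like this running after deployment, which is a good hint that MLOps is more than DevOps with a new name.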
For my answer, I go to the 2014 paper from Google Research, Machine Learning: The High-Interest Credit Card of Technical Debt. The authors highlight the tight coupling of data, code, model, and environment; the difficulty of tracking a model’s correctness over time; and the complexities involved in testing ML systems. All of these demand specialised tooling and expertise.
In production, we’re often faced with in-depth engineering challenges that impact scale, security, and safety. At this meetup we covered three topics: first, Will Faithful talked about how to work with big graph data; then Dr Edoardo Manino covered the frankly terrifying notion of ML models deployed into safety-critical applications; and finally Thom Kirwan-Evans explained how COVID taught him to think in pipelines before models.
You can watch the highlights below, and you can read on for the full details from each talk.
Graph algorithms at scale
A lot of data is graph-shaped.
Imagine you’ve just been hired as a data scientist for a bank. Unfortunately, there’s been a security incident, and your task is to understand precisely how many customers, accounts, and devices may be compromised. Customers can have multiple accounts, which they access from a variety of different devices.
If we think about our data as a graph, that is, a set of entities and the relationships between them, then we can make some deductions. For instance, if one customer is compromised, then anybody else who accesses an account from the same device is also compromised.
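Here’s a toy version of that deduction in Python using NetworkX (the entities and edges are made up for illustration). Once customers, accounts, and devices live in one graph, “everything potentially compromised” is just the connected component around the compromised customer:

```python
import networkx as nx

# Toy graph: customers connect to accounts, accounts to devices
G = nx.Graph()
G.add_edge("customer:alice", "account:123")
G.add_edge("account:123", "device:laptop-1")
G.add_edge("customer:bob", "account:456")
G.add_edge("account:456", "device:laptop-1")  # Bob shares Alice's device

# If Alice is compromised, everything reachable from her is suspect
at_risk = nx.node_connected_component(G, "customer:alice")
print(at_risk)
# {'customer:alice', 'account:123', 'device:laptop-1',
#  'account:456', 'customer:bob'}
```

A relational database can answer the same question with recursive joins, but as the graph grows that quickly becomes painful, which is where the engineering tradeoffs come in.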
Will Faithful, CEO of ExaDev, takes us through the engineering tradeoffs involved in choosing graphs over relational databases, how to process graphs efficiently, and training ML models from graph features.
You can watch the full video here:
Floating-point neural network safety
Ah, IEEE 754, easily among my five favourite technical standards. It’s the specification behind all modern implementations of floating-point numbers, as available from your nearest friendly Python/C/Rust/etc environment. It’s the standard that gives us 0.1 + 0.2 == 0.30000000000000004, and “NaN” (not a number).
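Both quirks are easy to reproduce from a Python prompt:

```python
>>> 0.1 + 0.2
0.30000000000000004
>>> 0.1 + 0.2 == 0.3
False
>>> float("nan") == float("nan")
False  # NaN isn't even equal to itself
```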
Floating-point numbers aren’t bad per se, but they are quite unintuitive, and often misunderstood by the programmers who use them. Sometimes the results are disastrous: a missile defence system failing due to a rounding error, and the loss of the Ariane 5 rocket, to name two.
Neural networks tend to use floating-point weights, but we usually don’t worry about what that means for stability. But imagine we want to use a neural net in a safety-critical application: Edoardo Manino, a researcher at the University of Manchester, explored how feasible that really is, and the current state of the art in tooling for ML model verification.
Check out the full video here:
You and whose data? Lessons in remote SecDevOps
Remember COVID? Masks, lockdowns, video calls… five years on, I think we can all agree it was a strange time.
While a lot of techies took up home working without much difficulty, our last speaker, Thom, was working at the time on a super secret government project. And the data he needed for model training existed in one single physical location, on an air-gapped system, presumably with armed guards outside.
Going to the office was out of the question, thanks to lockdown. So how can you train models from home when you don’t have the data, and worse, you’re not allowed to have that data?
Thom Kirwan-Evans, co-founder at Origami Labs, realised two things: firstly, it’s possible to go a long way with synthetic data. Secondly, having a trained model isn’t the outcome to focus on. What’s more useful is having a well-structured training pipeline that can take a dataset and produce a model, along with metrics, as output. Most of the valuable work can be done remotely, because the value is in the pipeline and the tooling that supports it.
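Here’s a minimal sketch of that idea in scikit-learn, with synthetic data standing in for the real thing. Everything here, from the model choice to the metrics, is illustrative rather than Thom’s actual stack; the point is the shape: dataset in, model and metrics out.

```python
from dataclasses import dataclass

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

@dataclass
class PipelineResult:
    model: RandomForestClassifier
    metrics: dict

def run_pipeline(X, y) -> PipelineResult:
    """Dataset in, trained model and metrics out."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    preds = model.predict(X_test)
    return PipelineResult(
        model=model,
        metrics={
            "accuracy": accuracy_score(y_test, preds),
            "f1": f1_score(y_test, preds),
        },
    )

# While the real data stays behind the air gap, a synthetic stand-in
# with the same shape keeps development and testing moving
X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
print(run_pipeline(X, y).metrics)
```

When the pipeline finally meets the real data, the only thing that changes is the input; everything else has already been built, tested, and rehearsed from home.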
You can watch the full video here:
See you at the next one!
We’re building a community of like-minded people with a passion for production AI/ML here in Manchester. Our next event is on the 5th of June, and you can sign up here.
We’re always looking for more speakers too, so please get in touch if you’ve got a story to tell about MLOps. We’re keen to ensure a diverse set of voices is heard; I’d love to hear from more female speakers and members of minority groups at future events.