Ahoy there 🚢,
Matt Squire here, CTO and co-founder of Fuzzy Labs, and this is the 2nd edition of MLOps.WTF, a fortnightly newsletter where I discuss topics in Machine Learning, AI, and MLOps.
Super-intelligent aliens, electronic brains, and the end of GPUs
I believe there’s a non-trivial chance that the Hungarian mathematician and physicist John von Neumann was actually a super-intelligent alien sent here to study our species. I just can’t convince any historians to take me seriously.
Among his many revolutionary ideas was the blueprint for computer design that is still used today, largely unchanged for 80 years. In the von Neumann architecture, there’s a random-access memory in which programs and data are stored, and a processor that executes instructions. These instructions represent different kinds of operation, like adding two numbers, branching, or reading from and writing to memory.
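To make the idea concrete, here’s a toy sketch of a von Neumann machine in Python. The instruction set and encoding are invented for illustration; the point is simply that one memory holds both the program and the data, and a single processor loops over it fetching and executing instructions.

```python
# A toy von Neumann machine: one memory holds both program and data,
# and a single processor fetches and executes instructions from it.
# The instruction set here is invented purely for illustration.

def run(memory):
    """Fetch-decode-execute loop over a shared program/data memory."""
    pc, acc = 0, 0  # program counter and accumulator
    while True:
        op, arg = memory[pc]  # fetch the instruction at pc
        pc += 1
        if op == "LOAD":      # read a value from memory into the accumulator
            acc = memory[arg]
        elif op == "ADD":     # add a memory cell to the accumulator
            acc += memory[arg]
        elif op == "STORE":   # write the accumulator back to memory
            memory[arg] = acc
        elif op == "JUMP":    # branch to another address
            pc = arg
        elif op == "HALT":
            return memory

# Program and data sit side by side in the same memory:
memory = {
    0: ("LOAD", 10),   # acc = memory[10]
    1: ("ADD", 11),    # acc += memory[11]
    2: ("STORE", 12),  # memory[12] = acc
    3: ("HALT", None),
    10: 2, 11: 3, 12: 0,
}
print(run(memory)[12])  # adds 2 + 3 -> prints 5
```

Every real processor is, at heart, an elaborate version of that loop.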
Perhaps you’re thinking that’s an obvious idea. But that’s only because this is the model we’re all used to now: no matter what programming language we use, it all reduces down to simple instruction sequences that get loaded into memory and executed by a processor.
This architecture is universal, which means it can run any program you can think of. From games, to advanced text editors (I obviously mean Emacs), to large language models, it’s all possible. And we can build enough abstraction layers that when you run a Hugging Face pipeline, you don’t have to think about the insane machinations necessary to make that work on the hardware.
Yet despite its flexibility, it isn’t always the most efficient. If you want to search a database index, it’s perfect, but if you want to train a neural network, it turns out to be a very poor choice. The problem lies in the strict separation of state (memory) from computation: every weight and activation has to shuttle back and forth across a memory bus, making training and inference memory-bound, the classic von Neumann bottleneck. In nature, there’s no such bottleneck: every neuron contains its own state and processing capability, so instead of one big processor with one block of memory, we have billions of small processors, each with its own internal memory.
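A back-of-envelope calculation shows how bad this gets. For a single token passing through one dense layer, every weight is fetched from memory, used in exactly one multiply-add, and discarded. The numbers below are illustrative, not benchmarks, but the ratio is the point: modern GPUs can perform hundreds of floating-point operations in the time it takes to move one byte from memory, so at one FLOP per byte the processor mostly sits idle, waiting.

```python
# Back-of-envelope arithmetic intensity for one dense layer, illustrating
# why single-token inference is memory-bound: each weight crosses the
# memory bus once and is used in exactly one multiply-add.
# Figures are illustrative, not measured benchmarks.

def arithmetic_intensity(n_in, n_out, bytes_per_weight=2):
    flops = 2 * n_in * n_out                       # one multiply + one add per weight
    bytes_moved = n_in * n_out * bytes_per_weight  # every weight read from memory
    return flops / bytes_moved

# One token through a 4096x4096 layer with fp16 (2-byte) weights:
print(arithmetic_intensity(4096, 4096))  # -> 1.0 FLOP per byte
```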
Essentially we’ve borrowed an information processing paradigm from nature, and forced it onto an entirely unsuitable architecture.
This all makes sense when you consider energy use. While ChatGPT needs a nation state’s worth of power, a bowl of porridge in the morning is enough for your brain to outperform it at most tasks.
For this reason, many people have asked whether the neural architecture found in nature can be replicated in silicon, potentially unlocking more speed and scale, with lower energy consumption. This so-called neuromorphic hardware might just be what renders GPUs obsolete for AI.
But wait, don’t rip that RTX 4090 out of your PC yet! Despite some promising results, we’re still some way from true commercial viability.
One offering comes from Intel, whose Loihi chip presents 128 ‘neuromorphic cores’. Originally hosting 1024 neurons per core, this increased to 8192 with Loihi 2. Intel recently demonstrated how far Loihi can scale with Hala Point, which combines 1152 Loihi 2 chips to simulate a total of 1.15 billion neurons.
Another notable project is SpiNNaker, a research project from the University of Manchester aiming to simulate brain-like architectures, with applications in robotics, edge computing, and neuroscience research. In its current form it consists of ~1 million ARM processor cores, taking up an entire room filled with server racks.
While the larger-scale demonstrations like Hala Point and SpiNNaker aren’t really practical outside of the lab, at the lower end of the scale, something like a Loihi chip is certainly viable for edge applications, where models are small and reducing power consumption is important.
Now, since this is an MLOps newsletter, it would be remiss of me not to discuss the tooling ecosystem. The Open Neuromorphic Project aims to bring together collaborators across academia and industry to build open source tools, platforms, and standards for running models on neuromorphic hardware. They maintain a list of frameworks for model training and deployment.
To finish off, there is one important detail that I’ve missed out: these processors are designed to run a particular flavour of network, known as a spiking neural network. This is a very different approach to the one we’re all familiar with, which means there’s no straightforward way to just run Mistral on this strange new hardware.
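To give a flavour of the difference, here’s a minimal leaky integrate-and-fire (LIF) neuron, the basic building block of spiking networks. Unlike a standard artificial neuron, it carries internal state (a membrane potential) between time steps and communicates in discrete spikes rather than continuous activations. The parameters are illustrative, not tied to any particular chip.

```python
# A minimal leaky integrate-and-fire (LIF) neuron: state lives inside
# the neuron, and output is a train of discrete spikes over time.
# Threshold and leak values are illustrative only.

def lif_neuron(input_current, threshold=1.0, leak=0.9):
    """Simulate one LIF neuron over a sequence of input currents."""
    v = 0.0       # membrane potential, persisting between time steps
    spikes = []
    for i in input_current:
        v = leak * v + i       # integrate the input, with leak
        if v >= threshold:     # fire when the threshold is crossed...
            spikes.append(1)
            v = 0.0            # ...then reset the potential
        else:
            spikes.append(0)
    return spikes

print(lif_neuron([0.3, 0.3, 0.3, 0.3, 0.0, 0.6]))  # -> [0, 0, 0, 1, 0, 0]
```

Notice that computation and memory live in the same place, and that *when* a spike happens carries information, which is exactly the property neuromorphic hardware is built to exploit.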
And while there is some work to bridge transformer models with spiking networks, what’s really on offer with neuromorphic computing is an entirely new way of looking at machine learning, one where our models can continuously learn and adapt, and where the lines between inference and training become fuzzy.
What’s clear is that the paradigm followed by the current generation of large language models is power hungry and data hungry in a way that limits our ability to continue improving through scale alone. Whether neuromorphic computing offers an alternative paradigm free from these constraints remains to be seen.
And finally
Assorted things of interest
You may remember the furore a couple of years ago, when an AI generated image won an art competition at the Colorado State Fair. Well, last week, justice was restored, when Miles Astray won an AI photo competition with a genuine photo of a seemingly headless flamingo. Of course, after coming clean, Astray’s photo was disqualified from the competition. While he has no intention of appealing the decision, I imagine that if he did, he wouldn’t have a leg to stand on.
If you liked Martin Scorsese’s 3.5 hour long masterpiece The Irishman, but were disappointed that it teaches you nothing about the current state of AI, then Andrej Karpathy has you covered. In his new 4 hour epic “Let's reproduce GPT-2”, he covers the end-to-end process of building an LLM from scratch. Grab your popcorn and warm up your GPU 🍿
Thanks for reading!
Matt
About Matt
Matt Squire is a human being (we swear), programmer, and tech nerd who likes AI and MLOps. Matt enjoys unusual programming languages, dabbling with hardware, and computer science esoterica. He’s the CTO and co-founder of Fuzzy Labs, an MLOps company based in the UK.
Each edition of the MLOps.WTF newsletter is a deep dive into a certain topic relating to productionising machine learning. If you’d like to suggest a topic, drop us an email!