NVIDIA and the battle for the future of AI chips

An AI chip is any processor that has been optimised to run machine learning workloads, via programming frameworks such as Google’s TensorFlow and Facebook’s PyTorch. AI chips don’t necessarily do all the work when training or running a deep-learning model, but operate as accelerators by quickly churning through the most intense workloads. For example, NVIDIA’s AI-system-in-a-box, the DGX A100, uses eight of its own A100 “Ampere” GPUs as accelerators, but also features a 128-core AMD CPU.
AI isn’t new, but we previously lacked the computing power to make deep learning models possible, leaving researchers waiting on the hardware to catch up to their ideas. “GPUs came in and opened the doors,” says Rodrigo Liang, co-founder and CEO of SambaNova, another startup making AI chips.
In 2012, a researcher at the University of Toronto, Alex Krizhevsky, walloped other competitors in the annual ImageNet computer vision challenge, which pits researchers against each other to develop algorithms that can identify images or objects within them. Krizhevsky used deep learning powered by GPUs to beat hand-coded efforts for the first time. By 2015, all the top results at ImageNet contests were using GPUs.

Deep learning research exploded. Offering 20x or more performance boosts, NVIDIA’s technology worked so well that when British chip startup Graphcore’s co-founders set up shop, they couldn’t get a meeting with investors. “What we heard from VCs was: ‘what’s AI?’” says co-founder and CTO Simon Knowles, recalling a trip to California to seek funding in 2015. “It was really surprising.” A few months later, at the beginning of 2016, that had all changed. “Then, everyone was hot for AI,” Knowles says. “However, they were not hot for chips.” A new chip architecture wasn’t deemed necessary; NVIDIA had the industry covered.

What’s in a name?
GPU, IPU, RPU – they’re all used to churn through datasets for deep learning, but the names do reflect differences in architecture. 

Graphcore’s Colossus MK2 IPU is massively parallel, with processors that operate independently, a technique called multiple instruction, multiple data (MIMD). Software is written sequentially, but neural network algorithms need to do everything at once. To address this, one solution is to lay out all the data and its constraints, like declaring the structure of the problem, says Graphcore CTO Simon Knowles. It’s a graph – hence the name of his company.
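To make the graph idea concrete, here is a minimal sketch in Python. It is not Graphcore’s software; the node names and the `parallel_stages` helper are hypothetical, chosen only for illustration. The problem is declared as a set of dependencies, and the graph itself reveals which steps could run at the same time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy network: each node lists the nodes it depends on.
graph = {
    "x": set(),            # input data
    "w1": set(),           # first set of weights
    "w2": set(),           # second set of weights
    "h1": {"x", "w1"},     # depends on the input and w1
    "h2": {"x", "w2"},     # depends on the input and w2
    "y": {"h1", "h2"},     # output combines both intermediate results
}

def parallel_stages(deps):
    """Group nodes into stages; nodes within a stage do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages

stages = parallel_stages(graph)
print(stages)
```

In this toy case, `h1` and `h2` land in the same stage because neither needs the other, so independent processors could, in principle, work on them simultaneously.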

But, in May 2016, Google changed everything, with what Cerebras’ Feldman calls a “swashbuckling strategic decision”, announcing it had developed its own chips for AI applications. These were called Tensor Processing Units (TPUs), and designed to work with the company’s TensorFlow machine learning programming framework. Knowles says the move sent a signal to investors that perhaps there was a market for new processor designs. “Suddenly all the VCs were like: where are those crazy Brits?” he says. Since then, Graphcore has raised $710 million (£515 million).
NVIDIA’s rivals argue that GPUs were designed for graphics rather than machine learning, and that though their massive processing capabilities mean they work better than CPUs for AI tasks, their market dominance has only lasted this long due to careful optimisation and complex layers of software. “NVIDIA has done a fabulous job hiding the complexity of a GPU,” says Graphcore co-founder and CEO Nigel Toon. “It works because of the software libraries they’ve created, the frameworks and the optimisations that allow the complexity to be hidden. It’s a really heavy lifting job that NVIDIA has undertaken there.”
But forget GPUs, the argument goes, and you might design an AI chip from scratch that has an entirely new architecture. There are plenty to choose from. Google’s TPUs are application-specific integrated circuits (ASICs), designed for specific workloads; Cerebras makes a Wafer-Scale Engine, a behemoth chip 56 times larger than any other; IBM and BrainChip make neuromorphic chips, modelled on the human brain; and Mythic and Graphcore both make Intelligence Processing Units (IPUs), though their designs differ. There are plenty more.
But Catanzaro argues the many chips are simply variations of AI accelerators – the name given to any hardware that boosts AI. “We talk about a GPU or TPU or an IPU or whatever, but people get too attached to those letters,” he says. “We call our GPU that because of the history of what we’ve done… but the GPU has always been about accelerated computing, and the nature of the workloads people care about is in flux.”
Can anyone compete? NVIDIA dominates the core benchmark, MLPerf, which is the gold standard for deep-learning chips, though benchmarks are tricky beasts. Analyst Karl Freund of Cambrian AI Research notes that MLPerf, a benchmarking tool designed by academics and industry players including Google, is dominated by Google and NVIDIA, but that startups usually don’t bother to complete all of it because the costs of setting up a system are better spent elsewhere.
NVIDIA does bother – and annually bests Google’s TPU. “Google invented MLPerf to show how good their TPU was,” says Marc Hamilton, head of solutions architecture and engineering at NVIDIA. “Jensen [Huang] said it would be really nice if we show Google every time they ran the MLPerf benchmark how our GPUs were just a little bit faster than the TPU.”
To ensure it came out on top for one version of the benchmark, NVIDIA upgraded an in-house supercomputer from 36 DGX boxes to a whopping 96. That required recabling the entire system. To do it quickly enough, they simply cut through the cables – which Hamilton says was about a million dollars’ worth of kit – and had new equipment shipped in. This may serve to highlight the bonkers behaviour driven by benchmarks, but it also inspired a redesign of DGX: the current-generation blocks can now be combined in groups of 20 without any rewiring.
