September 2023
From: Brian and Tobias
Subject: The CUDA moat & GPU providers
If you’ve been following this newsletter, you know we’ve gotten immersed in AI hardware. In March, we wrote a primer on NVIDIA and why we believed in specialized hardware for AI.
Around this time, we invested in Etched, a startup building an ASIC specialized for transformer inference for LLMs. Since then, a lot has changed, but our position that NVIDIA is vulnerable has not. This claim may seem dubious – NVIDIA is arguably the best-positioned company on the planet for an impending AI boom. However, NVIDIA’s empire is held together not by chips, but by software: CUDA is the true source of its moat. We think there are existential risks to CUDA that haven’t existed until now, driven by a massive GPU bottleneck that will get significantly worse even if only a fraction of the AI boom comes to fruition.
ChatGPT was an “iPhone moment” for AI and kicked off momentum around productionizing and commercializing generative AI. However, we’re nowhere near the end state – a tsunami of demand is coming once incumbents figure out how to apply AI to the right use cases and the infrastructure tooling matures. Despite the GPU bottleneck, plenty of hardware is actually available; supply is artificially constrained by CUDA’s dominance and the resulting lock-in to NVIDIA GPUs. Long term, this is a threat to NVIDIA. Let’s dive in.
What is CUDA in the first place?
CUDA is a software development framework and language for writing programs that run on NVIDIA GPUs. It gives developers a programming model and toolchain for parallelizing compute across a GPU, and it also sits underneath ML frameworks like PyTorch: ML engineers learn to develop in PyTorch, and the CUDA stack (compiler plus libraries such as cuBLAS and cuDNN) translates that work into something the hardware can execute. Experts universally identify CUDA as NVIDIA’s clear moat – the part of the business that provides a competitive advantage against competitors.
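To make that layering concrete, here is a minimal sketch (ours, purely illustrative) of the division of labor: the engineer writes framework-level PyTorch, and NVIDIA’s CUDA stack does the GPU work underneath.

```python
# Minimal illustration of the layering described above: the engineer writes
# PyTorch; PyTorch dispatches the work to CUDA kernels (e.g. cuBLAS) on an
# NVIDIA GPU. Requires a CUDA build of PyTorch and a CUDA-capable GPU.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# The engineer expresses the math at the framework level...
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# ...and the framework, via CUDA, decides how to parallelize it on the GPU.
c = a @ b
print(c.shape, c.device)
```

The parallelization across thousands of GPU cores happens inside CUDA kernels the engineer never sees or writes.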
Surprisingly, in head-to-head comparisons, AMD GPUs are just as performant as, and actually less expensive than, NVIDIA GPUs. Given the gap in market share and mindshare between the two companies in the AI world, that is a shocking reality: NVIDIA is borderline monopolistic in this market despite a competitor having hardware that is just as good.
Data center revenues, NVIDIA vs. AMD:
An unsustainable status quo
For AI workloads, NVIDIA is the only game in town. As demand has risen for AI compute, however, this status quo has become less sustainable. We hear all the time about GPU shortages. A few recent examples:
One more data point that demonstrates this reality: rapidly growing companies are emerging with the sole purpose of making GPU access easier and more efficient than going through the hyperscalers. The growth of companies like CoreWeave and Lambda Labs to nine figures of ARR in almost no time has been remarkable, and it is a great example of customers’ dire need to access compute however they can. Beyond those main players, there is a long tail of smaller GPU clouds and “serverless” GPU providers. Below is a list of some:
Importantly, NVIDIA is supporting CoreWeave and Lambda Labs by giving them access to steeply discounted GPUs. Why would NVIDIA do this? It’s a strategic move to build out GPU cloud capacity and fend off the hyperscalers, which are all developing their own hardware to compete with NVIDIA. NVIDIA understands its supremacy is dependent on its software offering, and the partnership with the GPU clouds shows that.
Additionally, we’re seeing signs that the industry is pushing for more parity between NVIDIA and AMD. In October, Meta launched an open-source project to ease switching back and forth between NVIDIA and AMD chips. We’ve also heard through the grapevine that some of the biggest tech players are pre-ordering AMD’s GPUs for next year out of sheer necessity for more compute capacity.
These moves are no surprise when you consider Meta’s spend with NVIDIA. It took 2,048 A100 GPUs to train LLaMA, and with A100s at roughly $15k apiece, Meta likely spent ~$30M on training hardware alone. That does not include networking or the engineering work to actually wire these systems together and run them properly, which is a meaningful expense in its own right. On top of that, Meta’s research supercomputer includes 2,000 DGX A100 systems; at ~$200k per DGX A100, the supercomputer cost at least $400M – and none of this includes the cost of inference.
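Spelling out the back-of-the-envelope math (the unit prices are the rough figures quoted above, not confirmed list prices):

```python
# Back-of-the-envelope reproduction of the estimates above. Unit prices are
# the rough figures quoted in the text, not official list prices.
A100_PRICE = 15_000        # ~$15k per A100 GPU (assumed)
DGX_A100_PRICE = 200_000   # ~$200k per DGX A100 system (assumed)

llama_training_cost = 2_048 * A100_PRICE   # GPUs used to train LLaMA
rsc_cost = 2_000 * DGX_A100_PRICE          # systems in Meta's research supercomputer

print(f"LLaMA training hardware: ~${llama_training_cost / 1e6:.0f}M")  # ~$31M
print(f"Research supercomputer:  ~${rsc_cost / 1e6:.0f}M")             # ~$400M
```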
The path to something better
Demand for AI compute has outstripped supply, and NVIDIA’s monopoly is more a function of software lock-in than hardware supremacy. We believe the pieces are finally in place to make this monopoly vulnerable:
CUDA lags when compiling for narrower subsets of tasks:
We were recently talking with a senior researcher at a major AI chip company who has worked at AI chip startups of various shapes and sizes over the last decade. One metric he raised was “percent of models on Hugging Face that can be successfully compiled.” His view was that this metric should become a standard in the AI hardware/software ecosystem, and that CUDA’s score could be eclipsed by other frameworks.
What does this mean? CUDA is great at taking in a wide variety of workloads and getting them to run on an NVIDIA GPU, but it is not as good at optimizing for a more specific set of models. In this case, the Hugging Face model hub represents a narrower (but very important) set of models on which CUDA might underperform.
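One way to picture that metric, as a rough sketch under our own assumptions (the sample size, the choice of torch.compile as the compiler, and the pass/fail criterion are all ours, not the researcher’s): pull models from the Hugging Face Hub, try to compile and run each one, and report the fraction that succeed.

```python
# Rough sketch of a "percent of Hugging Face models that compile" metric.
# The sample size, the compiler (torch.compile), and the success criterion
# are illustrative assumptions, not an established benchmark.
import torch
from huggingface_hub import list_models
from transformers import AutoModel, AutoTokenizer

def compiles_ok(model_id: str) -> bool:
    """True if the model loads, compiles, and runs a dummy forward pass."""
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModel.from_pretrained(model_id)
        compiled = torch.compile(model)  # swap in another compiler stack to compare scores
        inputs = tokenizer("hello world", return_tensors="pt")
        with torch.no_grad():
            compiled(**inputs)
        return True
    except Exception:
        return False

# Sample a small slice of the Hub for illustration; a real benchmark would sweep far more models.
sample = [m.modelId for m in list_models(limit=50)]
share = sum(compiles_ok(m) for m in sample) / len(sample)
print(f"compiled successfully: {share:.0%}")
```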
Demand may be sufficient for these types of models:
That NVIDIA deficiency – trading specialized performance gains for generalized usability – doesn’t matter if no specialized use case has sufficient demand. If companies find enough valuable use cases for LLMs, transformers will finally be the workload where that demand exists: companies will be willing to move away from CUDA to more specialized frameworks because the need for hardware access outweighs the risk and inconvenience of moving.
What we’ve written about above – the GPU bottleneck and the efforts to prop up AMD – points to this demand already existing. If you believe AI will become ubiquitous and change software as we know it, the impending demand is hard to deny.
Serious projects underway:
There are projects gaining momentum that aim to make different types of hardware more accessible, such as Modular and Lemurian Labs. Many industry veterans are rallying behind these projects, especially Modular, which is another good data point on CUDA’s vulnerability.
With that said, not every entrepreneur can be Chris Lattner starting Modular, raising boatloads of money from the get-go on the strength of incredible industry experience and credibility. To solve the GPU bottleneck, there needs to be much more usable supply, and accessing non-NVIDIA chips needs to be far easier. Rather than building a new CUDA, we think the likeliest way this materializes is entrepreneurs intelligently going around CUDA.
As one founder recently told us, “Breaking into the CUDA fortress is super hard. Many have tried and failed. Trying to replace CUDA is suicide. The right approach is going around CUDA. People who use LLMs don't use CUDA directly - they use a layer on top. Replacing these layers on top of CUDA is way easier.”
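To illustrate the founder’s point with our own example (not his): application and framework code rarely touches CUDA directly, so the backend underneath can change without the code above noticing. The snippet below never mentions CUDA; on a CUDA build of PyTorch it runs on NVIDIA silicon, while on a ROCm build the same "cuda" device string is routed to AMD GPUs.

```python
# Illustration of "going around CUDA": this code targets the framework layer,
# not CUDA itself. On a CUDA build of PyTorch it dispatches to NVIDIA kernels;
# on a ROCm build of PyTorch, the same "cuda" device string maps to AMD GPUs.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
y = model(x)  # the backend (CUDA, ROCm, or CPU) is chosen by the framework, not by this code
print(y.shape, y.device)
```

Whoever controls that layer on top controls which hardware the workload ultimately lands on – which is why replacing the layer, rather than replacing CUDA, is the attack vector the founder describes.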
There will be many startup attempts, and we’re excited about backing founders in this space. Founders we’d find particularly compelling include engineers who have worked in and around compilers and business people who know the semis industry inside and out.
As always, please share feedback and thoughts. We’d love to hear from you! And if you have colleagues or friends you think would enjoy this newsletter, please feel free to introduce them to us.
Until next time,
Brian and Tobias