January 2025
From: Brian, Tobias, and Gaby
Subject: The implications of DeepSeek
Hi all,
Welcome to the first 2025 edition of the BT&G newsletter.
We’re living in a moment of peak complexity. As mere mortals in an era of unprecedented turbulence, rapid change, and magical technological progress, where the entire economy and perhaps the fate of humanity revolves around a single theme, presuming to grasp the nature of the moment feels preposterous. But we can all agree – DeepSeek was a really, really big deal.
Let’s explore the potential ramifications.
Marc Andreessen described DeepSeek’s release as AI’s Sputnik moment in an epic AI race with China. Sam Altman conceded that it put OpenAI on the wrong side of history for choosing to build closed-source rather than open-source models. Satya Nadella banked on Jevons paradox to reframe the challenge in his favor. And Zuckerberg doubled down on CapEx buildouts. The release triggered a market drop of epic proportions, wiping out roughly $1T in market cap in a single day, including nearly $600B from NVIDIA alone.
All of this panic stemmed from DeepSeek’s cost efficiency in reaching a model on par with OpenAI’s best, despite (apparently) far tighter capital and hardware constraints. And all of this from a Chinese company! At face value, this development calls into question the massive capital expenditures that the hyperscalers and OpenAI have been chest-thumping about. While we think this issue is important, it has been overstated and misinterpreted, and it matters less than other factors – factors that are fantastic for technological progress and for startups.
As VCs, we bet on competition and innovation, and DeepSeek intensified the former with an impressive display of the latter. This is good for several reasons, which we unpack below.
DeepSeek is accelerating the AI revolution, and we want to understand the implications for startups.
Some background first
In 2015, Liang Wenfeng founded High-Flyer, a Chinese AI-powered hedge fund. Liang was forward-looking and stockpiled NVIDIA A100s before the U.S. restricted Chinese access to AI hardware. In 2023, he launched DeepSeek, backed by High-Flyer, as a research lab dedicated to pushing the frontier of AI, with applications beyond the hedge fund.
Liang’s approach to building DeepSeek stands in stark contrast to U.S. AI labs. Its core technical team is staffed largely with recent university graduates rather than seasoned AI researchers. Liang indexes on creativity and passion over raw technical pedigree, stating (in a translated interview) that “innovation often arises spontaneously, not through deliberate arrangement, nor can it be taught.”
In December 2024, DeepSeek released V3 alongside a bombshell technical report claiming that V3’s training run cost just $5.6M. For comparison, Meta’s Llama 3 405B model took 30.8M GPU-hours on state-of-the-art hardware, implying a training cost roughly 6x-10x that of V3. Then, in January, DeepSeek released R1, meant to compete with OpenAI’s o1. R1 is 23x more cost-effective and 2.4x faster than o1, while being essentially on par from a performance standpoint.
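For a rough sense of the arithmetic, here is our own back-of-the-envelope sketch. The $2/GPU-hour rate and the 2.788M H800 GPU-hours come from DeepSeek’s technical report; the Llama rental rate is our assumption, and the exact multiple shifts with whatever rates you plug in.

```python
# Back-of-the-envelope training-cost math (final training runs only).
# DeepSeek's $2/GPU-hour assumption and 2.788M H800 GPU-hours are from its
# V3 technical report; the Llama rental rate below is our own assumption.
deepseek_gpu_hours = 2.788e6   # H800 GPU-hours for V3's final run (reported)
deepseek_rate = 2.00           # $/GPU-hour assumed in DeepSeek's report
llama_gpu_hours = 30.8e6       # H100 GPU-hours for Llama 3 405B (reported)
llama_rate = 2.00              # $/GPU-hour (illustrative assumption)

deepseek_cost = deepseek_gpu_hours * deepseek_rate  # ~$5.6M, the headline number
llama_cost = llama_gpu_hours * llama_rate

print(f"V3 final run:      ${deepseek_cost / 1e6:.1f}M")
print(f"Llama 3 405B run:  ${llama_cost / 1e6:.1f}M")
print(f"Multiple at equal rates: {llama_cost / deepseek_cost:.0f}x")
```

Under equal rental rates the multiple is ~11x; the 6x-10x range above reflects different rate assumptions for each cluster.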
This technical accomplishment is breathtaking considering the capital and hardware constraints DeepSeek was operating under. It rests on several critical technical breakthroughs. To name only a few:
- A Mixture-of-Experts (MoE) architecture: V3 has 671B total parameters but activates only ~37B per token, slashing the compute required per token (see the sketch after this list).
- Multi-head Latent Attention (MLA), which compresses the attention key-value cache and sharply reduces memory requirements at inference time.
- FP8 mixed-precision training, squeezing more effective throughput out of export-restricted hardware.
- For R1, large-scale reinforcement learning (GRPO) applied directly to reasoning, with far less reliance on expensive human-labeled data.
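To make the MoE point concrete, here is a minimal sketch (our own illustration, using the parameter counts DeepSeek reports) of why sparse activation makes a huge model cheap to run:

```python
# Why sparse MoE inference is cheap: transformer FLOPs per generated token
# scale roughly with 2x the number of parameters that actually fire.
total_params = 671e9   # DeepSeek-V3 total parameters (reported)
active_params = 37e9   # parameters activated per token (reported)

dense_flops_per_token = 2 * total_params   # if every parameter fired
moe_flops_per_token = 2 * active_params    # only the routed experts fire

print(f"Active fraction:   {active_params / total_params:.1%}")   # ~5.5%
print(f"Per-token compute: {moe_flops_per_token / dense_flops_per_token:.1%} "
      f"of an equally sized dense model")
```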
In the wake of these releases, many compared DeepSeek’s purported $5.6M training cost to the hundreds of millions or billions that OpenAI has spent. This is not apples-to-apples: DeepSeek has also spent heavily on staff, hardware, prior training runs, and so on. A fairer comparison is across final training runs, which still shows how impressive DeepSeek is, although the gap is not as wide as some initially believed. Dylan Patel of SemiAnalysis has also said he believes DeepSeek’s total server CapEx is ~$1.3B.
Below are some summary comparisons between DeepSeek and the U.S. models:
Why this is a big deal
Fundamentally, DeepSeek calls into question the Bitter Lesson – the idea that “general methods that leverage computation are ultimately the most effective”, meaning more money = more compute = better performance. The Bitter Lesson teaches that more compute will always beat clever engineering and algorithmic tricks. The AI industry had coalesced around this belief – look at Stargate and the other planned CapEx from the hyperscalers. Those investments, already under pressure to be justified to Wall Street, are now in doubt, and we are left to wonder whether massive CapEx is the best use of energy and capital.
Overall, we see this as a massive win for (1) open source and (2) inference. What began in 2024 around the need to scale and optimize inference workloads is now primed to explode in 2025. As a contribution to technology, it is undoubtedly an amazing breakthrough, and will speed up the rate at which AI gets deployed and improves people’s lives. Imagine minimal inference costs, and no tax paid to a model provider like OpenAI. Innovation, experimentation, and as a result, inference, will skyrocket. Any investments in 2025 that do not have open source and inference at the core now seem to be missing the point of everything happening in AI.
The geopolitics of DeepSeek coming from China are complicated, to say the least, especially if DeepSeek was able to distill OpenAI’s o1 model, which evidence suggests it did. Because the models are open-weight, we hope no restrictions are placed on developers’ ability to use them, although DeepSeek as a consumer application could face scrutiny similar to TikTok’s. We do not have the space to do justice to the geopolitical considerations at play here, other than to say that we hope DeepSeek remains a tool in the toolbox for builders everywhere.
How far we can get with novel AI techniques is now a re-opened question that shapes how we think about compute scaling. But if models improve more rapidly as a result, more flowers will bloom than will die. We have faith in the clarity and simplicity of Jevons paradox – let’s invest behind scaling inference and delivering AI to all.
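Since we keep leaning on Jevons paradox, here is a toy sketch with entirely made-up numbers (ours, not a forecast) of how falling inference prices can still grow total compute spend:

```python
# Toy Jevons-paradox arithmetic: when the price per token falls, total spend
# still rises whenever demand grows faster than the price drops.
old_price = 60.0   # hypothetical $ per 1M tokens before the price drop
new_price = 3.0    # hypothetical $ per 1M tokens after a 20x drop
old_demand = 100   # hypothetical demand, in millions of tokens per month

old_spend = old_price * old_demand
for demand_multiplier in (5, 20, 50):
    new_spend = new_price * old_demand * demand_multiplier
    print(f"{demand_multiplier:>2}x demand: spend ${old_spend:,.0f} -> ${new_spend:,.0f}")
```

At exactly 20x demand growth, spend is flat; anything beyond that and the aggregate compute bill grows even as unit prices collapse.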
With that in mind, we have thoughts about how DeepSeek impacts each layer of the stack, especially for startups:
DeepSeek and the AI stack
Apps:
Companies at the app layer are the clearest winners from DeepSeek, on a few fronts. First, DeepSeek is open source: app developers benefit from a rich open-source community that keeps pace with closed-source models, and open weights enable fine-tuning and customization of LLMs for specific use cases. The diversity and sophistication of apps developers can build on open-source models far exceed what closed models like GPT allow. This should help the startup ecosystem!
Second, DeepSeek is a really good model, not just from a reasoning perspective but from an efficiency perspective – specifically cost. Developers now have access to a much lower cost, performant reasoning model that can either (a) increase gross margins of their product or (b) enable them to price lower.
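As a concrete sketch of how low the switching cost is (model names and endpoint current as of this writing; check DeepSeek’s docs before relying on them): DeepSeek exposes an OpenAI-compatible API, so pointing an existing app at it is often a two-line change.

```python
# Minimal sketch: DeepSeek's API is OpenAI-compatible, so the official openai
# SDK works by pointing base_url at DeepSeek's endpoint.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",        # placeholder credential
    base_url="https://api.deepseek.com",    # DeepSeek's OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # R1; "deepseek-chat" serves V3
    messages=[{"role": "user", "content": "Summarize Jevons paradox in two sentences."}],
)
print(resp.choices[0].message.content)
```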
And third, because of DeepSeek’s efficiency, local hosting and model deployment are increasingly practical. One thing holding back the proliferation of incredible consumer AI apps has been the difficulty of running capable models on laptops and phones. DeepSeek makes this more possible and will encourage on-device app development.
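For instance, the distilled R1 variants released alongside the flagship model are small enough for consumer hardware. A minimal sketch with Hugging Face transformers (the model ID is from DeepSeek’s release; memory needs and speed vary by machine):

```python
# Run a distilled DeepSeek-R1 variant locally with Hugging Face transformers.
# The 1.5B distill fits comfortably on a recent laptop; larger distills want a GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map needs `accelerate`
)

messages = [{"role": "user", "content": "Why did GPU stocks fall in January 2025?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```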
Infra/middleware:
As we wrote in our end-of-year note, we believe AI infra will go through a second wave, where new tools emerge as more AI apps scale. To the extent DeepSeek helps builders distribute better AI apps, it speeds up growth of the app ecosystem and therefore also speeds up new, useful infra tooling hitting the market.
Additionally, infrastructure companies will benefit from the victory of open source – closed source models will have less leverage to vertically integrate and move into the infrastructure layer. As a result, open source means a more fragmented stack and the increased prominence of standalone infra businesses.
Compute:
The compute layer is the most complicated one to evaluate. NVIDIA lost ~20% of its market cap in response to the DeepSeek news. If top-of-the-line models no longer require as much compute to train, then the demand for training compute should go down, right? And, if inference is much less expensive and more efficient, we also wouldn’t need as much inference compute either, right?
Although these risks exist in the short term, several specifics of this news make the picture less clear:
- Jevons paradox, again: as inference gets cheaper, total usage, and therefore total compute demand, may grow faster than prices fall.
- DeepSeek still trained and serves on NVIDIA hardware, and SemiAnalysis pegs its total server CapEx at roughly $1.3B.
- Reasoning models like R1 and o1 shift spend toward test-time compute, consuming far more inference per query than earlier models.
Models:
DeepSeek is a net negative for the model developers, especially OpenAI, Anthropic, and Google, all of which have made bets on closed source and seemingly infinite CapEx scaling. This is particularly humiliating, given the resource constraints placed on DeepSeek relative to the infinitely capitalized OpenAI. Sam Altman even said in a Reddit AMA in the wake of all this, “I personally think we have been on the wrong side of history here and need to figure out a different open source strategy…” There is a total reevaluation of the OpenAI strategy coming in response to DeepSeek.
DeepSeek also impacts how the closed-source model companies will think about their API businesses. Do they want to continue investing there and expose their IP to distillation? Is the CapEx worth it if the end product can be easily copied? Naveen Rao, CEO and Co-Founder of MosaicML, thinks the future will only include open source model APIs.
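To make the distillation worry concrete, here is a hedged sketch (hypothetical prompts and teacher choice; any OpenAI-compatible endpoint would do) of how a competitor can turn API outputs into a student model’s training set:

```python
# Sketch of API-based distillation: sample completions from a strong "teacher"
# via its public API, then reuse the prompt/response pairs as supervised
# fine-tuning (SFT) data for a smaller open "student" model.
from openai import OpenAI

teacher = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompts = [
    "Prove that the square root of 2 is irrational.",
    "Plan a three-day Kyoto itinerary on a budget.",
]  # in practice: millions of diverse prompts

sft_dataset = []
for p in prompts:
    resp = teacher.chat.completions.create(
        model="gpt-4o",  # hypothetical teacher model
        messages=[{"role": "user", "content": p}],
    )
    sft_dataset.append({"prompt": p, "completion": resp.choices[0].message.content})

# sft_dataset would then feed a standard SFT pipeline (e.g., TRL's SFTTrainer)
# to train the student - exactly the exposure API providers worry about.
```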
On the other hand, we think Meta is a big winner. Meta is making a bet on cheap inference and the supremacy of open source – it is an added bonus if the best open source models happen to be their own, but they do not have to be in order for Meta to accomplish this goal. What Meta wants is a commoditization of the model layer, and we have long wondered whether and when this would happen. DeepSeek is a rapid, seemingly overnight commoditization of that layer, and even if OpenAI comes out with a better model next week, the question will now always be when not if open source mimics or surpasses the current state-of-the-art.
The idea that we need to pour more money into building datacenters is now a legitimate open question. However, the importance of open source and the need to build infrastructure that enables inference to scale seem like more sure bets than ever. We have made two investments that fit this narrative – Etched (at the hardware layer) and an unannounced memory-focused startup. In 2025, we will double down here, and are on the lookout for similarly oriented startups.
We all need to revisit our thinking a bit on the AI market now, but that’s a good thing. Builders, innovation, democratization, and production AI win. The hyperscalers, over-funded LLM businesses, and $500B datacenter projects lose. That’s likely a good trade.
As always, if you have any thoughts or feedback, please let us know.
Until next time,
B, T, & G