Niklas Horlebein: The release of DeepSeek-R1 made quite an impression. But why? There are several open source LLMs. Is the impact because of its – allegedly – low model training costs?
Pieter Buteneers: There are a few special things about the DeepSeek AI release. The first is the fact that it’s very affordable. When using their reasoning model, you’ll pay about 10 times less than what you would pay at OpenAI. This came as a shock to the sector. So far, AI companies have poured billions of dollars into research and people could only buy in at a very high price. But now – apparently – new players are pushing for lower costs.
The other thing is, obviously: how were they allegedly able to train it for 6 million dollars?
That’s only half of the story: it’s not just the 6-million-dollar figure. Basically, 6 million dollars is what brought them from the V3 baseline model – which simply answers a question without reasoning – to a model that can also reason about itself. It has become a reasoning model. Switching from one state to the other is what cost 6 million dollars.
That wasn’t much, because the V3 model already existed. I couldn’t find exact details on what training V3 cost, but allegedly it was also quite cheap.
What struck a lot of people was what’s called “distillation”. While this isn’t exactly the right term, the media picked up on it. Distillation is typically used with a large parent model to create a smaller version of that model that mimics its behavior. Sometimes you succeed in achieving almost the same performance with the smaller model by training it on data generated by the larger one. That’s typically how distillation is used.
In this case, the term loosely applies to what DeepSeek AI did. They already had a base model that they wished to turn into a reasoning model. Using examples from OpenAI – and possibly others – to generate training data, they then trained their own model.
This made the process a lot cheaper, as they were able to piggyback on existing models and mimic them to achieve similar results, ensuring their model learned the same reasoning steps. Of course, this was a big shock. It showed how easy it has become to copy another player’s approach. This scares a lot of competitors.
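For readers who want to see the mechanics, here is a minimal sketch of distillation in the loose, data-level sense described above, assuming a teacher reachable through the OpenAI API: a stronger “teacher” model answers a set of prompts, and the resulting pairs become fine-tuning data for a smaller “student” model. The prompts, the file name, and the choice of gpt-4o as teacher are illustrative assumptions, not DeepSeek AI’s actual pipeline, and the later fine-tuning step is omitted.

```python
# Sketch only: generate (prompt, answer) pairs from a teacher model and store
# them in a chat-style JSONL file that common fine-tuning tools can consume.
# The teacher model name, prompts, and file name are illustrative.
import json
from openai import OpenAI  # official openai>=1.0 client

client = OpenAI()  # expects OPENAI_API_KEY in the environment

prompts = [
    "Explain step by step why the sky appears blue.",
    "Reason through whether 1021 is a prime number.",
]

with open("distillation_data.jsonl", "w", encoding="utf-8") as f:
    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        teacher_answer = response.choices[0].message.content
        record = {
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": teacher_answer},
            ]
        }
        f.write(json.dumps(record) + "\n")

# A student model would then be fine-tuned on distillation_data.jsonl so that
# it mimics the teacher's answers, including their reasoning style.
```

In classic distillation the student is trained to match the teacher’s output probabilities; training on generated answers, as sketched here, is the cheaper data-level variant described above.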
The fourth thing is the hardware. They didn’t pay much – just 6 million to go from one model to another – but on top of that, Nvidia is not allowed to export its newest graphics cards to China, so China is forced to work with older hardware when training models. Apparently, they may have found a workaround, but in theory they used Nvidia hardware one or two generations behind what European and American players are using, and yet still managed to achieve a model comparable to OpenAI’s.
The founder of Anthropic, Dario Amodei, said in defense: “they’re a year behind.” I don’t think they’re actually a year behind, especially compared to OpenAI; Anthropic is a bit further ahead. However, it shows that they’re not that far behind. A year in this industry is a long time, and a lot can change in that period. But looking at the bigger picture: a new company coming out with a model like this and being only a year behind is incredible. It is insane how quickly they’ve managed to catch up, even with relatively modest resources, a smaller budget, and older hardware.
Niklas Horlebein: Sounds like a good trade-off: To be just a year behind, but save hundreds of millions by doing so. Let’s stick with money: From your perspective, how realistic are DeepSeek’s claims about low training costs and reduced need for computing power?
Pieter Buteneers: It’s realistic if you’re only talking about taking a baseline model and making it capable of reasoning. For that step, 6 million is pretty reasonable and achievable. But building a model that performs at least as well as their baseline model for that amount is very unlikely. They likely used other resources, and the 6 million was mainly for the switch to a reasoning model. That appears to be the phase the money went into: taking a question-answer model and making it reason about itself before providing an answer.
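As an illustration of what that switch looks like from the outside: a reasoning model such as DeepSeek-R1 emits its chain of thought before the final answer – in R1’s case wrapped in <think> tags – whereas a plain question-answer model returns only the answer. The snippet below is a minimal sketch with an invented sample response; it simply separates the two parts.

```python
# Minimal sketch: split a reasoning model's raw output into its reasoning
# trace and its final answer. The sample string is invented for illustration;
# DeepSeek-R1 emits its chain of thought between <think> and </think> tags.
import re

raw_output = (
    "<think>The user asks for 17 * 24. "
    "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>\n"
    "17 * 24 = 408"
)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning trace, final answer) from a raw model response."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return reasoning, answer

reasoning, answer = split_reasoning(raw_output)
print("Reasoning:", reasoning)
print("Answer:   ", answer)
```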
But obviously, it’s marketing – really, really good marketing – especially at a time when OpenAI and Anthropic have been raising billions in new capital. They have shown that they can achieve something similar with two orders of magnitude less money. That is an obvious shockwave: they reach much the same result with far fewer dollars. They exaggerated a bit by not telling the whole story, but still, it’s not something to be underestimated.
Niklas Horlebein: If it’s true that you can train a model for a much smaller amount of money, why exactly did or do competitors need so much capital? Was much of the budget for marketing?
Pieter Buteneers: When it comes to building a model and all of the steps required to get there, you can optimize the cost and learn from the mistakes others made to go from A to B. But if you’re in the front row, pushing the envelope, those mistakes have not been made yet. You need a budget to make them yourself. That’s the disadvantage of being a front-runner – you don’t know what’s coming or which hurdles you might encounter, and that requires a bigger budget.
To use OpenAI as an example, they’ve been trying to release GPT-5 for months. Supposedly, a massive training run for GPT-5 was finished in December, although we do not know all of the details.
However, because running the model was so expensive, and because it only performed slightly better than GPT-4, they ended up releasing it as GPT-4.5 instead. It wasn’t worth it from a price-performance standpoint, so they must completely rethink how they train models.
That process costs money. That’s the price you pay to be first, to be the front-runner. It’s crazy to see how a small Chinese player is only one year behind. It shows they’re catching up really, really quickly.
Niklas Horlebein: Is it really one year behind? What can you tell us about R1’s performance after a few weeks? How does it compare to other major LLMs?
Pieter Buteneers: If you compare its non-reasoning performance, it performs slightly below GPT-4o but above Llama 3.3. In my opinion, it is the best-performing open source model at the moment. And as a bonus, it is the first reasoning model that is also open source. So, on the open source side, DeepSeek AI is at the bleeding edge of what you can get.
When comparing it to other closed source reasoning models, OpenAI’s o1 and o3-mini models are maybe a tiny bit better, and Anthropic is again quite a few steps ahead. So DeepSeek AI is not doing badly at all; it’s actually pretty good. And, in my opinion, it’s at most half a year behind.
Niklas Horlebein: DeepSeek’s models are “open weight”, not fully open source. What should developers know about the difference?
Pieter Buteneers: Most models are open weight, including Meta’s. This simply means that you can download the model to your computer, and you’re free to run it locally and modify it as much as you want, depending on the license. However, you don’t know what data the model has been trained on, nor do you know what training algorithm, setup, or procedure was used to train the model.
Essentially, they are an even greater black box than the traditional AI models you train yourself. You only have the weights: you can use them to let the model do tasks for you and you receive an output. But if you don’t know what the training data is, you don’t know what the model’s knowledge base is. This limits your ability to inspect and learn from the model. You essentially have to reverse-engineer it, because this black-box setup makes it very hard to learn what is really going on inside.
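To make “open weight” tangible, here is a minimal sketch using Hugging Face Transformers that downloads the weights and runs them locally; nothing in it reveals the training data or procedure. The model ID (one of the smaller distilled R1 checkpoints) and the prompt are illustrative assumptions, and a GPU is assumed for reasonable speed.

```python
# Sketch: download an open weight model and run it locally. You get the
# weights and can generate text, but nothing here exposes the training data
# or the training procedure. Model ID and prompt are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What can you tell me about your training data?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```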
For example, with DeepSeek AI, you can see some moral judgments were made during the training process. If you write a prompt about Tiananmen Square or Xi Jinping, the model won’t provide an answer. It will try to avoid the topic and redirect the conversation elsewhere. This is because, in China, the law – or possibly culture – simply forbids discussion on these topics. It’s a moral judgment the creators made: if they want the model to survive in China, they must ensure it doesn’t say anything about these subjects.
However, all large language model providers make these decisions. Even Grok 3, which claims to be the snarkiest model and to say things as they are, will not tell you that Elon Musk is making mistakes or anything similar. It’s trained not to.
They’ve inserted a moral judgment into it. None of these models will tell you how to buy unlicensed firearms on the internet. These topics have been removed, even though the data may exist somewhere on the internet and is likely in the training data.
The creators of these models make moral judgments about what is socially acceptable, but you can only do that well by influencing the training process. With an open weight model, however, these decisions have already been made for you. You have no impact on how these models behave from a moral standpoint.
Niklas Horlebein: Would fully open source models allow for that?
Pieter Buteneers: Yes. There’s still research to be done, but at least all the information is on the table.
With fully open source models, you have access to the training scripts and data and can make your own moral judgments. You can tweak the model so that it behaves in a way that aligns with what you believe. Standards like “do not kill other humans” are universally accepted, but every country or region has its own specific things that you’re not supposed to say.
I’m Belgian, and in Belgium, it’s okay to joke about Germans in the Second World War. But you probably wouldn’t do that in Germany, right? It’s not socially accepted. What’s morally acceptable really depends on the country and culture.
You can try making the model very generalist and avoid the subject of Nazis in general, but you don’t know what each country considers okay or not. Letting big organizations like DeepSeek AI, OpenAI, Anthropic, or Mistral make those decisions limits our societal impact on what these models – and possibly, in the future, a form of AGI (Artificial General Intelligence) – will deem acceptable.
That is why, in my opinion, it’s crucial to understand the inner workings of these models. You need the data, and you need to control the training process yourself to get the desired output.
This way, society can learn and understand how these models work and defend itself against the potential risks these models pose. As long as models are not open source, we as a society have very limited control over what goes in and what does not.
Niklas Horlebein: We’ve already seen the short-term impact of DeepSeek AI. From your perspective, what will be its long-term implications for the industry? Will we see more open source competitors now that training models may be more financially plausible?
Pieter Buteneers: Their claim to have trained models on what is, by industry standards, a shoestring budget will encourage other players to consider doing something similar.
The fact that the DeepSeek AI models are open source will attract developers and researchers to build on top of them. Meta understood that well: they were one of the first to make their biggest and smartest models open source. Mistral and others tried as well, but usually with smaller models.
More and more players will feel the pressure to open source at least the weights, and I hope it will become a competition over who can be the most open source. There are already discussions about other models releasing parts of their training data, so you can already watch the “one-up” game begin. The player who releases the most open model – one that includes everything – will form the baseline for much of the research that follows, as it allows others to take that model and continue building on it.
If a model is released fully open source with everything included, everything society builds on top of it can be downloaded from the internet as code and integrated back into the model.
One of the reasons Meta decided to open source their models was to form the foundation for research on top of the Llama models and to be the de facto standard in LLM research going forward.
There’s also the fact that open sourcing attracts talent. Very few people want to work day and night filling another billionaire’s pockets without making a lot of money themselves and without being allowed to talk about their work. That’s another reason companies are considering open source: it attracts the talent they need.
You saw this when OpenAI became the de facto “ClosedAI”: a lot of people left the company and looked elsewhere. They are now big enough to still attract talent, but for smaller players, open sourcing is a way to make a difference. You’ll see more open source models going forward, as many people are asking for them.
Open weight is the bare minimum. With that, the model can run anywhere. The training data is of more academic interest, so clients aren’t necessarily asking for it. But when a model is fully open sourced, it creates a richer community around the research, and the results of that research will be much easier to integrate back into the original model. That will be a competitive advantage for open source models.
Niklas Horlebein: Large U.S. companies seemed shaken by the unexpected competitor, but quickly stated they would stick to their approach of investing heavily in AI development. Do you think they will maintain this stance, or will OpenAI change its plans?
Pieter Buteneers: In the beginning, after DeepSeek AI was released, there was a bit of silence and shares dropped. People thought that perhaps models don’t need so much money. But a few weeks later, big players like OpenAI and Anthropic were raising massive amounts of capital. Being at the leading edge of this technology still requires huge amounts of money. That’s the sad part.
This is true throughout history: there are always innovators who launch something at a very high cost, and later, copycats try to mimic and replicate it at a much lower cost. But they’re different players, and they attract a different audience. Being on the cutting-edge attracts customers who want the most powerful and easiest-to-integrate solutions. OpenAI is still the best at that. Their models might not be the most performant compared to Anthropic, but they’re easier to integrate into code.
The answers are structured better, and they’re consistently improving over time. ChatGPT is much better at formatting tables and structuring its output than it was only a few months ago. That continuous improvement makes the models more snackable and easier to use for both users and developers. All of this requires investment.
Copycats replicate something to achieve similar benchmark performance, but that does not mean the product is easy to build with or pleasant to use. It’s simply good on the benchmarks.
It’s like looking at the Nürburgring and putting a race car on it. You may have a car that can lap the Nürburgring really fast – but only a professional driver can drive it.
For regular people on the road, it doesn’t change anything – because they can’t drive it, and it has no worth to them. It’s only good in the benchmarks. That’s the risk with these models: they score well on benchmarks but aren’t pleasant to use.
That’s where OpenAI, and more specifically the knowledge Sam Altman brought from Y Combinator, really comes into play. The focus is on user experience.
That edge is much harder to mimic. It takes many small iterations to achieve a model output that is meaningfully better.
Niklas Horlebein: How do you feel about the “rise of DeepSeek AI” in general? Do you believe it will have a positive impact on AI development – or a negative one?
Pieter Buteneers: Generally, more competition is good for the consumer, and in this case, I’m a consumer. I use this technology to do legal due diligence for lawyers. Being able to use this technology from multiple companies and letting them compete on price and performance is amazing. The only ones that suffer are the companies themselves as they try to one-up each other, but that’s always good for everyone else.
Although it’s sad that it’s a Chinese player and that you can’t get it hosted within the EU for a reasonable cost, like on Azure or AWS – or at least I haven’t found that option yet.
This may change in the future, and once it does, it will become more accessible for GDPR enthusiasts, too. We can use this technology in our solutions with the guarantee that the Chinese government isn’t watching.
Let these companies compete, let them fight it out. The more players working towards AGI, the more checks and balances there are. In an ideal scenario, everything will be fully open source with those checks and balances in place. So yes, it’s good that there’s an extra player in the field.
Niklas Horlebein: Is there anything else about DeepSeek AI or the current LLM landscape that you would like to add?
Pieter Buteneers: Reasoning models are very interesting to me personally, and the fact that DeepSeek AI has a reasoning model is an important breakthrough. That shouldn’t be underestimated, but the uptake of reasoning models in practical applications is still low. It’s easy in a chat setting, where users ask a question and let the model do the work; there, the reasoning helps with some difficult questions, though not in most cases.
At Emma Legal we automate legal due diligence, which is fairly structured, so we know very well what to check for and how to reason. We have our own dedicated checks and balances in place to ensure that the models aren’t hallucinating and that the documents they pull are relevant to the due diligence.
So in well-built AI applications, reasoning models aren’t often used, as you can already determine what kind of reasoning you need beforehand and build it yourself at a much lower cost.
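As a purely hypothetical illustration of that design choice (not Emma Legal’s actual implementation): the reasoning lives in a list of predetermined checks, and each check is a small, verifiable question put to a plain, cheaper model. The check names and the run_llm helper below are invented.

```python
# Hypothetical sketch: predefined due diligence checks instead of a run-time
# reasoning model. run_llm() stands in for a call to whatever plain
# question-answer model the application uses; the checks are invented.
from dataclasses import dataclass
from typing import Callable, Dict

def run_llm(prompt: str) -> str:
    """Placeholder for a call to a non-reasoning LLM."""
    return f"[model answer to: {prompt[:40]}...]"

@dataclass
class Check:
    name: str
    build_prompt: Callable[[str], str]

CHECKS = [
    Check(
        "change_of_control",
        lambda doc: f"Does this contract contain a change-of-control clause? Quote it.\n\n{doc}",
    ),
    Check(
        "termination_notice",
        lambda doc: f"What notice period applies to termination of this contract?\n\n{doc}",
    ),
]

def review(document: str) -> Dict[str, str]:
    # The reasoning steps are fixed up front in CHECKS, so each answer can be
    # verified individually instead of trusting one long reasoning trace.
    return {check.name: run_llm(check.build_prompt(document)) for check in CHECKS}

if __name__ == "__main__":
    print(review("…contract text…"))
```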