DeepSeek R1: The Revolutionary AI Model?

In a surprising turn of events, DeepSeek may have just achieved something remarkable. If you are not familiar with the company, DeepSeek is an innovative AI lab from China that has taken the world of artificial intelligence by storm. With the latest update to its R1 thinking model, DeepSeek is now competitive with some of the most advanced models available.


When looking at these benchmarks, it is important to note that DeepSeek has released two models. There is the standard chat model, DeepSeek V3, and then the DeepSeek R1 variant, which is essentially a thinking version of the same model: R1 reasons through a chain of thought and plans its answer before producing the final response.

DeepSeek R1, in its latest update, is on par with state-of-the-art models like Gemini 2.5 Pro and OpenAI's o3. We can see that it goes toe-to-toe with these frontier models. What makes this incredible is that the model was reportedly trained for just $6 million. Various individuals dispute that claim, but the point stands that DeepSeek has had to work on a far smaller budget and timeline. How on earth has one company managed to catch up to competitors spending billions of dollars, at just a fraction of the cost?

Performance Analysis and Market Impact

When you take a look at the Artificial Analysis Intelligence Index, this is a really good indicator of where a model stands because it incorporates seven evaluations, not just one. Unlike evaluations that measure a single capability, it aggregates scores across a range of benchmarks into an average. The DeepSeek R1 jump has been absolutely incredible: it leapfrogs Claude 4 Sonnet Thinking, Qwen 3 Reasoning, Gemini 2.5 Flash, Grok 3 Reasoning, and Gemini 2.5 Pro Preview.

This means some people could now argue that DeepSeek sits just behind OpenAI in terms of model quality. That is a major statement when we consider how much time and effort has gone into the competing models. It is all the more surprising given that, for DeepSeek, this was just an update. If this were their next frontier model, DeepSeek R2, the performance would still be impressive. But the fact that a mere R1 update is this effective leads me to believe that the big tech companies may not have the lead in state-of-the-art AI performance that many assume.

One benchmark to really focus on is the Aider Polyglot score. Someone on the LocalLLaMA subreddit posted that DeepSeek R1 scored the same as Claude 4 Opus on Aider Polyglot, a whopping 70%. The crazy thing is the cost: we know that frontier models from large tech companies are expensive to run. This benchmark run cost around $2 to $3 on DeepSeek R1, whereas a model like Claude Opus would cost roughly $50 for the same kind of inference.

Cost Efficiency Revolution

Remember, it's not just about benchmark performance. Whilst you might expect most people to flock to the very best AI in the world, in practice we always have to factor in the cost of the model. DeepSeek R1 is stunning precisely because it costs just a fraction of what these other large language models cost.

When we compare DeepSeek R1 to other frontier models, there is a clear difference in pricing. Claude 4 Opus is around $15 per 1 million input tokens and around $75 per 1 million output tokens, whereas DeepSeek R1 is around $0.55 for input and around $2.19 for output. That is incredible in terms of price-to-performance ratio. Look at Gemini 2.5 Pro and all of the other models: the difference here is outstanding.
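To make that gap concrete, the quoted per-million-token rates can be turned into a total cost for a hypothetical workload. The rates below are the figures quoted above (they change over time), and the workload size is an arbitrary example:

```python
# Per-1M-token prices in USD, as quoted in the comparison above.
# Rates are illustrative and subject to change.
PRICING = {
    "Claude 4 Opus": {"input": 15.00, "output": 75.00},
    "DeepSeek R1":   {"input": 0.55,  "output": 2.19},
}

def workload_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Total API cost in USD for a workload measured in millions of tokens."""
    p = PRICING[model]
    return input_mtok * p["input"] + output_mtok * p["output"]

# Hypothetical workload: 10M input tokens, 2M output tokens.
for model in PRICING:
    print(f"{model}: ${workload_cost(model, 10, 2):,.2f}")
```

For this example workload, the Claude 4 Opus bill comes out around thirty times larger than the DeepSeek R1 bill, which is the dynamic driving the market-share argument below.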

One thing people don't realize is that developers and consumers have little loyalty to any one platform. Developers, people using APIs where the backend is just an LLM performing some complex task, will gravitate to the cheapest model that does the job, because the savings add up. If that holds across many companies, then DeepSeek may well be eating into the incumbents' market share.

DeepSeek R1 compared to Claude

Independent Evaluation Results

There's something rather interesting when we look at independent benchmarks. The SEAL leaderboards are a set of expert-driven third-party rankings for LLMs developed by Scale AI's Safety, Evaluations, and Alignment Lab. These benchmarks are designed to provide a transparent, unbiased, and tamper-proof assessment of the capabilities of frontier LLMs across several domains.

The main difference is the private, curated datasets. Unlike many public benchmarks, SEAL uses proprietary datasets that are kept private to prevent models from being trained or fine-tuned on the evaluation data. This approach ensures the results cannot be gamed or contaminated by prior exposure.

In the MultiChallenge benchmark, which tests how well AI models handle realistic multi-turn conversations with humans, DeepSeek currently sits at number 12. This test examines instruction retention, user memory, proper editing, and self-coherence. Specific benchmarks like this help gauge where a model excels and where it falters: they show that DeepSeek isn't as strong at instruction following or instruction retention as other models.

Model Variants and Future Developments

DeepSeek didn't just release the R1 update. They also distilled those capabilities into a Qwen3 8-billion-parameter base model, and it's absolutely incredible. That distilled model achieves state-of-the-art performance among open-source models on AIME 2024, surpassing the original Qwen3-8B by 10% and matching the performance of the much larger Qwen3-235B thinking model.

They managed to take the capabilities of the new DeepSeek R1 and pack them into a smaller model that is now essentially state-of-the-art at 8 billion parameters. Compact intelligence that runs on your phone is going to be incredible within the next ten years. Just imagine having an agent like o3 running locally, offline, with your private information at your fingertips.
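The article doesn't describe DeepSeek's distillation recipe, but the standard technique the term refers to, training a small student model to match a large teacher's output distribution, is often implemented as a temperature-scaled KL-divergence loss. The sketch below is a generic illustration of that idea, not DeepSeek's actual code:

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in classic knowledge distillation."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl) * temperature**2)
```

The loss is zero when the student exactly reproduces the teacher's logits and grows as the distributions diverge; a higher temperature exposes more of the teacher's "dark knowledge" about near-miss tokens.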

Government Restrictions and Concerns

There is a really big issue looming: whilst DeepSeek is a great model, we might not actually get future versions of it. Governments around the world are looking to block access to DeepSeek. The US government is seeking to ban it mainly over national security, data privacy, and foreign influence concerns, because DeepSeek is a Chinese AI company.

US officials fear that DeepSeek could be used for espionage, with sensitive government, corporate, or personal data potentially being accessed by Chinese authorities under China's national security laws, which require companies to share data with the government upon request. DeepSeek stores user data on servers located in China, raising the risk of unauthorized access and surveillance by Chinese intelligence agencies.

Recently, US Commerce Department bureaus informed staffers that DeepSeek is to be banned on their government devices. Numerous states, including Virginia, Texas, and New York, have banned the model from government devices, and a coalition of 21 state attorneys general has urged Congress to pass legislation.

The R2 Development Challenge

Where on earth is R2? R2 could be facing some huge delays. R2 is DeepSeek's next installment of the model, and it was reportedly meant to be released in early May. But with recent restrictions and new laws being passed, it looks like we might not get DeepSeek R2 for quite some time.

DeepSeek has built its entire R2 development strategy around Huawei's Ascend 910B AI chips. This decision was made out of necessity rather than choice, as Chinese companies have been largely cut off from the most advanced AI chips made by companies like Nvidia due to export restrictions.

In May 2025, the United States Department of Commerce declared that using Huawei Ascend chips anywhere in the world violates US export controls, a significant escalation in the ongoing technology war between the United States and China. The US government's position is that Huawei's Ascend chips were likely designed with certain US software or technology, or produced with semiconductor manufacturing equipment that is a direct product of US-origin software or technology.

The ruling's implications are severe. Anyone, anywhere in the world, who uses these Huawei chips could face criminal penalties under US law. This creates extraordinary legal risk for DeepSeek: if they use these chips to produce the model, they could face serious ramifications.

Beyond the legal issues, there are significant technical challenges. Chinese AI firms using Ascend chips have complained about hardware performance problems, particularly stability issues, which are critical for AI model development. If DeepSeek is forced to abandon the Huawei chips due to legal concerns, they would face the enormous challenge of retraining their R2 model on different hardware.

Training a model with a rumored 1.2 trillion parameters requires months of continuous computation on thousands of processors and reportedly involves feeding the system 5.2 petabytes of training data. Starting this entire process over on different hardware would not be simple: different AI chips have different architectures, programming interfaces, and optimization requirements, and the software conversion process alone could take months of engineering work.
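To get a feel for the scale involved, the widely used C ≈ 6·N·D heuristic (training FLOPs ≈ 6 × parameters × training tokens) can be applied to the rumored figures. Everything here is an assumption for illustration: the token count is a guess (the quoted 5.2 PB is raw data, not tokens), the per-chip throughput and utilization are placeholders, and a mixture-of-experts model would activate far fewer parameters per token, making this an upper bound:

```python
def train_flops(params: float, tokens: float) -> float:
    """Approximate total training compute via the C ~ 6*N*D heuristic."""
    return 6.0 * params * tokens

def chip_days(total_flops: float, peak_flops: float = 1e15,
              utilization: float = 0.4) -> float:
    """Days of compute on a single accelerator, assuming ~1 PFLOP/s peak
    and 40% utilization (both illustrative, assumed figures)."""
    seconds = total_flops / (peak_flops * utilization)
    return seconds / 86_400

# Rumored 1.2T parameters; 15T training tokens is a pure assumption.
flops = train_flops(params=1.2e12, tokens=15e12)
days_on_10k_chips = chip_days(flops) / 10_000
print(f"~{flops:.2e} FLOPs, roughly {days_on_10k_chips:.0f} days on 10,000 chips")
```

Even with generous assumptions, the estimate lands in the range of many months on a 10,000-chip cluster, which is why a forced hardware migration mid-project would be so costly.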

Overall, DeepSeek is facing a perfect storm of technical, legal, and strategic challenges that illustrates the complex realities of AI development in an increasingly fragmented global technology landscape. Their reliance on Huawei chips, initially seen as a clever workaround for US export restrictions, is now becoming a potential liability.

Source: https://www.youtube.com/watch?v=ByeX368H3GQ
