DeepSeek claims its reasoning model beats OpenAI’s o1 on certain benchmarks

Chinese AI lab DeepSeek has released an open version of DeepSeek-R1, its so-called reasoning model, which it claims performs as well as OpenAI’s o1 on certain AI benchmarks.

R1 is available from the AI development platform Hugging Face under an MIT license, meaning it can be used commercially without restrictions. According to DeepSeek, R1 beats o1 on the AIME, MATH-500, and SWE-bench Verified benchmarks. AIME consists of problems from the American Invitational Mathematics Examination, a high school math competition, while MATH-500 is a set of word problems. SWE-bench Verified, meanwhile, focuses on programming tasks.

As a reasoning model, R1 effectively fact-checks itself, which helps it avoid some of the pitfalls that normally trip up models. Reasoning models take a little longer (usually seconds to minutes longer) to arrive at solutions than a typical non-reasoning model. The upside is that they tend to be more reliable in domains such as physics, science, and math.

R1 contains 671 billion parameters, DeepSeek revealed in a technical report. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer.

671 billion parameters is massive, but DeepSeek has also released “distilled” versions of R1 ranging in size from 1.5 billion to 70 billion parameters. The smallest can run on a laptop. The full R1 requires beefier hardware, but it is available through DeepSeek’s API at prices 90% to 95% lower than those of OpenAI’s o1.

There is a downside to R1. As a Chinese model, it is subject to benchmarking by China’s internet regulator to ensure that its responses “embody core socialist values.” R1 won’t answer questions about Tiananmen Square, for example, or Taiwan’s autonomy.

R1’s censorship in action. Image credits: DeepSeek

Many Chinese AI systems, including other reasoning models, decline to respond to topics that might draw the ire of the country’s regulators, such as speculation about the Xi Jinping regime.

R1 arrives days after the outgoing Biden administration proposed harsher export rules and restrictions on AI technologies for Chinese ventures. Companies in China are already barred from buying advanced AI chips, but if the new rules take effect as written, companies will face stricter caps on both the semiconductor technology and the models needed to build sophisticated AI systems.

In a policy document last week, OpenAI urged the US government to support the development of American AI, lest Chinese models match or surpass US models in capability. In an interview with The Information, OpenAI VP of Policy Chris Lehane singled out High Flyer Capital Management, DeepSeek’s parent company, as an organization of particular concern.

So far, at least three Chinese labs (DeepSeek, Alibaba, and Kimi, which is owned by Chinese unicorn Moonshot AI) have produced models that they claim rival o1. (Of note, DeepSeek was the first; it announced a preview of R1 in late November.) In a post on X, Dean Ball, an AI researcher at George Mason University, said the trend suggests Chinese AI labs will continue to be “fast followers.”

“The impressive performance of DeepSeek’s distilled models […] means that very capable reasoners will continue to proliferate widely and be runnable on local hardware,” Ball wrote, “far from the eyes of any top-down control regime.”
