Did xAI lie about Grok 3's benchmarks?

Debates over AI benchmarks, and how AI labs report them, are spilling out into public view.

This week, an OpenAI employee accused Elon Musk's AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. One of xAI's co-founders, Igor Babushkin, insisted that the company was in the right.

The truth lies somewhere in between.

In a post on xAI's blog, the company published a graph showing Grok 3's performance on AIME 2025, a collection of challenging math questions from a recent invitational mathematics exam. Some experts have questioned AIME's validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model's math ability.

xAI's graph showed two variants of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, beating OpenAI's best-performing available model, o3-mini-high, on AIME 2025. But the graph did not include o3-mini-high's AIME 2025 score at "cons@64."

What is cons@64, you might ask? Well, it's short for "consensus@64," and it basically gives a model 64 tries to answer each problem in a benchmark and takes the answers generated most frequently as the final answers. As you can imagine, cons@64 tends to boost models' benchmark scores, and omitting it from a graph might make it appear as though one model surpasses another when in reality that isn't the case.
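To make the difference concrete, here is a minimal sketch of consensus scoring next to first-attempt scoring. The data is invented for illustration (three problems with five attempts each, where a real cons@64 run would use 64), the function names are hypothetical, and the tie-breaking rule is an assumption, since labs do not always say how they break ties.

```python
from collections import Counter


def consensus_answer(attempts):
    """Return the most frequent answer among a model's attempts.

    Assumption: ties go to the answer that appeared first.
    """
    return Counter(attempts).most_common(1)[0][0]


def score(final_answers, correct_answers):
    """Fraction of problems whose final answer matches the reference."""
    hits = sum(f == c for f, c in zip(final_answers, correct_answers))
    return hits / len(correct_answers)


# Hypothetical data, not real benchmark results.
attempts_per_problem = [
    ["17", "42", "42", "42", "17"],
    ["8", "8", "3", "8", "8"],
    ["5", "9", "9", "5", "9"],
]
correct = ["42", "8", "9"]

# First-attempt scoring: only the first sample per problem counts.
first_try = [attempts[0] for attempts in attempts_per_problem]

# Consensus scoring: the majority answer per problem counts.
consensus = [consensus_answer(attempts) for attempts in attempts_per_problem]

at_1 = score(first_try, correct)
cons_at_k = score(consensus, correct)

print(f"first attempt: {at_1:.2f}, consensus: {cons_at_k:.2f}")
```

In this toy case the same set of samples scores far higher under consensus than under first-attempt scoring, which is why comparing one model's consensus number against another's first-attempt number is not like-for-like.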

Grok 3 Reasoning Beta and Grok 3 mini Reasoning's scores for AIME 2025 at "@1," meaning the first score the models got on the benchmark, fall below o3-mini-high's score. Grok 3 Reasoning Beta also trails OpenAI's o1 model set to "medium" computing. Yet xAI is advertising Grok 3 as the "world's smartest AI."

Babushkin argued on X that OpenAI has published similarly misleading benchmark charts in the past, albeit charts comparing the performance of its own models. A more neutral party in the debate put together a more "accurate" graph showing nearly every model's performance at cons@64:

Hilarious how some people see my plot as an attack on OpenAI and others as an attack on Grok, while in reality it's DeepSeek propaganda
(I actually think Grok looks good there, and OpenAI's TTC chicanery behind o3-mini-*high*-pass@"""1""" deserves more scrutiny.) https://t.co/djqljpcjh8 pic.twitter.com/3wh8foufic

– Teortaxes▶ (DeepSeek Twitter🐋 iron fan 2023 – ∞) (@teortaxesTex) February 20, 2025

But as AI researcher Nathan Lambert pointed out, perhaps the most important metric remains a mystery: the computational (and monetary) cost it took for each model to achieve its best score. That just goes to show how little most AI benchmarks communicate about models' limitations, and their strengths.
