{"id":1033197,"date":"2026-01-15T08:25:14","date_gmt":"2026-01-15T08:25:14","guid":{"rendered":"http:\/\/uYKpCiN2GZXgZWxqa79iei"},"modified":"2026-01-15T08:25:14","modified_gmt":"2026-01-15T08:25:14","slug":"a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear","status":"publish","type":"post","link":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/","title":{"rendered":"A new math benchmark just dropped and leading AI models can solve &#8216;less than 2%&#8217; of its problems&#8230; oh dear"},"content":{"rendered":"<article>\n<p>Sometimes I forget there&#8217;s a whole other world out there where AI models aren&#8217;t just used for basic tasks such as simple research and quick content summaries. Out in the land of bigwigs, they&#8217;re instead being used to help with everything from financial analysis to scientific research. That&#8217;s why their mathematical capabilities are so important\u2014plus it&#8217;s a general marker of reasoning capabilities.<\/p>\n<p>Which is why mathematical benchmarks exist. 
Benchmarks such as <a data-analytics-id=\"inline-link\" href=\"https:\/\/epoch.ai\/frontiermath\/the-benchmark\" target=\"_blank\">FrontierMath<\/a>, which its maker, Epoch AI, has just dropped and which is putting LLMs through their paces with &#8220;hundreds of original, expert-crafted mathematics problems designed to evaluate advanced reasoning capabilities in AI systems&#8221; (via <a data-analytics-id=\"inline-link\" href=\"https:\/\/arstechnica.com\/ai\/2024\/11\/new-secret-math-benchmark-stumps-ai-models-and-phds-alike\/\" target=\"_blank\">Ars Technica<\/a>).<\/p>\n<p>While today&#8217;s AI models don&#8217;t tend to struggle with other mathematical benchmarks such as GSM-8k and MATH, according to Epoch AI, &#8220;they solve less than 2% of FrontierMath problems, revealing a substantial gap between current AI capabilities and the collective prowess of the mathematics community&#8221;.<\/p>\n<p>To be clear, these are <em>hard <\/em>problems. As in, so hard that they &#8220;typically require hours or days for expert mathematicians to solve&#8221;, ranging &#8220;from computationally intensive problems in number theory and real analysis to abstract questions in algebraic geometry and category theory&#8221;.<\/p>\n<p>What&#8217;s so different about this benchmark is that solving these mathematical problems requires &#8220;extended chains of precise reasoning, with each step building exactly on what came before&#8221;.<\/p>\n<p>AI models have traditionally not been great at extended reasoning in general, let alone for super-advanced math. This makes sense when you consider what AI models, at bottom, are doing. Using LLMs as an example, these are trained on tons of data to figure out what each next word would most likely be based on this data. 
Although of course there&#8217;s plenty of room for steering the model towards different words, the process is essentially probabilistic.<\/p>\n<p>Of late, however, we&#8217;ve seen AI models apply their probabilistic &#8220;thinking&#8221; in a more directed fashion to the intermediate steps of that &#8220;thinking&#8221;. In other words, we&#8217;ve seen a move towards AI models that attempt to <em>reason through <\/em>their thinking, rather than just jumping to a probabilistic conclusion.<\/p>\n<p>There&#8217;s now a version of ChatGPT-4o, for instance, that uses reasoning (and you&#8217;d better make sure you <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.pcgamer.com\/software\/ai\/please-halt-this-activity-not-so-open-openai-seems-to-have-gone-full-mob-boss-sending-threatening-emails-to-anyone-who-asks-its-latest-ai-models-probing-questions\/\" target=\"_blank\">don&#8217;t question it<\/a>). It&#8217;s also telling that you can now potentially be rewarded for submitting a question that AI can&#8217;t answer to &#8220;<a data-analytics-id=\"inline-link\" href=\"https:\/\/www.safe.ai\/blog\/humanitys-last-exam\" target=\"_blank\">humanity&#8217;s last exam<\/a>&#8221;.<\/p>\n<p>Of course, these individual steps of reasoning might themselves be arrived at probabilistically (and could we expect any more from a non-sentient algorithm?), but they do seem to be engaging in what we flesh-and-bloodies after the fact consider to be &#8220;reasoning&#8221;.<\/p>\n<p>We&#8217;re clearly a way off from having these AI models achieve the reasoning capabilities of our best and brightest, though. We can see that now that we have a mathematical benchmark capable of really putting them to the test: 2% isn&#8217;t great, is it? 
(And take that, robots.)<\/p>\n<p>Regarding the FrontierMath problems, Fields Medalist Terence Tao tells Epoch AI, &#8220;I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages\u2026&#8221;<\/p>\n<p>While AI models might not be able to crack these difficult problems just yet, the FrontierMath benchmark looks to serve as a good 
litmus test for future improvements, ensuring the models aren&#8217;t just spewing out mathematical nonsense that only experts could verify as such.<\/p>\n<p>We must, in the end, remember that AI is not truth-aiming, however closely <em>we humans <\/em>aim its probabilistic reasoning at results that tend towards the truth. The philosopher in me must ask: Without it having an inner life aiming towards truth, can truth actually exist for the AI, even if it spews it out? Truth for us, yes, but for the AI? I suspect not, and that&#8217;s why benchmarks like these will be crucial moving forwards into this <a data-analytics-id=\"inline-link\" href=\"https:\/\/blogs.nvidia.com\/blog\/ai-summit-japan-huang-son\/\" target=\"_blank\">new industrial revolution<\/a>, or whatever they&#8217;re calling it these days.<\/p>\n<\/article>\n<p><a href=\"https:\/\/www.pcgamer.com\/software\/ai\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-percent-of-its-problems-oh-dear\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes I forget there&#8217;s a whole other world out there where AI models aren&#8217;t just used for basic tasks such as simple research and quick content summaries. Out in the land of bigwigs, they&#8217;re instead being used to help with everything from financial analysis to scientific research. That&#8217;s why their mathematical capabilities are so important\u2014plus it&#8217;s a general marker of reasoning capabilities. Which is why mathematical benchmarks exist. Benchmarks such as FrontierMath, which its maker, Epoch AI, has just dropped and which is putting LLMs through their paces with &#8220;hundreds of original, expert-crafted mathematics problems designed to evaluate advanced reasoning capabilities in AI systems&#8221; (via Ars Technica). 
While today&#8217;s AI models don&#8217;t tend to struggle with other mathematical benchmarks such as GSM-8k and MATH, according&hellip;<\/p>\n<p class=\"excerpt-more\"><a class=\"blog-excerpt button\" href=\"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/\">Read More&#8230;<\/a><\/p>\n","protected":false},"author":1,"featured_media":1033198,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[336],"tags":[1997,1622],"class_list":["post-1033197","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-pc-gamer","tag-ai","tag-software"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>A new math benchmark just dropped and leading AI models can solve &#039;less than 2%&#039; of its problems... oh dear | Arcader News<\/title>\n<meta name=\"description\" content=\"Sometimes I forget there&#039;s a whole other world out there where AI models aren&#039;t just used for basic tasks such as simple research and quick content\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A new math benchmark just dropped and leading AI models can solve &#039;less than 2%&#039; of its problems... 
oh dear | Arcader News\" \/>\n<meta property=\"og:description\" content=\"Sometimes I forget there&#039;s a whole other world out there where AI models aren&#039;t just used for basic tasks such as simple research and quick content\" \/>\n<meta property=\"og:url\" content=\"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/\" \/>\n<meta property=\"og:site_name\" content=\"Arcade News\" \/>\n<meta property=\"article:published_time\" content=\"2026-01-15T08:25:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/arcader.org\/wp-content\/uploads\/2024\/11\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"480\" \/>\n\t<meta property=\"og:image:height\" content=\"270\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Arcade News\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Arcade News\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/\"},\"author\":{\"name\":\"Arcade News\",\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/#\\\/schema\\\/person\\\/8460f5e5076b52fb2369f2f7ce6f2839\"},\"headline\":\"A new math benchmark just dropped and leading AI models can solve &#8216;less than 2%&#8217; of its problems&#8230; oh dear\",\"datePublished\":\"2026-01-15T08:25:14+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/\"},\"wordCount\":745,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/arcader.org\\\/wp-content\\\/uploads\\\/2024\\\/11\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear.jpg\",\"keywords\":[\"ai\",\"software\"],\"articleSection\":[\"PC 
Gamer\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/\",\"url\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/\",\"name\":\"A new math benchmark just dropped and leading AI models can solve 'less than 2%' of its problems... oh dear | Arcader News\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/arcader.org\\\/wp-content\\\/uploads\\\/2024\\\/11\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear.jpg\",\"datePublished\":\"2026-01-15T08:25:14+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/#\\\/schema\\\/person\\\/8460f5e5076b52fb2369f2f7ce6f2839\"},\"description\":\"Sometimes I forget there's a whole other world out there where AI models aren't just used for basic tasks such as simple research and quick 
content\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/#primaryimage\",\"url\":\"https:\\\/\\\/arcader.org\\\/wp-content\\\/uploads\\\/2024\\\/11\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear.jpg\",\"contentUrl\":\"https:\\\/\\\/arcader.org\\\/wp-content\\\/uploads\\\/2024\\\/11\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear.jpg\",\"width\":480,\"height\":270,\"caption\":\"A new math benchmark just dropped and leading AI models can solve \u2018less than 2%\u2019 of its problems..\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/arcader.org\\\/news\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A new math benchmark just dropped and leading AI models can solve &#8216;less than 2%&#8217; of its problems&#8230; oh dear\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/#website\",\"url\":\"https:\\\/\\\/arcader.org\\\/news\\\/\",\"name\":\"Arcade News\",\"description\":\"Free Arcade News from the Best Online 
Sources\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/arcader.org\\\/news\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/arcader.org\\\/news\\\/#\\\/schema\\\/person\\\/8460f5e5076b52fb2369f2f7ce6f2839\",\"name\":\"Arcade News\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/3fea48a614d86edd987bc7bb25f4707c69546d4b1f78ad4aa20b26316bad1f9d?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/3fea48a614d86edd987bc7bb25f4707c69546d4b1f78ad4aa20b26316bad1f9d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/3fea48a614d86edd987bc7bb25f4707c69546d4b1f78ad4aa20b26316bad1f9d?s=96&d=mm&r=g\",\"caption\":\"Arcade News\"},\"sameAs\":[\"https:\\\/\\\/cricketgames.tv\"],\"url\":\"https:\\\/\\\/arcader.org\\\/news\\\/author\\\/arcade-news\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A new math benchmark just dropped and leading AI models can solve 'less than 2%' of its problems... oh dear | Arcader News","description":"Sometimes I forget there's a whole other world out there where AI models aren't just used for basic tasks such as simple research and quick content","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/","og_locale":"en_US","og_type":"article","og_title":"A new math benchmark just dropped and leading AI models can solve 'less than 2%' of its problems... 
oh dear | Arcader News","og_description":"Sometimes I forget there's a whole other world out there where AI models aren't just used for basic tasks such as simple research and quick content","og_url":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/","og_site_name":"Arcade News","article_published_time":"2026-01-15T08:25:14+00:00","og_image":[{"width":480,"height":270,"url":"https:\/\/arcader.org\/wp-content\/uploads\/2024\/11\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear.jpg","type":"image\/jpeg"}],"author":"Arcade News","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Arcade News","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/#article","isPartOf":{"@id":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/"},"author":{"name":"Arcade News","@id":"https:\/\/arcader.org\/news\/#\/schema\/person\/8460f5e5076b52fb2369f2f7ce6f2839"},"headline":"A new math benchmark just dropped and leading AI models can solve &#8216;less than 2%&#8217; of its problems&#8230; oh 
dear","datePublished":"2026-01-15T08:25:14+00:00","mainEntityOfPage":{"@id":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/"},"wordCount":745,"commentCount":0,"image":{"@id":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/#primaryimage"},"thumbnailUrl":"https:\/\/arcader.org\/wp-content\/uploads\/2024\/11\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear.jpg","keywords":["ai","software"],"articleSection":["PC Gamer"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/","url":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/","name":"A new math benchmark just dropped and leading AI models can solve 'less than 2%' of its problems... 
oh dear | Arcader News","isPartOf":{"@id":"https:\/\/arcader.org\/news\/#website"},"primaryImageOfPage":{"@id":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/#primaryimage"},"image":{"@id":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/#primaryimage"},"thumbnailUrl":"https:\/\/arcader.org\/wp-content\/uploads\/2024\/11\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear.jpg","datePublished":"2026-01-15T08:25:14+00:00","author":{"@id":"https:\/\/arcader.org\/news\/#\/schema\/person\/8460f5e5076b52fb2369f2f7ce6f2839"},"description":"Sometimes I forget there's a whole other world out there where AI models aren't just used for basic tasks such as simple research and quick content","breadcrumb":{"@id":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/#primaryimage","url":"https:\/\/arcader.org\/wp-content\/uploads\/2024\/11\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear.jpg","contentUrl":"https:\/\/arcader.org\/wp-content\/uploads\/2024\/11\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear.jpg","width":480,"height":270,"caption":"A new math benchmark just dropped and leading AI models can solve \u2018less than 2%\u2019 of its 
problems.."},{"@type":"BreadcrumbList","@id":"https:\/\/arcader.org\/news\/a-new-math-benchmark-just-dropped-and-leading-ai-models-can-solve-less-than-2-of-its-problems-oh-dear\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/arcader.org\/news\/"},{"@type":"ListItem","position":2,"name":"A new math benchmark just dropped and leading AI models can solve &#8216;less than 2%&#8217; of its problems&#8230; oh dear"}]},{"@type":"WebSite","@id":"https:\/\/arcader.org\/news\/#website","url":"https:\/\/arcader.org\/news\/","name":"Arcade News","description":"Free Arcade News from the Best Online Sources","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/arcader.org\/news\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/arcader.org\/news\/#\/schema\/person\/8460f5e5076b52fb2369f2f7ce6f2839","name":"Arcade News","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/3fea48a614d86edd987bc7bb25f4707c69546d4b1f78ad4aa20b26316bad1f9d?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/3fea48a614d86edd987bc7bb25f4707c69546d4b1f78ad4aa20b26316bad1f9d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/3fea48a614d86edd987bc7bb25f4707c69546d4b1f78ad4aa20b26316bad1f9d?s=96&d=mm&r=g","caption":"Arcade 
News"},"sameAs":["https:\/\/cricketgames.tv"],"url":"https:\/\/arcader.org\/news\/author\/arcade-news\/"}]}},"_links":{"self":[{"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/posts\/1033197","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/comments?post=1033197"}],"version-history":[{"count":1,"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/posts\/1033197\/revisions"}],"predecessor-version":[{"id":1462819,"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/posts\/1033197\/revisions\/1462819"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/media\/1033198"}],"wp:attachment":[{"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/media?parent=1033197"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/categories?post=1033197"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/arcader.org\/news\/wp-json\/wp\/v2\/tags?post=1033197"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}