Wednesday, 11 June 2025

US TECH GLOBAL DOMINANCE ENDS WITH AI FIASCO: APPLE STUDY CONFIRMS US AI MODELS CANNOT REASON, NO PROSPECT OF AN AI-POWERED ECONOMY IN SIGHT, ALL HYPE AND BLUFF

US AI MUST START AGAIN FROM SCRATCH, SO USELESS ARE AMERICAN LLM AND LRM MODELS, SAY EXPERTS

US ECONOMY WILL RECEIVE NO PRODUCTIVITY BOOST FROM AI, WHICH CANNOT REPLACE HUMAN BEINGS AS BILL GATES HAS CLAIMED

US MEDIA CALLS THE AI HYPE A GREAT DECEPTION

INVESTORS MAY HAVE FALLEN VICTIM TO A HUGE SCAM AND HYPE WHICH DWARFS THE THERANOS SCANDAL

AS USUAL, COVID JAB SCAMMER BILL GATES HAS BEEN AT THE FOREFRONT OF THE AI HYPE AND SHOULD BE SUED

JOINED IN THE HYPE BY MEDIA LIKE ZEROHEDGE, WHICH SAYS US MARKETS HAVE ALREADY PRICED IN PRODUCTIVITY GROWTH POWERED BY AI

HOW LONG CAN THE HYPE, BLUFF AND LIES BE KEPT UP, WHEN US INFLATION IS NOW BEING KEPT DOWN ONLY BY RECESSIONARY PRESSURES?

https://www.zerohedge.com/economics/fearmongering-pundits-disappointed-again-consumer-prices-refuse-surge-trump-tariffs


US AI models have turned out to be the biggest tech flop in history.

Far from being a revolutionary advance that can boost the USA's real-world economy and productivity, AI models cannot reason and collapse when faced with hard problems, a new study by Apple has found.

They are so fundamentally flawed in their design that a leading AI researcher, Yann LeCun, has said AI development has to start again from scratch.

From media

"Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds

‘Pretty devastating’ Apple paper raises doubts about race to reach stage of AI at which it matches human intelligence" is one assessment.


https://www.theguardian.com/technology/2025/jun/09/apple-artificial-intelligence-ai-study-collapse

Hundreds of billions of dollars and decades of work have been invested in a dud, a flop.

Big Tech and Nvidia have completely failed to produce anything like an AI that can help the real-world economy.

The hope that AI can save the US economy and the USA's status as a global superpower has turned out to be delusional.

Scale AI placed an ad in The Washington Post in which its chief executive, Mr. Wang, called on Mr. Trump to increase investment in A.I. or risk falling behind China.

“Dear President Trump. America must win the A.I. war,” the ad said.

https://www.nytimes.com/2025/06/10/technology/meta-new-ai-lab-superintelligence.html

Plans to use AI to dominate the world are based on hot air.

The consequences for the US stock market can only be guessed at, because it has become clear that the US stock market is a bubble kept up by hype.

From media

If AI is the productivity panacea the market has already priced it to be, US GDP over the next decade will be dramatically higher than CBO - and anyone else - projects. Having an ego fight over what may happen in a decade is ridiculous

https://www.zerohedge.com/political/musk-regrets-some-posts-about-trump-signals-possible-reconciliation
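To see what "already priced in" would have to mean in practice, here is a rough compound-growth sketch in Python; the starting GDP and growth rates below are illustrative assumptions, not CBO or market figures.

```python
# Rough compound-growth sketch (illustrative numbers only, not CBO projections).
# Compares ten-year GDP under a baseline growth rate with a scenario in which
# AI adds a sustained productivity bonus on top of that baseline.

def gdp_after(years: int, start_gdp_trillions: float, annual_growth: float) -> float:
    """Compound GDP forward at a constant annual growth rate."""
    return start_gdp_trillions * (1 + annual_growth) ** years

START_GDP = 29.0   # assumed starting US GDP, in trillions of dollars
BASELINE = 0.018   # assumed ~1.8% real annual growth without an AI boost
AI_BONUS = 0.010   # assumed extra 1.0 percentage point per year from AI

baseline_10y = gdp_after(10, START_GDP, BASELINE)
boosted_10y = gdp_after(10, START_GDP, BASELINE + AI_BONUS)

print(f"Baseline GDP after 10 years:   ${baseline_10y:.1f}T")
print(f"AI-boosted GDP after 10 years: ${boosted_10y:.1f}T")
print(f"Extra annual output the AI story must deliver: ${boosted_10y - baseline_10y:.1f}T")
```

If reasoning models cannot deliver that kind of sustained productivity boost, the multi-trillion-dollar gap in the last line is what is currently resting on hype.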

But the consequences for the US economy are clear. 

There will be no AI-powered robots to replace human beings, as Bill Gates claimed.

There will be no miraculous increase in productivity to grow GDP and pay off the national debt.

From media

Over the next decade, advances in artificial intelligence will mean that humans will no longer be needed “for most things” in the world, says Bill Gates.

That’s what the Microsoft co-founder and billionaire philanthropist told comedian Jimmy Fallon during an interview on NBC’s “The Tonight Show” in February. At the moment, expertise remains “rare,” Gates explained, pointing to human specialists we still rely on in many fields, including “a great doctor” or “a great teacher.”

But “with AI, over the next decade, that will become free, commonplace — great medical advice, great tutoring,” Gates said.

In other words, the world is entering a new era of what Gates called “free intelligence” in an interview last month with Harvard University professor and happiness expert Arthur Brooks. The result will be rapid advances in AI-powered technologies that are accessible and touch nearly every aspect of our lives, Gates has said, from improved medicines and diagnoses to widely available AI tutors and virtual assistants.

“It’s very profound and even a little bit scary — because it’s happening very quickly, and there is no upper bound,” Gates told Brooks.

https://www.cnbc.com/2025/03/26/bill-gates-on-ai-humans-wont-be-needed-for-most-things.html#:~:text=The%20result%20will%20be%20rapid,they%20are%20fundamentally%20labor%20replacing.%22

In fact, as a major investor, Bill Gates bears much of the responsibility for LLMs being fundamentally flawed, solipsistic, chaotic and useless.

From media

Microsoft co-founder Bill Gates says the development of artificial intelligence (AI) is the most important technological advance in decades.

In a blog post on Tuesday, he called it as fundamental as the creation of the microprocessor, the personal computer, the Internet, and the mobile phone.

"It will change the way people work, learn, travel, get health care, and communicate with each other," he said.

He was writing about the technology used by tools such as chatbot ChatGPT.

Developed by OpenAI, ChatGPT is an AI chatbot which is programmed to answer questions online using natural, human-like language.

...

The team behind it in January 2023 received a multibillion dollar investment from Microsoft - where Mr Gates still serves as an advisor.

...

Mr Gates said he had been meeting with OpenAI - the team behind the artificial intelligence that powers chatbot ChatGPT - since 2016.

"Just as the world needs its brightest people focused on its biggest problems, we will need to focus the world's best AIs on its biggest problems."

https://www.bbc.com/news/technology-65032848

In fact, it may take decades for the USA to catch up with China on AI, if it ever does, having invested all its resources in a "cul-de-sac", a dead end, preferring hype to substance and marketing to achievement.

From media

Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds

‘Pretty devastating’ Apple paper raises doubts about race to reach stage of AI at which it matches human intelligence


Dan Milmo Global technology editor

Mon 9 Jun 2025 18.12 CEST


Apple researchers have found “fundamental limitations” in cutting-edge artificial intelligence models, in a paper raising doubts about the technology industry’s race to develop ever more powerful systems.

Apple said in a paper published at the weekend that large reasoning models (LRMs) – an advanced form of AI – faced a “complete accuracy collapse” when presented with highly complex problems.

It found that standard AI models outperformed LRMs in low-complexity tasks, while both types of model suffered “complete collapse” with high-complexity tasks. Large reasoning models attempt to solve complex queries by generating detailed thinking processes that break down the problem into smaller steps.

The study, which tested the models’ ability to solve puzzles, added that as LRMs neared performance collapse they began “reducing their reasoning effort”. The Apple researchers said they found this “particularly concerning”.

Gary Marcus, a US academic who has become a prominent voice of caution on the capabilities of AI models, described the Apple paper as “pretty devastating”.

Writing in his newsletter on Substack, Marcus added that the findings raised questions about the race to artificial general intelligence (AGI), a theoretical stage of AI at which a system is able to match a human at carrying out any intellectual task.

Referring to the large language models [LLMs] that underpin tools such as ChatGPT, Marcus wrote: “Anybody who thinks LLMs are a direct route to the sort [of] AGI that could fundamentally transform society for the good is kidding themselves.”

The paper also found that reasoning models wasted computing power by finding the right solution for simpler problems early in their “thinking”. However, as problems became slightly more complex, models first explored incorrect solutions and arrived at the correct ones later.

For higher-complexity problems, however, the models would enter “collapse”, failing to generate any correct solutions. In one case, even when provided with an algorithm that would solve the problem, the models failed.

The paper said: “Upon approaching a critical threshold – which closely corresponds to their accuracy collapse point – models counterintuitively begin to reduce their reasoning effort despite increasing problem difficulty.”

The paper set the LRMs puzzle challenges, such as solving the Tower of Hanoi and River Crossing puzzles. The researchers acknowledged that the focus on puzzles represented a limitation in the work.

The paper concluded that the current approach to AI may have reached limitations. It tested models including OpenAI’s o3, Google’s Gemini Thinking, Anthropic’s Claude 3.7 Sonnet-Thinking and DeepSeek-R1. Anthropic, Google and DeepSeek have been contacted for comment. OpenAI, the company behind ChatGPT, declined to comment.

Referring to “generalisable reasoning” – or an AI model’s ability to apply a narrow conclusion more broadly – the paper said: “These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalisable reasoning.”

Andrew Rogoyski, of the Institute for People-Centred AI at the University of Surrey, said the Apple paper signalled the industry was “still feeling its way” on AGI and that the industry could have reached a “cul-de-sac” in its current approach.

“The finding that large reason models lose the plot on complex problems, while performing well on medium- and low-complexity problems implies that we’re in a potential cul-de-sac in current approaches,” he said.

The Apple experts said this indicated a “fundamental scaling limitation in the thinking capabilities of current reasoning models”.

https://www.theguardian.com/technology/2025/jun/09/apple-artificial-intelligence-ai-study-collapse
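For readers unfamiliar with the puzzle, the Tower of Hanoi has a textbook recursive solution. The sketch below is that standard algorithm (not code from the Apple paper); it shows how the optimal move count, 2^n - 1, explodes as discs are added, which is essentially the knob the researchers turned to raise complexity.

```python
# Classic recursive Tower of Hanoi solver (textbook algorithm, not the Apple paper's code).
# The optimal solution always takes 2**n - 1 moves, so difficulty is dialled up
# simply by adding discs -- the same kind of scaling the study relied on.

def hanoi(n: int, source: str, spare: str, target: str, moves: list) -> None:
    """Append the optimal move sequence for n discs from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, target, spare, moves)   # park n-1 discs on the spare peg
    moves.append((source, target))               # move the largest disc
    hanoi(n - 1, spare, source, target, moves)   # stack the n-1 discs back on top

for discs in (3, 7, 12):
    moves = []
    hanoi(discs, "A", "B", "C", moves)
    assert len(moves) == 2 ** discs - 1
    print(f"{discs} discs -> {len(moves)} moves")
```

The entire solution fits in a dozen lines, which is what makes the reported failures striking: according to the paper, the models still collapsed on larger instances even when an algorithm like this was supplied in the prompt.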


With just a few days to go until WWDC 2025, Apple published a new AI study that could mark a turning point for the future of AI as we move closer to AGI.

Apple created tests that reveal reasoning AI models available to the public don’t actually reason. These models produce impressive results in math problems and other tasks because they’ve seen those types of tests during training. They’ve memorized the steps to solve problems or complete various tasks users might give to a chatbot.

But Apple’s own tests showed that these AI models can’t adapt to unfamiliar problems and figure out solutions. Worse, the AI tends to give up if it fails to solve a task. Even when Apple provided the algorithms in the prompts, the chatbots still couldn’t pass the tests.

Apple researchers didn’t use math problems to assess whether top AI models can reason. Instead, they turned to puzzles to test various models’ reasoning abilities.

The tests included puzzles like Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World. Apple evaluated both regular large language models (LLMs) and large reasoning models (LRMs) using these puzzles, adjusting the difficulty levels.

Apple tested LLMs like ChatGPT GPT-4, Claude 3.7 Sonnet, and DeepSeek V3. For LRMs, it tested ChatGPT o1, ChatGPT o3-mini, Gemini, Claude 3.7 Sonnet Thinking, and DeepSeek R1.

The scientists found that LLMs performed better than reasoning models when the difficulty was easy. LRMs did better at medium difficulty. Once the tasks reached the hard level, all models failed to complete them.

Apple observed that the AI models simply gave up on solving the puzzles at harder levels. Accuracy didn’t just decline gradually, it collapsed outright.

The study suggests that even the best reasoning AI models don’t actually reason when faced with unfamiliar puzzles. The idea of “reasoning” in this context is misleading since these models aren’t truly thinking.

...

Then again, many of us already suspected that reasoning AI models don’t actually think. AGI, or artificial general intelligence, would be the kind of AI that can figure things out on its own when facing new challenges.

https://bgr.com/tech/breakthrough-apple-study-shows-advanced-reasoning-ai-doesnt-actually-reason-at-all/
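A minimal sketch of how such a difficulty sweep can be scored is shown below. The `ask_model` argument is a hypothetical stand-in for a chatbot API call; the real study used its own prompts, answer parsers and validity checks rather than this strict match against the optimal solution.

```python
# Minimal sketch of a difficulty-sweep evaluation in the spirit of the Apple setup.
# `ask_model` is a hypothetical stand-in for a chatbot API call; the real study used
# its own prompts, parsers and legality checks, not this strict optimal-match test.

def solve_reference(n_discs: int) -> list:
    """Ground-truth Tower of Hanoi move list (optimal, 2**n - 1 moves)."""
    moves = []
    def rec(n, src, spare, dst):
        if n:
            rec(n - 1, src, dst, spare)
            moves.append((src, dst))
            rec(n - 1, spare, src, dst)
    rec(n_discs, "A", "B", "C")
    return moves

def accuracy_sweep(ask_model, max_discs: int = 12, trials: int = 5) -> dict:
    """Per disc count, the fraction of trials on which the model matches the optimum."""
    results = {}
    for n in range(1, max_discs + 1):
        reference = solve_reference(n)
        correct = sum(ask_model(n) == reference for _ in range(trials))
        results[n] = correct / trials
    return results

# Dummy "model" that only copes with small instances, mimicking the reported
# accuracy collapse once puzzles pass a complexity threshold.
dummy_model = lambda n: solve_reference(n) if n <= 6 else []
print(accuracy_sweep(dummy_model))
```

Plotting the resulting accuracies against disc count is how a collapse (rather than a gradual decline) becomes visible.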


From media

Intelligence Illusion: What Apple’s AI Study Reveals About Reasoning

By Cornelia C. Walther, Contributor. AI researcher working with the UN and others to drive social change.

The gleaming veneer of artificial intelligence has captivated the world, with large language models producing eloquent responses that often seem indistinguishable from human thought. Yet beneath this polished surface lies a troubling reality that Apple's latest research has brought into sharp focus: eloquence is not intelligence, and imitation is not understanding.

Apple’s new study, titled "The Illusion of Thinking," has sent shockwaves through the AI community by demonstrating that even the most sophisticated reasoning models fundamentally lack genuine cognitive abilities. This revelation validates what prominent researchers like Meta's Chief AI Scientist Yann LeCun have been arguing for years—that current AI systems are sophisticated pattern-matching machines rather than thinking entities.

The Great AI Deception

The Apple research team's findings are both methodical and damning. By creating controlled puzzle environments that could precisely manipulate complexity while maintaining logical consistency, they revealed three distinct performance regimes in Large Reasoning Models. In low-complexity tasks, standard models actually outperformed their supposedly superior reasoning counterparts. Medium-complexity problems showed marginal benefits from additional "thinking" processes. But most tellingly, both model types experienced complete collapse when faced with high-complexity tasks.

What makes these findings particularly striking is the counter-intuitive scaling behavior the researchers observed. Rather than improving with increased complexity as genuine intelligence would, these models showed a peculiar pattern: their reasoning effort would increase up to a certain point, then decline dramatically despite having adequate computational resources. This suggests that the models weren’t actually reasoning at all— they were following learned patterns that broke down when confronted with novel challenges.

The study exposed fundamental limitations in exact computation, revealing that these systems fail to use explicit algorithms and reason inconsistently across similar puzzles. When the veneer of sophisticated language is stripped away, what remains is a sophisticated but ultimately hollow mimicry of thought.


Apple AI Study Echoes Long-Standing Warnings

These findings align perfectly with warnings that Yann LeCun and other leading AI researchers have been voicing for years. LeCun has consistently argued that current LLMs will be largely obsolete within five years, not because they’ll be replaced by better versions of the same technology, but because they represent a fundamentally flawed approach to artificial intelligence.

The core issue isn’t technical prowess — it's conceptual. These systems don't understand; they pattern-match. They don't reason; they interpolate from training data. They don't think; they generate statistically probable responses based on massive datasets. The sophistication of their output masks the absence of genuine comprehension, creating what researchers now recognize as an elaborate illusion of intelligence.

This disconnect between appearance and reality has profound implications for how we evaluate and deploy AI systems. When we mistake fluency for understanding, we risk making critical decisions based on fundamentally flawed reasoning processes. The danger isn't just technological—it's epistemological.

Human Parallels: Our Bias Toward Confident Eloquence

Perhaps most unsettling is how closely this AI limitation mirrors a persistent human cognitive bias. Just as we've been deceived by AI's articulate responses, we consistently overvalue human confidence and extroversion, often mistaking verbal facility for intellectual depth.

The overconfidence bias represents one of the most pervasive flaws in human judgment, where individuals' subjective confidence in their abilities far exceeds their objective accuracy. This bias becomes particularly pronounced in social and professional settings, where confident, extroverted individuals often command disproportionate attention and credibility.

Research consistently shows that we tend to equate confidence with competence, volume with value, and articulateness with intelligence. The extroverted individual who speaks first and most frequently in meetings often shapes group decisions, regardless of the quality of their ideas. The confident presenter who delivers polished but superficial analysis frequently receives more positive evaluation than the thoughtful introvert who offers deeper insights with less theatrical flair.

This psychological tendency creates a dangerous feedback loop. People with low ability often overestimate their competence (the Dunning-Kruger effect), while those with genuine expertise may express appropriate uncertainty about complex issues. The result is a systematic inversion of credibility, where those who know the least speak with the greatest confidence, while those who understand the most communicate with appropriate nuance and qualification.

The Convergence Of Artificial And Human Illusions

The parallel between AI's eloquent emptiness and our bias toward confident communication reveals something profound about the nature of intelligence itself. Both phenomena demonstrate how easily we conflate the appearance of understanding with its substance. Both show how sophisticated communication can mask fundamental limitations in reasoning and comprehension.

Consider the implications for organizational decision-making, educational assessment, and social dynamics. If we consistently overvalue confident presentation over careful analysis—whether from AI systems or human colleagues—we systematically degrade the quality of our collective reasoning. We create environments where performance theater takes precedence over genuine problem-solving.

The Apple study's revelation that AI reasoning models fail when faced with true complexity mirrors how overconfident individuals often struggle with genuinely challenging problems while maintaining their persuasive veneer. Both represent sophisticated forms of intellectual imposture that can persist precisely because they're so convincing on the surface.


Beyond Illusions: Recognizing Genuine Intelligence

Understanding these limitations—both artificial and human—opens the door to more authentic evaluation of intelligence and reasoning. True intelligence isn't characterized by unwavering confidence or eloquent presentation. Instead, it manifests in several key ways:

Genuine intelligence embraces uncertainty when dealing with complex problems. It acknowledges limitations rather than concealing them. It demonstrates consistent reasoning across different contexts rather than breaking down when patterns become unfamiliar. Most importantly, it shows genuine understanding through the ability to adapt principles to novel situations.

In human contexts, this means looking beyond charismatic presentation to evaluate the underlying quality of reasoning. It means creating space for thoughtful, measured responses rather than rewarding only quick, confident answers. It means recognizing that the most profound insights often come wrapped in appropriate humility rather than absolute certainty.

For AI systems, it means developing more rigorous evaluation frameworks that test genuine understanding rather than pattern matching. It means acknowledging current limitations rather than anthropomorphizing sophisticated text generation. It means building systems that can genuinely reason rather than simply appearing to do so.


Moving Forward: Practical Implications

The convergence of Apple's AI findings with psychological research on human biases offers valuable guidance for navigating our increasingly complex world. Whether evaluating AI systems or human colleagues, we must learn to distinguish between performance and competence, between eloquence and understanding.


This requires cultivating intellectual humility – the recognition that genuine intelligence often comes with appropriate uncertainty, that the most confident voices aren't necessarily the most credible, and that true understanding can be distinguished from sophisticated mimicry through careful observation and testing.


SMART Takeaways

Specific: Test AI and human responses with novel, complex problems rather than accepting polished presentations at face value—genuine intelligence adapts consistently across unfamiliar contexts while sophisticated mimicry breaks down.

Measurable: Evaluate reasoning quality by tracking consistency across different scenarios, measuring response accuracy under varying complexity levels, and assessing appropriate uncertainty expression rather than just confidence metrics.

Actionable: In meetings and evaluations, deliberately create space for thoughtful, measured responses; ask follow-up questions that require genuine understanding; and resist the impulse to automatically favor the most confident or articulate speaker.

Relevant: Apply the "pattern-matching vs. reasoning" test to both AI tools and human colleagues—ask whether their impressive responses demonstrate genuine understanding or sophisticated repetition of learned patterns, especially when stakes are high and decisions matter.

To distinguish intelligence from imitation in an AI-infused environment we need to invest in hybrid intelligence, which arises from the complementarity of natural and artificial intelligences – anchored in the strength and limitations of both.

https://www.forbes.com/sites/corneliawalther/2025/06/09/intelligence-illusion-what-apples-ai-study-reveals-about-reasoning/

INDIAN INSTITUTE OF TECHNOLOGY DELHI


Publish Date: October 28, 2024

AI to Empower, Not Threaten: Meta's Chief AI Scientist Yann LeCun Calls for New Model Architectures at IIT Delhi


 During the panel discussion titled ‘From Neural Mimics to Smart Assistants - A Journey into AI's Next Frontiers', Dr. Yann emphasized the need for a rethink of AI architectures, the advantages of open-source, and India's unique potential in the AI landscape.

...

Challenging the current approaches to AI development, Dr. Yann urged for innovative architectures beyond large language models (LLMs). He argued that current AI paradigms are insufficient to achieve true human-like intelligence.

He further said, "We are not going to reach that stage by using the current paradigm and just making it bigger; we need essentially new architectures like objective-driven architecture.” Instead of scaling up models, he argued for systems that understand the physical world and reason through novel situations.

Dr. Yann also focused on spatial AI and projects like JEPA. He described AI's evolution as a sigmoid curve— rapid expansion followed by saturation. Singularity, where machines surpass human intelligence, is not on the immediate horizon. Instead, the future lies in evolving paradigms and replacing outdated approaches with new models that can build a "world model," capable of perceiving, predicting, and planning like animals.

Addressing concerns about AI safety, he dismissed fears of intelligent systems dominating humans. He stated that AI's purpose is to empower. "The future of AI in my opinion is a future in which everyone will be walking around with an assistant digitally like smart glasses." he said, humorously adding, "It'll be like walking around with three smart people working for you." Dr. Yann also highlighted that advanced AI models are already used to ensure ethical outcomes, emphasizing the need for better AI to safeguard against misuse.

https://home.iitd.ac.in/show.php?id=558&in_sections=News

He points out that LLMs like GPT-X or Claude have significant limitations. According to LeCun, these models struggle with basic logic, lack real-world understanding, and cannot retain information long-term. He argues they are incapable of rational thinking or complex planning, and ultimately can't be relied on since they only produce convincing answers when their training data covers the topic.

"If you are a student interested in building the next generation of AI systems, don't work on LLMs," LeCun said a year ago. He believes the field is already dominated by major companies, and that LLMs aren't the path to real intelligence.

Instead, LeCun and his team at Meta are focused on "world models"; AI systems designed to build a genuine understanding of their environment. In a recent study, Meta researchers introduced V-JEPA, an AI model that learns intuitive physical reasoning from videos through self-supervised training. Compared to multimodal LLMs like Gemini or Qwen, V-JEPA demonstrated a much stronger grasp of physics, despite needing far less training data.

https://the-decoder.com/meta-ai-chief-scientist-lecuns-latest-comment-reveals-deep-industry-split-over-the-future-of-ai/
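For context on the "world model" alternative LeCun describes, the following is a heavily simplified conceptual sketch of a JEPA-style objective in PyTorch: the model is trained to predict the embedding of a hidden part of the input from the embedding of its visible context, rather than to reconstruct raw pixels or tokens. The tensors and layer sizes are toy placeholders, not Meta's V-JEPA code.

```python
# Heavily simplified JEPA-style objective (conceptual sketch, not Meta's V-JEPA code).
# The predictor learns to match the *embedding* of a hidden patch from the embedding
# of the visible context, instead of reconstructing raw pixels or tokens.
# Requires: pip install torch

import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 64  # toy embedding size

context_encoder = nn.Linear(DIM, DIM)   # stand-in for a large image/video encoder
target_encoder = nn.Linear(DIM, DIM)    # in practice an EMA copy of the context encoder
predictor = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, DIM))

context_patch = torch.randn(8, DIM)     # visible part of the input (toy data)
masked_patch = torch.randn(8, DIM)      # hidden part the model must anticipate

pred = predictor(context_encoder(context_patch))
with torch.no_grad():                   # no gradients flow through the target branch
    target = target_encoder(masked_patch)

loss = F.mse_loss(pred, target)         # distance measured in latent space
loss.backward()
print(f"latent prediction loss: {loss.item():.4f}")
```

The design choice that distinguishes this from an LLM-style objective is that the loss is computed in representation space, which is what is meant by building a "world model" rather than a next-token predictor.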

https://officechai.com/ai/anthropic-ceo-dario-amodei-deluded-about-dangers-of-ai-meta-ai-chief-yann-lecun/

https://www.linkedin.com/posts/jamesi_yann-lecun-gave-a-recent-keynote-at-ai-action-activity-7321383699713777665-MjCY



A particularly striking example concerns solipsism. Traditionally, solipsism — the view that only one’s own mind is certain to exist — has been considered logically possible but practically incoherent. Philosophers have argued that, while solipsism cannot be disproven, it is unimaginable how a fully solipsistic mind could function meaningfully without external reality to anchor its experiences.


Yet the functioning of LLMs challenges this assumption. Despite operating entirely within self-contained linguistic structures, cut off from direct engagement with the world, LLMs nevertheless create internally coherent and functional patterns of meaning. In this sense, they offer a real-world model of practical solipsism: a system capable of maintaining structured cognition based solely on internal correlations.


However, it is important to note that this solipsism is not absolute. LLMs inherit vast linguistic structures from their training data, which themselves originated in human interaction with the external world. From the standpoint of the LLM’s internal operations, these inherited structures function like Platonic ideas — pre-existing, given elements of meaning that are not actively grounded in current experience. Thus, LLMs represent a diminished or derived form of solipsism, where internal coherence is possible largely because of an initial, externally grounded infusion of knowledge.


In this light, LLMs do not merely present technical or cognitive challenges. They invite a philosophical reconsideration of solipsism itself, suggesting that coherent minds might arise internally, provided they inherit the right scaffolding from an earlier, shared reality.


https://medium.com/@fra.canni/understanding-llms-through-philosophy-truth-meaning-and-solipsism-2868c0f844ed


