Reasoning and the “Impossible Language”
The Fragility of Large Language Models
Recently, Apple researchers published an article, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models (http://arxiv.org/pdf/2410.05229), challenging the dominant paradigm about LLMs. “We found no evidence of formal reasoning in language models… Their behaviour is better explained by sophisticated pattern matching — so fragile, in fact, that changing names can alter results by ~10%!” Wow! LLMs can’t reason. What a shocker.
In summary, the paper notes that recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics, and it tests the mathematical reasoning of models on grade-school-level questions. While LLM performance has “significantly improved in recent years,” the models still failed simple math tasks, so it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about their authenticity and reliability. “Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question.”
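To make the paper’s setup concrete, here is a minimal sketch in Python of the kind of symbolic templating GSM-Symbolic describes: one grade-school problem, many instantiations. The template, names, and numbers below are illustrative choices of mine, not examples taken from the paper.

```python
import random

# One word-problem template; each instantiation swaps the name and the numbers.
# The ground-truth answer follows mechanically from the template, so a genuine
# reasoner should be unaffected by which name or which values appear.
TEMPLATE = ("{name} picks {x} apples on Monday and {y} apples on Tuesday. "
            "How many apples does {name} have in total?")

def instantiate(rng: random.Random) -> tuple[str, int]:
    name = rng.choice(["Sophie", "Omar", "Lena", "Ravi"])
    x, y = rng.randint(2, 40), rng.randint(2, 40)
    return TEMPLATE.format(name=name, x=x, y=y), x + y

rng = random.Random(0)
for _ in range(3):
    question, answer = instantiate(rng)
    print(question, "->", answer)
```

Any variance in a model’s answers across such instantiations is exactly the fragility the paper reports: the logic never changes, only the surface names and numbers do.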
In plain speak, this means the LLMs couldn’t handle the ordinary variability of natural language in practical, real-world communication. And the fragility runs deeper: the paper demonstrates that the models’ “performance significantly deteriorates as the number of clauses in a question increases.”
The authors conclude that this decline occurs because “current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data.” No kidding!
This study was useful because it provided a more realistic and nuanced understanding of LLMs’ limitations regarding formal reasoning capabilities. In other words, LLMs are not human-like intelligence, nor can they authentically reason like humans.
A lot of time and mathematical jargon could have been saved, however, if we had used philosophy to help explain the mathematics of LLMs: applying some scientific wisdom, or simply reading the 18th-century philosopher Immanuel Kant’s work on reasoning. His Critique of Pure Reason emphasized a distinction between what we can experience (the natural, observable world) and what we cannot, and held that we obtain substantive knowledge of the world through two capacities: sensibility and understanding. Empirical judgments depend on both sensory experience and concept formation.
Further, knowledge and understanding are attained through reason; accordingly, if a machine can’t reason, how can it provide reasonable, nuanced answers about the physical world it is communicating with? Nuance is extraordinarily important to effective communication.
We must be cautious in accepting hyped-up but unsubstantiated statements about LLMs/generative AI, particularly without applying our applied intelligence (ai). The ai decisioning process framework protects us from falling victim to ignorance and becoming useful suckers for the wealth ambitions of Big Tech and its billionaires.
Common sense and critical thinking are superpowers, but you have to use them meaningfully.
Performance is “ok” on small problems but quickly tails off, according to scientist Subbarao Kambhampati and his team in a recent analysis of OpenAI’s o1. They found that LLMs struggle with even the smallest instances, and that the newer large reasoning models (LRMs) hold up only a little longer: “The two LRMs we tested, o1-preview and o1-mini, are surprisingly effective, but this performance is still not robust and degrades quickly with length.”
The Apple-authored article also gives examples from basic integer arithmetic: accuracy falls off as multiplication problems grow larger, a pattern repeatedly observed in both older and newer models. A simple calculator, by comparison, is correct 100% of the time.
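The calculator comparison is worth spelling out. A fixed procedure, like the grade-school long multiplication sketched below in Python (my own illustration, not anything from the paper), is exact at any operand size by construction; its correctness owes nothing to having seen similar problems before.

```python
def long_multiply(a: int, b: int) -> int:
    """Grade-school long multiplication: digit-by-digit partial products."""
    result, shift = 0, 0
    for db in reversed(str(b)):            # each digit of b, least significant first
        carry, partial = 0, 0
        for i, da in enumerate(reversed(str(a))):
            prod = int(da) * int(db) + carry
            partial += (prod % 10) * (10 ** i)
            carry = prod // 10
        partial += carry * (10 ** len(str(a)))
        result += partial * (10 ** shift)  # shift by the position of b's digit
        shift += 1
    return result

# The procedure stays exact no matter how many digits the operands have.
for digits in (2, 8, 32):
    a, b = 10 ** digits - 3, 10 ** digits - 7
    assert long_multiply(a, b) == a * b
    print(f"{digits}-digit operands: correct")
```

That guarantee is the point: the algorithm does not depend on having memorized similar multiplications, which is precisely what the pattern-matching account says LLMs are doing.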
The underlying claim about neural networks, that they can replicate human-like reasoning, is simply not true. The theory of “deep learning” is just that: a nice theory that doesn’t unfold well in the real world. What we are seeing is an evolving, advanced automation technology, a machine language. Or, as the former MIT scientist and father of modern linguistics, Noam Chomsky, calls it, an “impossible language,” understandable and applicable only to the machine itself.
If the very foundation of LLMs is inherently unreliable, can we turn around and trust them to build reliable systems? To be the be-all and end-all of all knowledge, as their hubristic promoters claim? To “solve all physics,” as top promoter Sam Altman is currently claiming?
We must learn to recognize the con when Altman then turns around and asks for a few trillion dollars to do the job, climate be damned! Because only he and his tech-bro aristocracy of overlords can solve all of the world’s problems.
Chomsky says that AI is an efficient tool that is “really good for performing certain tasks,” but, using ChatGPT as an example, he explains why generative AI systems are generally deficient. He calls ChatGPT an “impossible language” because it violates all the rules of linguistics; yet those same violations, he says, are the only way GPT can perform.
Intelligence and awareness of the environment are what create natural language communication; effective communication occurs through both precise and abstract expression. Unlike narrow, rigid AI, humans are versatile, flexible, dynamic, and amazing at the real-time interpretation of communication, both verbal and non-verbal. As Aristotle explained over 2,000 years ago, and as both Kant and Chomsky have articulated, we humans can just “pick up on things”: we capture abstract structures in the complexity of language, making it seem like a breeze.
Cognitive science tells us that the mind creates abstract structures to process, sort, and reason quickly, and to understand and communicate efficiently, effectively, and in real time. LLMs have none of these capabilities, and they can’t be programmed in. So there can be no intelligent leap forward in comparing authentic human intelligence (AHI) to the impossible language.
The ability of AHI to thrive on the abstract structures of human communication is uniquely and biologically human. The human mind and body work as one; there can be no doubt about that. Sensory inputs trigger comprehension and turn it into functional knowledge, both a priori and empirical, something a machine can’t do. Machines are merely programmed language processors, Chomsky says.
Like Kant, Chomsky also points out that true intelligence is a function of past experiences, situated in time and space, that can be drawn on rapidly and with detailed nuance. The mind is a unique and unmatched multidimensional processor and “experience machine,” says neuroscientist Anil Seth, who adds that we are intelligent “because of our bodies.”
Natural human language has belief systems and culture embedded in it, and so often it’s how you say things, rather than what you say, that communicates the message best. For example, you often hear someone from another culture and language try to tell a joke or a funny story by translating it into English. The joke or story often falls flat, and the teller then says that it sounds better in their own language, or that it’s better told with the cultural accent.
Both language and accent add flavour, humour, and an a priori context of understanding and personal experience; the distinctive mode of pronunciation in a language is closely associated with a particular people, nation, locality, or social class. There is emphasis in syllables, pitch, and tone, and, of course, the role of the five senses. A story carries undertones, including the major ones: religion and faith, cultural norms and rituals. In every word and gesture there is an enormous amount of nuance, culture, and experience packed into just a few lines, making it highly unlikely that someone who is not from that same culture would fully appreciate it. So, experience has a symbiotic relationship with reasoning.
Chomsky exposes the linguistic shortcomings and myths around artificial intelligence, adding that it’s intellectually dishonest and dangerous to equate natural human language and AHI with the impossible machine language. ChatGPT, he says, is nothing more than a “sophisticated imitation game that violates all the rules of linguistics, culture, grammar, experiences, and symbolism.”
But again, this is the only way ChatGPT can work: by creating its own rules within its own designed system, using “brute force” and a massive hype machine to convince our lying eyes that its limitations are not weaknesses.
Knowing the limitations of something, we can better assess and understand its optimal function and use, from both a utility and a value-creation perspective. LLMs are a type of automated technology, and that’s fine, Chomsky says, but we shouldn’t be disingenuous and dodgy about their abilities.