The Quote That Bothered Me

“The path to superintelligence — just train up the LLMs, train on more synthetic data, hire thousands of people to school your system in post-training — I think is complete bullshit. It’s just never going to work.”

When I first read that from Yann LeCun, my reaction was defensive. I use LLMs every day. I write about them. I build entire workflows around them. Calling them a “dead end” felt almost like a personal affront.

But Yann LeCun isn’t just anyone. He won the 2018 Turing Award — computing’s equivalent of the Nobel Prize. He led AI at Meta for 12 years. He’s one of the three “godfathers” of modern AI. And in November 2025, he left Meta to found AMI Labs (Advanced Machine Intelligence Labs), betting his entire reputation on a radically different vision for AI’s future.

In March 2026, AMI Labs raised $1.03 billion at a $3.5 billion valuation — the largest seed round in European history. No product. No revenue. Just a thesis.

After weeks of researching his argument, I changed my mind. Not completely — but enough to worry about what I’m building.

LeCun’s Argument (Without Oversimplification)

LeCun’s thesis isn’t that LLMs are useless — he says they’re useful. His thesis is that they’ll never achieve human-level intelligence, because they’re structurally incapable of understanding the real world. They are, in his words, “an off-ramp, a distraction, a dead end” on the path to machine intelligence.

Why?

The text problem. LLMs are trained on text. But most human knowledge isn’t language. A two-year-old understands gravity, object permanence, cause-and-effect relationships — all without reading a single word. A four-year-old has already processed 50 times more sensory data than the world’s largest LLM.

As LeCun told MIT Technology Review, LLMs are limited to the discrete world of text: they can’t truly reason or plan because they lack a model of the world, and they can’t predict the consequences of their actions.

Moravec’s Paradox. What’s easy for us — perception, navigation, physical manipulation — is hard for computers, and vice versa. LLMs are remarkably fluent, but that fluency deceives us: it makes us think there’s real understanding behind it, when what’s actually there is sophisticated pattern recognition.

This explains why, despite billions invested, we still don’t have a domestic robot as agile as a house cat, or a truly autonomous (Level 5) car. AI speaks well but doesn’t understand the world.

The Problem of “Outsourced Learning”

This is the part that made me think the most. Building an LLM today requires an army of humans: data scientists, engineers, curators, policy specialists, annotators — all working to feed the AI the right information.

Compare that with how a baby learns: by watching, touching, breaking things, and falling, then consolidating what it learned while it sleeps. Learning is an intrinsic capability, not something imposed from outside by an engineering team.

Researchers have identified three abilities that animals possess but current AI systems don’t (made concrete in the toy sketch after this list):

Active learning — the ability to choose one’s own data to learn from. A baby directs its attention. An LLM receives whatever it’s given.

Meta-control — switching between different learning modes depending on the situation. Observing when it’s time to observe, acting when it’s time to act.

Meta-cognition — sensing one’s own performance. Knowing when it’s making mistakes without needing human feedback. A functional self-awareness that LLMs simply don’t possess.
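
To pin those three abilities down, here’s a deliberately toy Python sketch. Nothing in it is a real system: ToyAgent, its random stand-in for uncertainty, and the 0.3 threshold are all invented for illustration.

```python
import random

# Toy illustration of the three abilities above. Every name is hypothetical.

class ToyAgent:
    def __init__(self):
        self.recent_errors = []  # rolling record of self-estimated error

    def uncertainty(self, example):
        """Stand-in for model uncertainty (a real system would estimate
        something like predictive entropy)."""
        return random.random()

    def select_data(self, candidate_pool, k=3):
        """Active learning: the agent picks its own training data,
        preferring the examples it is least sure about."""
        return sorted(candidate_pool, key=self.uncertainty, reverse=True)[:k]

    def choose_mode(self):
        """Meta-control: switch learning modes based on self-assessed state."""
        if not self.recent_errors:
            return "observe"  # knows nothing yet, so watch first
        avg_error = sum(self.recent_errors) / len(self.recent_errors)
        return "act" if avg_error < 0.3 else "observe"

    def self_monitor(self, prediction, outcome):
        """Meta-cognition: score its own performance, no human in the loop."""
        self.recent_errors.append(abs(prediction - outcome))
        self.recent_errors = self.recent_errors[-20:]  # keep a short window

agent = ToyAgent()
print(agent.select_data(list(range(10))))  # the agent chooses what to study
print(agent.choose_mode())                 # and decides whether to observe or act
```

The contrast with today’s pipeline is the point: for an LLM, all three loops (choosing the data, switching modes, monitoring errors) are run by the humans around the model.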

World Models: The $1 Billion Bet

The alternative LeCun proposes is called world models. Instead of training AI on text, train it on sensory data (primarily video) so it understands how the physical world works.

The idea, built on his JEPA (Joint Embedding Predictive Architecture) research at Meta, the line of work behind I-JEPA and V-JEPA, is that AI learns abstract representations of reality: not by generating pixels, but by predicting in an abstract representation space, much as a child develops intuitive physics by watching objects fall without anyone explaining Newton’s laws.
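
To show what “predicting in an abstract representation space” means mechanically, here’s a minimal PyTorch sketch of the JEPA idea. It’s my simplification, not Meta’s code: the linear encoders, the dimensions, and the single-layer predictor are all placeholders.

```python
import torch
import torch.nn as nn

# JEPA-style objective, radically simplified: predict the *embedding* of a
# hidden patch from the embedding of its visible context. No pixels are
# ever generated; the loss lives entirely in representation space.

embed_dim = 128
context_encoder = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, embed_dim))
target_encoder  = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, embed_dim))
predictor       = nn.Linear(embed_dim, embed_dim)

context_patch = torch.randn(8, 1, 16, 16)  # the part of the image the model sees
target_patch  = torch.randn(8, 1, 16, 16)  # the masked part it must anticipate

ctx = context_encoder(context_patch)
with torch.no_grad():                   # no gradients into the target encoder
    tgt = target_encoder(target_patch)  # (in I-JEPA it tracks the context encoder)

loss = nn.functional.mse_loss(predictor(ctx), tgt)
loss.backward()
print(loss.item())
```

The design choice that matters is the loss: it compares abstract embeddings, never pixels, which lets the model ignore unpredictable low-level detail instead of wasting capacity trying to generate it.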

The proposed architecture has three systems working together: an observation system that learns by watching the world, an action system that learns by doing and interacting, and a meta-control system, the “master” that automatically decides when to observe, when to act, and when to reflect.

And LeCun isn’t alone. Fei-Fei Li (co-founder of Stanford HAI) raised $1 billion for World Labs with its Marble product (3D environment generation). Google DeepMind released Genie 3, the first real-time interactive world model. NVIDIA saw 2 million downloads of its Cosmos platform. And Runway positions its Gen-4.5 as a “world model that understands physics.”

My Honest Opinion (With Caveats)

After researching all of this, here’s where I landed:

LeCun is probably right long-term. The LLM architecture alone won’t produce general intelligence. Linguistic fluency isn’t comprehension, and adding more text data doesn’t solve the fundamental problem that AI lacks a model of the physical world.

But “long-term” is very long. LeCun himself admits it will take “several years to a decade.” Ilya Sutskever (ex-OpenAI) talks about “5 to 20 years.” Meanwhile, LLMs remain the foundation for applications serving hundreds of millions of people, and GPT, Claude, and Gemini will keep iterating and improving.

The most likely future is hybrid. Most serious researchers don’t think in terms of “replacement” but of integration: a system that uses LLMs for language understanding and abstract reasoning while a world model handles physical planning and consequence simulation. It’s not “either/or” — it’s “both/and.”

What concerns me is the saturation LeCun points to. Meta’s Llama 4, launched in April 2025, performed far below its benchmark scores in real-world use — evidence that optimizing for evaluation metrics isn’t the same as improving understanding. If we’re hitting the LLM ceiling, the race for world models becomes urgent, not academic.

What This Means for Us

If you, like me, work with AI daily, the practical implication isn’t “stop using LLMs.” They remain incredible productivity tools. But it’s worth being aware that:

What we have today are powerful tools for text prediction — not world comprehension. The “intelligence” we see is fluency, not understanding. And the next real leap in AI probably won’t come from GPT-6 or Claude 5, but from a fundamentally different architecture.

The future of AI isn’t in reading more books. It’s in living in the world.

And honestly, that excites me. Because if the next frontier requires understanding of the physical world, causality, and experience — everything that is irreducibly human — then perhaps the AI that would replace humans is further away than we thought. And the AI that amplifies humans is closer than ever.

Share if this expanded your perspective:

The AI that speaks well already exists. The AI that understands the world is still being invented. And that difference is everything.

