We are living through a collective delusion. Frontier labs have poured hundreds of billions of dollars into scaling transformers on internet text, and they have convinced us that because the outputs look impressive, something profound is happening underneath. It is not. Large language models, in their current paradigm, are a statistical dead end, and the sooner we reckon with this, the sooner we can build systems that actually think.
Let us be precise about what LLMs are: extraordinarily sophisticated compression engines. They interpolate over a fixed corpus of human-generated text, learning to predict what a human would say next. This is not intelligence. The knowledge of any LLM is permanently capped at the boundary of human knowledge at the time of its training. You cannot compress your way to a discovery. You cannot predict your way to a proof that no human has ever written. The model can recombine, rephrase, and retrieve, but it cannot discover. And discovery is the entire point.
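To see why, look at the training objective itself. In the standard formulation (generic notation, not any particular lab's recipe), the model minimizes next-token negative log-likelihood over a human-written corpus $\mathcal{D}$:

$$
\mathcal{L}(\theta) = -\,\mathbb{E}_{x \sim \mathcal{D}}\left[\sum_{t=1}^{|x|} \log p_\theta\!\left(x_t \mid x_{<t}\right)\right]
$$

Every term in that sum rewards agreement with text a human already wrote. Nothing in the objective rewards being correct about anything humans have not yet written down.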
The counterargument people reach for is reasoning. Surely chain-of-thought, o-series models, extended thinking — surely these break the ceiling? They do not. These techniques improve the model's ability to navigate its latent space more carefully. But that latent space was built from human traces. The model is still, fundamentally, asking: what would a human write here? It has no independent relationship with truth.
The proof that a radically different approach works already exists. AlphaGo stunned the world's best players with moves no human would have chosen, and its successor, AlphaGo Zero, went further: it reached superhuman strength without studying a single human game, purely by playing against itself. The strategies it discovered were so alien that top professionals needed months to understand what had happened. The knowledge it generated was genuinely novel; it was not in the training data because no human had ever thought it. Reinforcement learning, grounded in first principles and self-play, broke the ceiling entirely.
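The shape of that loop is simple enough to sketch. Below is a toy illustration in Python, emphatically not AlphaGo's architecture (which combined deep networks with Monte Carlo tree search at enormous scale): tabular self-play on single-pile Nim, where one shared value table plays both sides and the only teacher is the win/loss signal.

```python
# Toy self-play: tabular Monte Carlo learning on single-pile Nim.
# Players alternately remove 1-3 stones; whoever takes the last stone wins.
# No human games are ever shown; the optimal strategy (leave your opponent
# a multiple of 4) emerges from the outcome signal alone.
import random
from collections import defaultdict

N_STONES = 10
ACTIONS = (1, 2, 3)
ALPHA, EPS = 0.5, 0.1

Q = defaultdict(float)  # Q[(stones_left, action)] -> expected outcome for the mover

def choose(stones: int, greedy: bool = False) -> int:
    legal = [a for a in ACTIONS if a <= stones]
    if not greedy and random.random() < EPS:
        return random.choice(legal)          # exploration
    return max(legal, key=lambda a: Q[(stones, a)])

def play_episode() -> None:
    """One self-play game; both sides read and write the same table."""
    stones, history = N_STONES, []
    while stones > 0:
        a = choose(stones)
        history.append((stones, a))
        stones -= a
    outcome = 1.0                            # the last mover won
    for state, action in reversed(history):  # credit each move toward the result
        Q[(state, action)] += ALPHA * (outcome - Q[(state, action)])
        outcome = -outcome                   # zero-sum: flip sign each ply

for _ in range(50_000):
    play_episode()

# From 10 stones the winning move leaves a multiple of 4, i.e. take 2.
print(choose(N_STONES, greedy=True))         # should print 2
```

The toy matters far less than the loop's shape: the reward comes from the rules of the game, never from imitating a human.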
We need to apply this same logic to mathematics, and mathematics is the right domain precisely because of its unique properties. Mathematics is not a soft domain where you can fake competence with fluent prose. Every step is either valid or it is not, and proof assistants such as Lean make that verdict mechanical: the kernel accepts a proof or rejects it, with no partial credit. Mathematics builds on itself with ruthless rigor: you cannot bluff your way through a proof the way you can bluff your way through a paragraph. This makes it a perfect proving ground. A system that can genuinely do mathematics, not retrieve proofs but construct them, must have developed something real: internal world models, compositional reasoning, the capacity to operate in territory no human has charted.
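To make "valid or not" concrete, here is a minimal Lean 4 proof (the theorem name is ours; `Nat.add_succ` is from Lean's core library). Change either step to something merely plausible and the kernel rejects the entire proof:

```lean
-- A minimal machine-checked proof: the kernel verifies every step,
-- and there is no partial credit.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                         -- base case: 0 + 0 reduces to 0
  | succ k ih => rw [Nat.add_succ, ih]  -- rewrite 0 + (k+1) to (0 + k) + 1, then apply ih
```

A system rewarded only when the checker says yes has an independent relationship with truth: exactly the thing next-token prediction lacks.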
If we can build an RL-native system that learns mathematics from first principles — starting from axioms, failing, self-correcting, and discovering — we will have built something that can, in principle, learn anything that has internal logical structure. Physics. Chemistry. Code. Strategy. These are all mathematics wearing different clothes.
The scaling laws will plateau. The benchmark games will run out. Retrieval will hit its ceiling. The future is not a bigger token predictor.
The future is a system that earns its knowledge the hard way — by discovering it.
Just as humanity did.