Folks in the field of AI like to make predictions about AGI. I have thoughts, and I've always wanted to write them down. Let's do that.
Since this isn't something I've touched on in the past, I'll start by doing my best to define what I mean by "general intelligence": a generally intelligent entity is one that achieves a special synthesis of three things:
A way of interacting with and observing a complex environment. Typically this means embodiment: the ability to perceive and interact with the natural world.
A robust world model covering the environment. This is the mechanism that allows an entity to perform quick inference with reasonable accuracy. In humans, world models are generally referred to as "intuition", "fast thinking" or "system 1 thinking".
A mechanism for performing deep introspection on arbitrary topics. This goes by many names: "reasoning", "slow thinking" or "system 2 thinking".
If you have these three things, you can build a generally intelligent agent. Here's how:
First, you seed your agent with one or more objectives. Have the agent use system 2 thinking in conjunction with its world model to start ideating ways to optimize for its objectives. It picks the best idea and builds a plan. It uses this plan to take an action on the world. It observes the result of this action and compares that result with the expectation it had based on its world model. It might update its world model here with the new knowledge gained. It uses system 2 thinking to make alterations to the plan (or idea). Rinse and repeat.
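To make the cycle concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration, not a real agent: the objective is reaching a target position on a number line, the "world model" is a table of how far each action is believed to move the agent, "system 2" is a greedy planner that rolls that table forward, and `environment_step` is a hypothetical stand-in for the real world that behaves differently from the agent's initial beliefs.

```python
import math
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Toy 'system 1': fast, possibly wrong beliefs about each action's effect."""

    # Hypothetical initial beliefs: how far each action moves the agent.
    effect: dict[str, float] = field(
        default_factory=lambda: {"small_step": 1.0, "big_step": 5.0}
    )

    def predict(self, position: float, action: str) -> float:
        return position + self.effect[action]

    def update(self, action: str, observed_delta: float) -> None:
        # Replace the old belief with what was actually observed.
        self.effect[action] = observed_delta


def system2_plan(model: WorldModel, position: float, target: float,
                 horizon: int = 20) -> list[str]:
    """Toy 'system 2': roll the world model forward, greedily choosing the
    action whose predicted outcome lands closest to the target."""
    plan: list[str] = []
    simulated = position
    while simulated < target and len(plan) < horizon:
        action = min(model.effect,
                     key=lambda a: abs(target - model.predict(simulated, a)))
        plan.append(action)
        simulated = model.predict(simulated, action)
    return plan


def environment_step(position: float, action: str) -> float:
    """Hypothetical 'real world': big steps only move 3 units, not the 5
    the agent starts out believing."""
    true_effect = {"small_step": 1.0, "big_step": 3.0}
    return position + true_effect[action]


def run_agent(target: float, max_cycles: int = 50) -> tuple[float, int, WorldModel]:
    """Seed the agent with an objective (reach `target`) and run the cycle."""
    model = WorldModel()
    position = 0.0
    plan = system2_plan(model, position, target)
    for cycle in range(max_cycles):
        if position >= target:  # objective reached
            return position, cycle, model
        if not plan:  # plan exhausted without success: build a new one
            plan = system2_plan(model, position, target)
        action = plan.pop(0)
        expected = model.predict(position, action)     # what the world model expects
        observed = environment_step(position, action)  # act on the world and observe
        if not math.isclose(observed, expected):
            # Surprise: update the world model, then revise the plan.
            model.update(action, observed - position)
            plan = system2_plan(model, observed, target)
        position = observed
    return position, max_cycles, model


if __name__ == "__main__":
    final_position, cycles_used, learned = run_agent(10.0)
    print(final_position, cycles_used, learned.effect)
    # prints: 10.0 4 {'small_step': 1.0, 'big_step': 3.0}
```

In this run the agent first expects `big_step` to move it 5 units, observes only 3, corrects its world model, and re-plans from there. A real agent would swap each toy piece for learned models; the point of the sketch is only the control flow: plan, act, compare against expectation, update, revise.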
My definition of general intelligence, then, is an agent that can coherently execute the above cycle repeatedly over long periods of time, and can therefore attempt to optimize any objective.
The capacity to actually achieve arbitrary objectives is not a requirement. Some objectives are simply too hard. Adaptability and coherence are the key: can the agent use what it knows to synthesize a plan, and can it act continuously towards a single objective over long periods of time?
So, with that out of the way, where do I think we are on the path to building a general intelligence?
World Models
We're already building world models with autoregressive transformers, particularly of the "omnimodel" variety. How robust they are is up for debate. There's good news, though: in my experience, scale improves robustness, and humanity is currently pouring capital into scaling autoregressive models. So we can expect robustness to improve.
With that said, I suspect the world models we have right now are sufficient to build a generally intelligent agent.
Side note: I also suspect that robustness can be further improved through the interplay of system 2 thinking and observation of the real world. This is a paradigm we haven't really seen in AI yet, but it happens all the time in living things, and it's a very important mechanism for improving robustness.
When LLM skeptics like Yann LeCun say we haven't yet achieved the intelligence of a cat, this is the point they are missing. Yes, LLMs still lack some basic knowledge that every cat has, but they could learn that knowledge, given the ability to self-improve in this way. And such self-improvement is doable with transformers and the right ingredients.
Reasoning
There is no well-known way to achieve system 2 thinking, but I am quite confident that it is possible within the transformer paradigm with the technology and compute available to us right now. I estimate that we are 2-3 years away from building a mechanism for system 2 thinking that is good enough for the cycle I described above.
Embodiment
Embodiment is something we're still figuring out in AI, but once again I am quite optimistic about near-term advancements. There is a convergence currently happening between robotics and LLMs that is hard to ignore.
Robots are becoming extremely capable: they can respond to very abstract commands like "move forward", "get up", "kick ball", "reach for object", etc. For examples, see what Figure is up to, or the recently released Unitree H1.
On the other end of the spectrum, large omnimodels give us a way to map arbitrary sensory inputs into commands that can be sent to these sophisticated robotic systems.
I've been spending a lot of time lately walking around outside, talking to GPT-4o while letting it observe the world through my smartphone camera. I like asking it questions to test its knowledge of the physical world. It's far from perfect, but it is surprisingly capable. We're close to being able to deploy systems that can carry out coherent sequences of actions in the environment and observe (and understand) the results. I suspect we're going to see some really impressive progress here in the next 1-2 years.
This is the field of AI I am personally most excited about, and I plan to spend most of my time working on it over the coming years.
TL;DR
In summary: we've basically solved building world models, we're 2-3 years away on system 2 thinking, and 1-2 years away on embodiment. The latter two can be done concurrently. Once all of the ingredients have been built, we need to integrate them and build the cycling algorithm I described above. I'd give that another 1-2 years.
So my current estimate is 3-5 years for AGI. I'm leaning towards 3 for something that looks an awful lot like a generally intelligent, embodied agent (which I would personally call an AGI). Then a few more years to refine it to the point that we can convince the Gary Marcuses of the world.
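Spelled out, the estimate treats system 2 thinking and embodiment as parallel tracks (so the slower one sets the pace), adds integration on top, and counts world models as taking no extra time. A worked version follows; the T symbols are labels introduced here for illustration, and the low and high ends assume best cases pair with best cases and worst cases with worst cases:

```latex
\begin{aligned}
T_{\text{AGI}}  &= \max\left(T_{\text{system 2}},\, T_{\text{embodiment}}\right) + T_{\text{integration}} \\
T_{\text{low}}  &= \max(2,\, 1) + 1 = 3 \text{ years} \\
T_{\text{high}} &= \max(3,\, 2) + 2 = 5 \text{ years}
\end{aligned}
```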
Really excited to see how this ages.
Remember how OAI claimed that o3 had displayed superhuman levels on the mega-hard Frontier Math exam written by Fields Medalists? Funny/totally not fishy story haha. Turns out OAI had exclusive access to that test for months, funded its creation, and refused to let the creators of the test publicly acknowledge this until after OAI did their big stupid magic trick.
From Subbarao Kambhampati via LinkedIn:
"On the seedy optics of "Building an AGI Moat by Corralling Benchmark Creators" #SundayHarangue. One of the big reasons for the increased volume of "AGI Tomorrow" hype has been o3's performance on the "frontier math" benchmark--something that other models basically had no handle on.
We are now being told (https://lnkd.in/gUaGKuAE) that this benchmark data may have been exclusively available (https://lnkd.in/g5E3tcse) to OpenAI since before o1--and that the benchmark creators were not allowed to disclose this *until after o3*.
That o3 does well on frontier math held-out set is impressive, no doubt, but the mental picture of "o1/o3 were just being trained on simple math, and they bootstrapped themselves to frontier math"--that the AGI tomorrow crowd seem to have--that OpenAI while not explicitly claiming, certainly didn't directly contradict--is shattered by this. (I have, in fact, been grumbling to my students since o3 announcement that I don't completely believe that OpenAI didn't have access to the Olympiad/Frontier Math data before hand.. )
I do think o1/o3 are impressive technical achievements (see https://lnkd.in/gvVqmTG9 )
๐ซ๐๐๐๐ ๐๐๐๐ ๐๐ ๐๐๐๐ ๐๐๐๐๐๐๐๐๐๐ ๐๐๐๐ ๐๐๐ ๐๐๐ ๐๐๐๐๐ ๐๐๐๐๐๐ ๐๐ ๐๐ ๐๐๐๐๐ ๐๐๐๐๐๐๐๐๐๐--๐๐๐ ๐ ๐๐๐๐'๐ ๐๐๐๐๐ ๐๐๐๐๐๐ "๐จ๐ฎ๐ฐ ๐ป๐๐๐๐๐๐๐."
We all know that data contamination is an issue with LLMs and LRMs. We also know that reasoning claims need more careful vetting than "we didn't see that specific problem instance during training" (see "In vs. Out of Distribution analyses are not that useful for understanding LLM reasoning capabilities" https://lnkd.in/gZ2wBM_F ).
At the very least, this episode further argues for increased vigilance/skepticism on the part of AI research community in how they parse the benchmark claims put out [by] commercial entities."
Big stupid snake oil strikes again.