Discussion about this post

Leo Hike
I love this point: "giving 2D ARC tasks to an LLM is like expecting humans to perform reasoning in 4D". I wonder (not really) what human performance on ARC would be if people didn't see the puzzle as a 2D picture, but as a sequence of numbers or a sequence of 1D pictures.

I could also easily see humans failing on hypothetical 3D ARC tasks if the representation were not convenient enough.
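To make the representation point concrete, here is a small sketch (the grid is a hypothetical example, not a real ARC task) of how a pattern that is obvious in 2D gets obscured when the grid is serialized row by row, which is roughly the form in which an LLM receives it:

```python
# A hypothetical 3x5 grid containing a cross shape.
grid = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

# 2D view: the cross is immediately visible.
for row in grid:
    print(" ".join(str(c) for c in row))

# 1D view: flatten row-major, as a token sequence would be.
flat = [c for row in grid for c in row]
print(flat)  # [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0]

# Cells that are vertical neighbors in 2D end up a fixed stride apart in 1D,
# so spatial structure is no longer locally apparent in the sequence.
width = len(grid[0])
assert flat[2] == flat[2 + width] == flat[2 + 2 * width] == 1
```

The same information is present in both views; only the convenience of the representation changes.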

Neel Gupta

There's a simpler explanation: larger problems require a larger program search space. Adaptive computation is an unsolved problem, and LLMs are no exception, so the model is limited by how well it can narrow down candidate solutions in CoT and act on them effectively.

o3-style models are "significantly better" at this by expending exponential amounts of compute per problem, because they aren't "solving" it per se, but rather searching for and guessing a set of heuristics that correctly solves the task at hand.

So no, perception is not the bottleneck here. It's the lack of an ACT-style mechanism. ARC is just the simplest way to encode such a family of problems, which happens to be spatial in nature.
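The search-space claim above can be sketched with a toy count (the primitive operations here are hypothetical, not any solver's actual DSL): if candidate programs are compositions of primitive grid operations, the number of candidates grows exponentially with program length, so harder tasks demand exponentially more search compute.

```python
# Hypothetical primitive grid operations a program-search solver might compose.
primitives = ["rotate90", "flip_h", "flip_v", "recolor", "translate"]

# The number of length-d compositions is len(primitives) ** d:
# exponential growth in program depth.
for depth in range(1, 6):
    candidates = len(primitives) ** depth
    print(f"depth {depth}: {candidates} candidate programs")
```

Even 5 primitives at depth 5 yield 3125 candidates; realistic DSLs are far larger, which is why narrowing the search matters more than perception.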

2 more comments...