I love this point: "giving 2D ARC tasks to an LLM is like expecting humans to perform reasoning in 4D". I wonder (not really) what human performance on ARC would be if people saw the puzzle not as a 2D picture, but as a sequence of numbers or a sequence of 1D pictures.
I could also easily see humans failing on hypothetical 3D ARC tasks if the representation weren't convenient enough.
One thing I'd be curious about is how changing the way an ARC problem is presented affects the solution curves. I'd guess that if one found a better format for the data (potentially as simple as offering both column-major and row-major presentations), the accuracy curves would follow the same shape as you've described here.
I don't think it's particularly important, but it would be interesting (and might say something about scaffolding?).
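To make the row-major vs column-major idea concrete, here's a rough sketch of what I have in mind. The function names are made up for illustration, and I'm assuming a grid is just a list of rows of integer color codes; I'm not claiming this is how anyone actually formats ARC prompts.

```python
from typing import List

Grid = List[List[int]]  # assumed representation: list of rows of color codes


def serialize_row_major(grid: Grid) -> str:
    """One text line per grid row (the usual way a grid gets flattened)."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def serialize_column_major(grid: Grid) -> str:
    """One text line per grid column, i.e. the transposed view."""
    columns = zip(*grid)  # transpose: iterate over columns instead of rows
    return "\n".join(" ".join(str(cell) for cell in col) for col in columns)


# A vertical line of 1s: spread across every line in row-major order,
# but contiguous on a single line in column-major order.
vertical_line = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

if __name__ == "__main__":
    print(serialize_row_major(vertical_line))
    print()
    print(serialize_column_major(vertical_line))
```

The vertical line is an adjacent run of tokens in one presentation and scattered in the other, which is the kind of difference I'd expect to matter for a model reading the grid as a 1D sequence.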
Not even wrong.