
> 1) You can't learn an accurate world model just from text.
> 2) Multimodal learning (vision, language, etc.) and interaction with the environment is crucial for true learning.

LLMs can be trained on multimodal data. Language is just tokens, and pixel and sound data can be encoded into tokens as well. All data can be serialized. You could train such a model on data we can't even comprehend.
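To make the "all data can be serialized" point concrete, here's a minimal sketch (purely illustrative, not any real model's tokenizer) of mapping pixels and audio samples into one shared discrete vocabulary, so both modalities become ordinary token sequences:

```python
import numpy as np

VOCAB_SIZE = 256  # one token per byte-level bin (an assumption for this sketch)

def tokenize_image(pixels: np.ndarray) -> list[int]:
    """Flatten an 8-bit image into a sequence of integer tokens."""
    return pixels.astype(np.uint8).flatten().tolist()

def tokenize_audio(samples: np.ndarray) -> list[int]:
    """Quantize audio samples in [-1, 1] into the same 256-token vocabulary."""
    quantized = np.clip((samples + 1.0) / 2.0 * (VOCAB_SIZE - 1), 0, VOCAB_SIZE - 1)
    return quantized.astype(np.uint8).tolist()

# Both modalities end up as plain token sequences that can be concatenated
# and fed to a sequence model exactly like text tokens.
image_tokens = tokenize_image(np.random.randint(0, 256, size=(4, 4)))
audio_tokens = tokenize_audio(np.sin(np.linspace(0, 2 * np.pi, 16)))
sequence = image_tokens + audio_tokens
print(len(sequence), sequence[:8])
```

Real multimodal models use learned codecs rather than raw byte bins, but the principle is the same: once everything is a token stream, the architecture doesn't care what modality it came from.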

Here's the big question. It's clear we need less data than an LLM does. But I think that's because evolution has pretrained our brains, so we come geared towards specific things: walking, talking, reading, in the same way a cheetah is geared towards ground speed more than it is towards flight.

If we placed a human and an LLM in a completely unfamiliar space and tried to train both on the same data, which would perform better?

And I mean completely unfamiliar: say, a non-Euclidean space where the only sense available is sonar. Something totally foreign to reality as humans know it.

I honestly think the LLM would beat us in that environment. We might already have succeeded in creating AGI; it's just that the G is too much. It's so general that it has to learn everything from scratch, and it can't catch up to us.

Maybe what we need is to figure out how to give the AI the same biases humans have.



Humans are more adaptable than you think:

- echolocation in blind humans https://en.wikipedia.org/wiki/Human_echolocation

- sight through signals sent on tongue https://www.scientificamerican.com/article/device-lets-blind...

In the latter case, I recall reading that the people involved ended up perceiving these signals as a "first order" sense (not consciously processed information, but intuitive, like hearing or vision).


Hugely different data too?

If you think of all the neurons connected to vision, touch, hearing, heat receptors, balance, etc., there's a constant stream of multimodal data of different types, along with constant reinforcement learning - e.g. 'if you move your eye this way, the scene you see changes', 'if you tilt your body this way, your balance changes' - and this runs from even before you are born, throughout your life.


> non Euclidean space and only using sonar for visualization

Pretty good idea for a video game!



