
There is no particular reason why AI has to stick to language models, though. Indeed, if you want human-like thinking you pretty much have to go beyond language, since humans do plenty of things that aren't language. A recent example: "Google DeepMind unveils its first “thinking” robotics AI" https://arstechnica.com/google/2025/09/google-deepmind-unvei...


> There is no particular reason why AI has to stick to language models though.

There’s no reason at all. But that’s not the technology that’s in the consumer space, growing exponentially, and gaining all the current hype.

So at this point in time, it’s just a theoretical future: perhaps inevitable, but on an unknown timeline. It could be next year. It could be 10 years. It could be 100 years or more.

My prediction is that current AI tech plateaus long before any AGI-capable technology emerges.


Yeah, quite possible.


That's a rather poor choice for an example considering Gemini Robotics-ER is built on a tuned version of Gemini, which is itself an LLM. And while the action model is impressive, the actual "reasoning" here is still being handled by an LLM.

From the paper [0]:

> Gemini Robotics 1.5 model family. Both Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 inherit Gemini’s multimodal world knowledge.

> Agentic System Architecture. The full agentic system consists of an orchestrator and an action model that are implemented by the VLM and the VLA, respectively:

> • Orchestrator: The orchestrator processes user input and environmental feedback and controls the overall task flow. It breaks complex tasks into simpler steps that can be executed by the VLA, and it performs success detection to decide when to switch to the next step. To accomplish a user-specified task, it can leverage digital tools to access external information or perform additional reasoning steps. We use GR-ER 1.5 as the orchestrator.

> • Action model: The action model translates instructions issued by the orchestrator into low-level robot actions. It is made available to the orchestrator as a specialized tool and receives instructions via open-vocabulary natural language. The action model is implemented by the GR 1.5 model.
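The orchestrator/action-model split described above can be sketched in a few lines. This is a hypothetical toy, not Gemini's actual API: the class names, the hard-coded plan, and the boolean success signal are all illustrative stand-ins for the VLM planner and VLA tool the paper describes.

```python
# Toy sketch of the agentic architecture: an orchestrator (stand-in for the
# VLM) plans steps and checks success; an action model (stand-in for the VLA)
# executes each step as a "tool". All names here are illustrative.

class ActionModel:
    """Translates a natural-language instruction into low-level actions."""
    def execute(self, instruction: str) -> bool:
        print(f"[action] executing: {instruction}")
        return True  # pretend the step succeeded


class Orchestrator:
    """Breaks a task into steps, invokes the action model, detects success."""
    def __init__(self, action_model: ActionModel):
        self.action_model = action_model

    def plan(self, task: str) -> list[str]:
        # A real orchestrator would plan with the VLM; we hard-code two steps.
        return [f"step {i} of '{task}'" for i in (1, 2)]

    def run(self, task: str) -> bool:
        for step in self.plan(task):
            # Success detection decides when to move to the next step.
            if not self.action_model.execute(step):
                return False
        return True


done = Orchestrator(ActionModel()).run("sort the laundry")
```

The point of the split is that the "reasoning" lives entirely in the orchestrator, which is why the comment above notes that an LLM is still doing the thinking.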

AI researchers have been trying to discover workable architectures for decades, and LLMs are the best we've got so far. There is no reason to believe that this exponential growth on test scores would or even could transfer to other architectures. In fact, the core advantage that LLMs have here is that they can be trained on vast, vast amounts of text scraped from the internet and taken from pirated books. Other model architectures that don't involve next-token-prediction cannot be trained using that same bottomless data source, and trying to learn quickly from real-world experiences is still a problem we haven't solved.
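The data advantage mentioned above comes from next-token prediction being self-supervised: every position in a corpus is a free (context, next token) training example, with no human labeling. A minimal bigram-counting sketch of that idea (the corpus and function names are invented for illustration):

```python
# Why raw text is "free" training data for next-token prediction:
# each adjacent pair of tokens is a labeled (input -> target) example.

from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Harvest (token -> next token) examples straight from the text itself.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(token: str) -> str:
    # Predict the most frequent continuation observed in the corpus.
    return counts[token].most_common(1)[0][0]

print(predict("the"))  # -> "cat" ("cat" follows "the" twice, "mat" once)
```

An LLM replaces the frequency table with a neural network, but the objective is the same; architectures without such an objective can't mine the internet's text for supervision in this way.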

[0] https://storage.googleapis.com/deepmind-media/gemini-robotic...



