You’ll keep seeing foundation models for humanoids use a System 2 + System 1 style architecture, which is inspired by human cognition.



Most vision-language-action (VLA) models today are built as centralized multimodal systems that handle perception, language, and action within a single network.

Codec’s infrastructure is perfect for this because it treats each Operator as a sandboxed module, meaning you can spin up multiple Operators in parallel, each running its own model or task, while keeping them encapsulated and coordinated through the same architecture.

Robots and humanoids in general typically have multiple brains: one Operator might handle vision processing, another balance, another high level planning, and so on, all coordinated through Codec’s system.
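
As a purely illustrative sketch of that pattern (this is not Codec’s actual SDK, and every name below is hypothetical), here is what several “operator” modules running concurrently and reporting into one coordinator can look like in plain Python:

```python
# Illustrative only: NOT Codec's SDK. Hypothetical names throughout; this just
# sketches the "multiple brains, one coordinator" pattern described above.
import asyncio

async def vision_operator(bus: asyncio.Queue):
    """Pretend perception module: periodically publishes what it 'sees'."""
    for frame_id in range(3):
        await asyncio.sleep(0.1)              # stand-in for camera latency
        await bus.put(("vision", f"objects_in_frame_{frame_id}"))

async def balance_operator(bus: asyncio.Queue):
    """Pretend stability module: publishes balance state at a faster rate."""
    for tick in range(6):
        await asyncio.sleep(0.05)             # faster loop than vision
        await bus.put(("balance", f"com_offset_{tick}"))

async def planner_operator(bus: asyncio.Queue, num_messages: int):
    """Pretend high level planner: consumes messages from the other operators."""
    for _ in range(num_messages):
        source, payload = await bus.get()
        print(f"planner received {payload!r} from {source}")

async def main():
    bus = asyncio.Queue()                     # shared message bus between operators
    await asyncio.gather(
        vision_operator(bus),
        balance_operator(bus),
        planner_operator(bus, num_messages=9),
    )

asyncio.run(main())
```

In a real system each of these would run as its own sandboxed process or container rather than a coroutine, but the coordination pattern is the same.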

NVIDIA’s foundation model Isaac GR00T N1 uses this two module System 2 + System 1 architecture. System 2 is a pretrained multimodal vision-language model that observes the world through the robot’s cameras and listens to instructions, then makes a high level plan.

System 1 is a diffusion transformer policy that takes that plan and turns it into continuous motions in real time. You can think of System 2 as the deliberative brain and System 1 as the instinctual body controller. System 2 might output something like “move to the red cup, grasp it, then place it on the shelf,” and System 1 will generate the detailed joint trajectories for the legs and arms to execute each step smoothly.
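
To make the division of labor concrete, here is a rough Python sketch of that slow/fast loop. It is not NVIDIA’s code: the planner and policy below are trivial placeholders (the real System 1 is a diffusion transformer), but the structure, a low rate planner feeding a high rate controller, is the point.

```python
# Minimal sketch of the System 2 / System 1 split, not NVIDIA's implementation.
# All names and rates are illustrative; a simple interpolation stands in for
# the real diffusion transformer policy.

def system2_plan(instruction: str) -> list[str]:
    """Slow, deliberative step: turn an instruction into ordered sub-goals."""
    # A real VLM would ground these sub-goals in camera observations.
    return ["move_to_red_cup", "grasp_cup", "place_on_shelf"]

def system1_policy(subgoal: str, current_joints: list[float]) -> list[list[float]]:
    """Fast, reactive step: emit a short burst of joint targets for one sub-goal."""
    # Placeholder: nudge each joint toward a fixed target; a real policy would
    # produce an action chunk conditioned on the sub-goal and proprioception.
    target = [0.5] * len(current_joints)
    steps = []
    for i in range(1, 6):                     # several control ticks per planning step
        alpha = i / 5
        steps.append([(1 - a) * c + a * t for c, t, a in
                      zip(current_joints, target, [alpha] * len(current_joints))])
    return steps

joints = [0.0, 0.0, 0.0, 0.0]
for subgoal in system2_plan("put the red cup on the shelf"):   # low rate loop
    for joints in system1_policy(subgoal, joints):             # high rate loop
        pass                                                   # send joint targets to motors here
    print(f"completed sub-goal: {subgoal}, joints now {joints}")
```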

System 1 was trained on large amounts of trajectory data (including human teleoperated demos and physics simulated data) to master fine motions, while System 2 is built on a transformer pretrained on internet scale data for semantic understanding.

This separation of reasoning and acting is very powerful. It means GR00T can handle long horizon tasks that require planning (thanks to System 2) and also react instantly to perturbations (thanks to System 1).

If a robot is carrying a tray and someone nudges the tray, System 1 can correct the balance immediately rather than waiting for the slower System 2 to notice.

GR00T N1 was one of the first openly available robotics foundation models, and it quickly gained traction.

Out of the box, it demonstrated skill across many tasks in simulation: it could grasp and move objects with one hand or two, hand items between its hands, and perform multi step chores without any task specific programming. Because it wasn’t tied to a single embodiment, developers showed it working on different robots with minimal adjustments.

The same is true for Helix (Figure’s foundation model), which uses this type of architecture. Just as Helix lets two robots or multiple skills operate together, Codec could enable a multi agent brain by running several Operators that share information.

This “isolated pod” design means each component can be specialized (just like System 1 vs System 2) and even developed by different teams, yet they can work together.

It’s a one of a kind approach in the sense that Codec is building the deep software stack to support this modular, distributed intelligence, whereas most others only focus on the AI model itself.

Codec also leverages large pre trained models. If you’re building a robot application on it, you might plug in an OpenVLA or a Pi Zero foundation model as part of your Operator. Codec provides the connectors (easy access to camera feeds or robot APIs), so you don’t have to write the low level code to get images from a robot’s camera or to send velocity commands to its motors. It’s all abstracted behind a high level SDK.
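
As a purely hypothetical sketch of what that abstraction could look like (these classes are not Codec’s real SDK, and the model wrapper is a stub), the idea is that your application code only ever talks to an Operator, never to the camera driver or motor bus directly:

```python
# Hypothetical sketch only: none of these classes are Codec's real SDK.
# It illustrates wrapping a pretrained VLA model behind connectors so
# application code never touches raw camera or motor APIs.

class CameraConnector:
    """Stands in for a connector that hides the robot's camera driver."""
    def latest_frame(self) -> bytes:
        return b"\x00" * 64                   # fake image bytes

class MotorConnector:
    """Stands in for a connector that hides low level velocity commands."""
    def send(self, command: list[float]) -> None:
        print(f"sending velocity command: {command}")

class PretrainedVLA:
    """Placeholder for a model like OpenVLA or Pi Zero loaded from a checkpoint."""
    def predict_action(self, frame: bytes, instruction: str) -> list[float]:
        return [0.1, 0.0, -0.05]              # fake action vector

class Operator:
    """The application level unit: one model plus its connectors."""
    def __init__(self, model: PretrainedVLA, camera: CameraConnector, motors: MotorConnector):
        self.model, self.camera, self.motors = model, camera, motors

    def step(self, instruction: str) -> None:
        frame = self.camera.latest_frame()    # perception in
        action = self.model.predict_action(frame, instruction)
        self.motors.send(action)              # action out

Operator(PretrainedVLA(), CameraConnector(), MotorConnector()).step("pick up the cup")
```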

One of the reasons I’m so bullish on Codec is exactly what I outlined above. They’re not chasing narratives; the architecture is built to be the glue between foundation models, and it frictionlessly supports multi brain systems, which is critical for humanoid complexity.

Because we’re so early in this trend, it’s worth studying the designs of industry leaders and understanding why they work. Robotics is hard to grasp given the layers across hardware and software, but once you learn to break each section down piece by piece, it becomes far easier to digest.

It might feel like a waste of time now, but this is the same method that gave me a head start during AI szn and why I was early on so many projects. Become disciplined and learn which components can coexist and which components don’t scale.

It’ll pay dividends over the coming months.

Deca Trillions ( $CODEC ) coded.