How generative AI is making robots smarter, more capable, and more ready for the mainstream

As generative AI becomes a greater part of robotics, we can expect innovations to happen at a faster pace, moving robots closer to deployment …


In recent months, robotics has advanced rapidly, driven largely by progress in generative artificial intelligence.

Leading tech companies and research labs are using generative AI models to address some of the big challenges that have so far kept robots from being widely deployed outside of heavy industry and research labs.

Here are just a few of the innovative ways generative AI is helping bring robotics research further along.

Bridging the sim-to-real gap

Training robotic machine learning models in real-world scenarios presents a host of challenges. The process is slow, unfolding at the pace of real-time events. It’s also costly, constrained by the number of robots that can be physically deployed. Furthermore, safety concerns and limited access to diverse environments for comprehensive training pose additional hurdles.

To circumvent these obstacles, researchers use simulated environments for training robotic models. This approach allows for scalability and significantly reduces costs compared to real-world training. However, this solution isn’t without its drawbacks. 

Creating detailed simulated environments can be costly. Moreover, these environments often lack the intricate details found in the real world, leading to a disparity known as the “sim-to-real gap.” This gap results in a performance drop when models trained in simulation are deployed in the real world, as they can’t handle the complexities and nuances of their environments.

Recently, generative models have become important tools for bridging the sim-to-real gap and helping make simulated environments more realistic and detailed.

For instance, neural radiance field (NeRF) models are generative models that can reconstruct 3D scenes from a set of 2D images. NeRFs make it much easier for developers to create simulated environments for training robots.
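
To make the idea concrete: a NeRF learns a function that maps a 3D point (and viewing direction) to a density and a color, and it renders each pixel by compositing samples along a camera ray. The sketch below shows only that compositing step in plain Python. The function name `render_ray` and the sample values are illustrative assumptions; the hand-written densities and colors stand in for the outputs of a trained network, and this is not code from any particular NeRF library.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one camera ray, front to back.

    densities: volume density (sigma) at each sample
    colors:    (r, g, b) at each sample, values in [0, 1]
    deltas:    distance between consecutive samples
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha
    return tuple(pixel)


if __name__ == "__main__":
    # A fully opaque red surface in front of a blue one: only red is visible.
    print(render_ray([1e9, 1e9], [(1, 0, 0), (0, 0, 1)], [0.1, 0.1]))
```

Training a NeRF amounts to adjusting the network so that pixels rendered this way match the input photographs, which is why a handful of 2D images can be enough to recover a 3D scene.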

Nvidia is leveraging generative models such as NeRFs for its Neural Reconstruction Engine. This AI system creates realistic 3D environments from videos recorded by cameras installed on cars, which can be used to train models for self-driving vehicles.

SyncDreamer, a model developed by researchers from various universities, generates multiple views of an object from a single 2D image. These views can then be fed to another generative model to create a 3D model for simulated environments.

And DeepMind’s UniSim model uses LLMs and diffusion models to generate photo-realistic video sequences. These sequences can be used to create fine-grained simulations for training robotic models.

Bridging the robots-to-humans gap

Another significant hurdle in robotics research is enhancing human-robot interaction. This involves improving the ability of robots to understand human commands and collaborate effectively. 

Advances in multi-modal generative models are helping address this problem. These models integrate natural language with other data types, such as images and videos, to facilitate more effective communication with robots.

A prime example of this is Google’s embodied language model, PaLM-E. This model combines language models and vision transformers, which are jointly trained to understand correlations between images and text. 

The model then applies this knowledge to analyze visual scenes and translate natural language instructions into robot actions. Models like PaLM-E have significantly improved the ability of robots to execute complex commands.

Building on this concept, Google introduced RT-2, a vision-language-action model, in July 2023. Trained on both web-scale data and robotics data, RT-2 can carry out natural language instructions, even for tasks it hasn’t been explicitly trained on.

Bridging the gap between robots and datasets

The world of robotics research is rich with models and datasets gathered from real-world robots. However, these datasets are often disparate, collected from various robots, in different formats, and for diverse tasks. 

Recently, some research groups have shifted their focus to consolidating the knowledge embedded in these datasets to create more versatile models. 

A standout example is RT-X, a collaborative project between DeepMind and 33 other research institutions. The project’s ambitious goal is to develop a general-purpose AI system capable of working with different types of physical robots and performing a wide array of tasks.

The project was inspired by work on large language models, which showed that training LLMs on very large datasets can enable them to perform tasks that were previously beyond their reach. The researchers brought together datasets from 22 robot embodiments and 20 institutions in various countries. This consolidated dataset encompassed more than 500 skills and 150,000 tasks. The researchers then trained a series of models on this unified dataset. The resulting models were able to generalize to many embodiments and tasks, including some they weren’t explicitly trained for.
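
The consolidation step is easier to picture with a small example. The sketch below converts records from two robots, each logged in its own format, into one shared schema that a single model could train on. Everything in it is a hypothetical illustration: the robot names `arm_a` and `arm_b`, the raw field names, and the four-number action layout are assumptions made for this example and are not the schema used by the RT-X project.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    """Shared schema: one instruction plus one action, in common units."""
    robot: str
    instruction: str
    action: Tuple[float, float, float, float]  # (dx, dy, dz) in meters, gripper in [0, 1]


def from_arm_a(raw: dict) -> Step:
    # Hypothetical robot A logs motion in millimeters and a boolean gripper state.
    dx, dy, dz = (value / 1000.0 for value in raw["delta_mm"])
    grip = 1.0 if raw["gripper_closed"] else 0.0
    return Step("arm_a", raw["task"].strip().lower(), (dx, dy, dz, grip))


def from_arm_b(raw: dict) -> Step:
    # Hypothetical robot B already logs meters and a continuous gripper value.
    x, y, z = raw["act"]["xyz_m"]
    grip = float(raw["act"]["grip"])
    return Step("arm_b", raw["lang"].strip().lower(), (float(x), float(y), float(z), grip))


# One converter per source format; adding a robot means adding one entry.
CONVERTERS: Dict[str, Callable[[dict], Step]] = {
    "arm_a": from_arm_a,
    "arm_b": from_arm_b,
}


def unify(records: List[Tuple[str, dict]]) -> List[Step]:
    """Map (source_name, raw_record) pairs into the shared schema."""
    return [CONVERTERS[source](raw) for source, raw in records]


RAW_RECORDS = [
    ("arm_a", {"task": "Pick up the cup ", "delta_mm": [10, 0, -5], "gripper_closed": True}),
    ("arm_b", {"lang": "Open the drawer", "act": {"xyz_m": [0.0, 0.02, 0.0], "grip": 0.3}}),
]

if __name__ == "__main__":
    for step in unify(RAW_RECORDS):
        print(step)
```

Once every source is expressed in the same units and fields, data from very different robots can be mixed in a single training set, which is the precondition for the cross-embodiment generalization described above.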

Creating better reward models

Generative models are widely used for writing code, and they can also generate code for training robots. Nvidia’s Eureka, an AI agent, uses generative AI to design reward functions, a notoriously challenging component of the reinforcement learning systems used in robot training.

Eureka uses GPT-4 to write reward-function code, eliminating the need for task-specific prompting or predefined reward templates. It leverages simulation environments and GPUs to quickly evaluate large batches of reward candidates, which speeds up the training process. Eureka also uses GPT-4 to analyze and improve the code it generates. Moreover, it can incorporate human feedback to refine the reward function and align it more closely with the developer’s objectives.
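
The propose, evaluate, and refine loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Nvidia's implementation: `mock_llm_propose` is a hypothetical stand-in for the GPT-4 call, the "simulation" is a one-dimensional world with a greedy agent, and the refinement step simply narrows the search around the best candidate instead of feeding textual feedback to a language model.

```python
TARGET = 5.0  # goal position in a toy 1-D world (illustrative assumption)


def mock_llm_propose(center, spread, n=5):
    """Hypothetical stand-in for the language-model call.

    A real system would prompt an LLM with the task and with feedback from
    the previous round. Here we emit reward-function source code whose single
    weight is spaced evenly around the best weight found so far.
    """
    candidates = []
    for i in range(n):
        weight = center + spread * (2 * i / (n - 1) - 1)
        source = (
            "def reward(pos):\n"
            f"    return {weight!r} * -abs(pos - {TARGET!r})\n"
        )
        candidates.append((weight, source))
    return candidates


def compile_reward(source):
    """Turn generated source code into a callable reward function."""
    namespace = {}
    exec(source, namespace)  # acceptable here only because we wrote the code
    return namespace["reward"]


def evaluate(reward_fn, steps=20):
    """Toy simulation: a greedy agent always takes the highest-reward move.

    Returns the task fitness, defined as the negative final distance to TARGET.
    """
    pos = 0.0
    for _ in range(steps):
        pos = max((pos - 1.0, pos, pos + 1.0), key=reward_fn)
    return -abs(pos - TARGET)


def search_reward(rounds=3):
    """Propose candidates, score them, keep the best, narrow the next round."""
    center, spread = -1.0, 2.0  # deliberately poor starting guess
    best_source, best_fitness = None, float("-inf")
    for _ in range(rounds):
        for weight, source in mock_llm_propose(center, spread):
            fitness = evaluate(compile_reward(source))
            if fitness > best_fitness:
                best_source, best_fitness, center = source, fitness, weight
        spread /= 2  # crude analogue of refining based on feedback
    return best_source, best_fitness


if __name__ == "__main__":
    source, fitness = search_reward()
    print(source)
    print("fitness:", fitness)
```

The key design point is that candidates are judged by how well the agent performs the task, not by the reward values themselves. A reward with a negative weight drives the agent away from the target and scores poorly, so the loop discards it without anyone having to inspect the code.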

Generative models, which began with simple goals, such as generating images or text, are now being used in increasingly complex tasks beyond their original vision. As generative AI becomes a greater part of robotics, we can expect innovations to happen at a faster pace, moving robots closer to deployment alongside us in our everyday lives.
