Embodied AI Workshop
CVPR 2025 - Nashville


Attending

The Embodied AI workshop will be held in person at CVPR 2025 in Nashville, Tennessee, on June 12th from 9:00 AM to 5:00 PM CDT.

For late-breaking updates from CVPR, see the workshop's CVPR page.


Overview

Minds live in bodies, and bodies move through a changing world. The goal of embodied artificial intelligence is to create agents, such as robots, which learn to creatively solve challenging tasks requiring interaction with the environment. While this is a tall order, fantastic advances in deep learning and the increasing availability of large datasets like ImageNet have enabled superhuman performance on a variety of AI tasks previously thought intractable. Computer vision, speech recognition and natural language processing have experienced transformative revolutions at passive input-output tasks like language translation and image processing, and reinforcement learning has similarly achieved world-class performance at interactive tasks like games. These advances have supercharged embodied AI, enabling a growing collection of researchers to make rapid progress towards intelligent agents which can:

  • See: perceive their environment through vision or other senses.
  • Talk: hold a natural language dialog grounded in their environment.
  • Listen: understand and react to audio input anywhere in a scene.
  • Act: navigate and interact with their environment to accomplish goals.
  • Reason: consider and plan for the long-term consequences of their actions.

The goal of the Embodied AI workshop is to bring together researchers from computer vision, language, graphics, and robotics to share and discuss the latest advances in embodied intelligent agents. EAI 2025’s overarching theme is Real-World Applications: creating embodied AI solutions that are deployed in real-world environments, ideally in the service of real-world tasks. Embodied AI agents are maturing, and the community should promote work that transfers this research out of simulation and laboratory environments into real-world settings. This umbrella theme is divided into four topics:

  • Embodied AI Solutions As embodied AI solutions become more powerful, we should demand that they solve more complex problems, particularly real-world problems outside of simulation and the laboratory. While scientific advances are of interest, we are actively seeking work that applies embodied AI to real-world industry applications.
  • Advances in Simulation Advances in simulation have enabled many embodied AI algorithms. Procedural simulation, parameterized simulation, differentiable simulation and world models are of interest, as are simulations based on the increasing numbers of large embodied datasets.
  • Generative Methods for Embodied AI Generative AI is becoming increasingly important for embodied artificial intelligence research. Topics such as generative AI for simulation, generative AI for data generation, and generative AI for policies (e.g., diffusion policies and world models) are of great interest; an illustrative sketch of a diffusion-style action sampler appears after this list.
  • Foundation Models Large-scale pretrained models adaptable to new tasks first came to the forefront in the domains of language, speech, and vision, but foundation models are increasingly being developed in robotics domains including action, perception, problem solving, and simulation. We invite both language model planning research that adapts existing models to embodied problems and embodied foundation models that are trained directly on embodied problems.
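
To make the generative-policy topic above concrete, here is a minimal sketch of the core loop of a diffusion policy: a DDPM-style reverse process that denoises a random vector into an action, conditioned on an observation. The toy `eps_model`, the dimensions, and the noise schedule are illustrative assumptions, not any particular system's implementation.

```python
# Illustrative DDPM-style action sampling, the mechanism behind "diffusion
# policies". All names, shapes, and the linear beta schedule are assumptions
# chosen for clarity, not a real system's API.
import numpy as np

rng = np.random.default_rng(0)

ACTION_DIM = 7    # e.g., a 7-DoF arm command (assumed)
NUM_STEPS = 50    # number of denoising steps (assumed)
betas = np.linspace(1e-4, 0.02, NUM_STEPS)  # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(action_t, obs, t):
    """Stand-in for a trained noise-prediction network (hypothetical)."""
    return 0.1 * action_t + 0.01 * obs[:ACTION_DIM] * t / NUM_STEPS

def sample_action(obs):
    """Reverse diffusion: start from Gaussian noise, iteratively denoise."""
    a = rng.standard_normal(ACTION_DIM)
    for t in reversed(range(NUM_STEPS)):
        eps = eps_model(a, obs, t)  # predict the noise added at step t
        # Standard DDPM posterior-mean update.
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # inject noise on all but the final step
            a += np.sqrt(betas[t]) * rng.standard_normal(ACTION_DIM)
    return a

print(sample_action(obs=rng.standard_normal(64)).round(3))
```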
The Embodied AI 2025 workshop will be held in conjunction with CVPR 2025 in Nashville, Tennessee. It will feature a host of invited talks covering a variety of topics in Embodied AI, many exciting Embodied AI challenges, a poster session, and panel discussions. For more information on the Embodied AI Workshop series, see our Retrospectives paper on the first three years of the workshop. For the latest updates, follow the Embodied AI Medium blog at medium.com/embodied-artificial-intelligence.



Timeline

Workshop Announced
March 31st, 2025
Paper Submission Deadline
CLOSED - Friday May 23rd, 2025
Paper Notification Deadline
CLOSED - June 4th, 2025
Challenge Submission Deadlines
May-June 2025. Check each challenge for the specific date.
Camera Ready Copy Deadline
June 11th, 2025
Sixth Annual Embodied AI Workshop at CVPR
Nashville, Tennessee
June 12, 2025
Challenge Winners Announced
At the workshop. Check each challenge for specifics.


Workshop Schedule

The Embodied AI workshop will be hybrid, with in-person talks that are also streamed via Zoom.
  • Workshop Talks: 9:00AM-5:00PM CDT - Room 101D
  • Poster Session: 12:00PM-1:30PM CDT - ExHall D boards #140 to #169
CVPR attendees can find Zoom information on our official CVPR workshop page.
Remote and in-person attendees are welcome to ask questions via Slack.

  • Workshop Introduction: Embodied AI
    9:00 - 9:10 AM CDT
    Location: Room 101D
    Anthony Francis
    Logical Robotics
  • Challenge Presentations - Winning Methods
    (ARNOLD, HAZARD, ManiSkill-ViTac, SMM)
    9:10 - 10:00 AM CDT
    Location: Room 101D
    Moderator - David Hall
    CSIRO
  • Challenge Q&A
    10:00 - 10:30 AM CDT
    Location: Room 101D
  • Invited Talk - Embodied AI Applications
    Title: Learning from Humans with Vision and Touch
    10:30 - 11:00 AM CDT
    Location: Room 101D
    Lerrel Pinto
    NYU

    Bio: Lerrel Pinto is an Assistant Professor of Computer Science at NYU Courant and part of the CILVR group. Lerrel runs the General-purpose Robotics and AI Lab (GRAIL) with the goal of getting robots to generalize and adapt in the messy world we live in.

    Abstract: Despite rapid advances in robotics, robots still struggle to achieve the dexterity and adaptability of humans in real-world manipulation tasks. This talk explores how learning directly from humans—leveraging both vision and touch—can bridg…
  • Invited Talk - Foundation Models for Embodied AI
    Title: Towards Multimodal Embodied AI Agents that Can See, Talk and Act
    11:00 - 11:30 AM CDT
    Location: Room 101D
    Jianwei Yang
    Microsoft Research

    Bio: Jianwei Yang is a principal researcher in the Deep Learning Group at Microsoft Research, Redmond, led by Jianfeng Gao. His research interests span computer vision, multi-modality, and machine learning. He currently focuses on building next-generation vision and multi-modal foundation models.

    Abstract: The development of multimodal AI agents marks a pivotal step toward creating systems capable of understanding, reasoning, and interacting with the world in human-like ways. Building such agents requires models that not only comprehend multi-sensory o…
  • Invited Talk - Simulation for Embodied AI
    Title: Geometry and Physics Bias in Embodied AI
    11:30 AM - 12:00 PM CDT
    Location: Room 101D
    Jiayun (Peter) Wang
    Caltech

    Bio: Jiayun (Peter) Wang is a postdoctoral researcher at the California Institute of Technology, working with Prof. Anima Anandkumar. He received his PhD from UC Berkeley in 2023, advised by Prof. Stella Yu. His research develops novel machine learning and computer vision methodologies that address challenges of data scarcity and computational cost, with real-world applications like healthcare. More information can be found at his website: https://pwang.pw/.

    Abstract: Embodied AI demands agents that see the world with geometric fidelity and anticipate and interact with it with physical rigor. The talk will present a three-stage ladder—Perceive, Predict, Control—showing how carefully chosen geometry and phys…
  • Lunch / Accepted Papers Poster Session
    12:00 PM - 1:30 PM CDT
    Location: ExHall D
    • EAI's posters will be at boards #140 to #169.
  • Invited Talk - Robotics and Embodied AI
    Title: The Ingredients for Efficient Robot Learning and Exploration
    1:30 - 2:00 PM CDT
    Location: Room 101D
    Rika Antonova
    University of Cambridge

    Bio: Rika Antonova is an Associate Professor at the University of Cambridge. Her research interests include data-efficient reinforcement learning algorithms, robotics, and active learning & exploration. Earlier, Rika was a postdoctoral scholar at Stanford University after receiving the Computing Innovation Fellowship from the US National Science Foundation. Rika completed her PhD at KTH, and earlier she obtained a research Master's degree from the Robotics Institute at Carnegie Mellon University. Before that, Rika was a senior software engineer at Google.

    Abstract: In this talk, I will outline the ingredients for enabling efficient robot learning. First, I will demonstrate how large vision-language models can enhance scene understanding and generalization, allowing robots to learn general rules from s…
  • Invited Talk - Foundation Models for Embodied AI
    Title: Large Behavior Models for Dexterous Manipulation
    2:00 - 2:30 PM CDT
    Location: Room 101D
    Rareș Ambruș
    TRI

    Bio: Dr. Rareș Ambruș is a senior manager in the Large Behavior Models division at Toyota Research Institute (TRI). His research interests lie at the intersection of robotics, computer vision and machine learning with the aim of discovering visual representations for embodied applications in areas such as automated driving and robotics. Dr. Ambruș received his Ph.D. in 2017 from the Royal Institute of Technology (KTH), Sweden, focusing on self-supervised perception and mapping for mobile robots. He has more than 100 publications and patents at top AI venues covering fundamental topics in computer vision, machine learning and robotics.

    Abstract: Dexterous manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development…
  • Invited Talk - Generative AI for Embodied AI
    Title: World Models at Scale for Embodied Driving
    2:30 - 3:00 PM CDT
    Location: Room 101D
    Nikhil Mohan
    Wayve

    Bio: Nikhil Mohan is a Lead Scientist at Wayve, where he focuses on leveraging data-driven techniques for simulation in autonomous driving. His work spans Neural Radiance Fields (NeRFs), Gaussian Splatting, and generative models, emphasizing their application to improve and evaluate Wayve’s AI Driver performance. Before turning his attention to simulation, Nikhil led Wayve’s production driving team, where they shipped research prototypes into the production system. Prior to joining Wayve, he earned his Master’s degree at Carnegie Mellon University, concentrating in machine learning and signal processing.

    Abstract: Nikhil's talk will focus on using world models to produce data at scale for embodied AI in the context of self-driving.
  • Invited Talk - Generative AI for Embodied AI
    Title: Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models
    3:00 - 3:30 PM CDT
    Huan Ling
    NVIDIA

    Bio: Huan Ling is a Senior Research Scientist at NVIDIA’s Spatial Intelligence (Toronto AI) Lab. His research focuses on developing foundational generative models that enable realistic and controllable environments, spanning video synthesis and 3D/4D scene generation and reconstruction. His work aims to build scalable systems that support real-world applications. Huan’s research has been featured at top conferences such as CVPR and NeurIPS, and he actively collaborates across disciplines to advance the frontier of generative AI for real-world applications. He has contributed to the development and large-scale training of video foundation model products, including NVIDIA-COSMOS and COSMOS-Drive-Dreams, which enable high-fidelity, controllable video generation for physical-AI scenarios such as autonomous driving.

    Abstract: Collecting and annotating real-world data for safety-critical physical AI systems, such as Autonomous Vehicles (AVs), is time-consuming and costly. It is especially challenging to capture rare edge cases, which play a critical role in trainin…
  • Accepted Paper Highlights
    3:30 - 4:00 PM CDT
    • #2: On the use of Pre-trained Visual Representations in Visuo-Motor Robot Learning
    • #6: R-EQA: Retrieval-Augmented Generation for Embodied Question Answering
    • #7: Uncertainty Modeling in Autonomous Vehicle Trajectory Prediction: A Comprehensive Survey
    • #15: Benchmarking Arbitrary Natural Language Tasks in 3D Open Worlds
    • #19: What matters in ImageNav: architecture, pre-training, sim settings, pose
    • #23: MotIF: Motion Instruction Fine-tuning
    Moderator - David Hall
    CSIRO
  • Invited Speaker Panel
    4:00 - 4:30 PM CDT
    Moderator - Anthony Francis
    Logical Robotics
  • Debate on the Future of Embodied AI
    4:30 - 5:00 PM CDT
    Moderator - Anthony Francis
    Logical Robotics
  • Workshop Concludes
    5:00 PM CDT


Sponsor Events



Challenges

The Embodied AI 2025 workshop is hosting many exciting challenges covering a wide range of topics. More details regarding data, submission instructions, and timelines can be found on the individual challenge websites.

The workshop organizers will award each first-prize challenge winner a cash prize, sponsored by Logical Robotics and our other sponsors.

Challenge winners may be given the opportunity to present during their challenge's presentation at the workshop. Since many challenges can be grouped into similar tasks, we encourage participants to submit models to more than one challenge. The table below describes, compares, and links to each challenge.

| Challenge | Task | Simulation Platform | Scene Dataset | Observations | Action Space |
| --- | --- | --- | --- | --- | --- |
| ARNOLD | Language-Grounded Manipulation | Isaac Sim | Arnold Dataset | RGB-D, Proprioception | Continuous |
| HAZARD | Multi-Object Rescue | ThreeDWorld | HAZARD dataset | RGB-D, Temperature Sensors, Water Level | Discrete |
| ManiSkill-ViTac | Vision-Tactile Fusion Manipulation | SAPIEN | Customized Scenarios | RGB-D, Proprioception, Tactile Signals | Continuous |
| Social Mobile Manipulation (SMM) | Social Mobile Manipulation | Infinite World (based on Isaac Sim) | SMM Dataset | RGB-D | Continuous |
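
To ground the Observations and Action Space columns, the stub below shows how such an interface typically looks in Gymnasium-style code: a dict observation with RGB-D and proprioception, and a continuous action box. The class name, shapes, and bounds are hypothetical; see each challenge's website for its actual API.

```python
# Hypothetical Gymnasium-style stub illustrating the table's "Observations"
# and "Action Space" columns; shapes and bounds are assumptions, not any
# challenge's real interface.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ToyManipulationEnv(gym.Env):
    def __init__(self):
        self.observation_space = spaces.Dict({
            "rgb": spaces.Box(0, 255, shape=(224, 224, 3), dtype=np.uint8),
            "depth": spaces.Box(0.0, 10.0, shape=(224, 224, 1), dtype=np.float32),
            "proprioception": spaces.Box(-np.inf, np.inf, shape=(7,), dtype=np.float32),
        })
        # Continuous control (ARNOLD, ManiSkill-ViTac, SMM); a discrete task
        # like HAZARD would use spaces.Discrete(n) instead.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(7,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        obs = self.observation_space.sample()  # placeholder dynamics
        return obs, 0.0, False, False, {}      # obs, reward, terminated, truncated, info

env = ToyManipulationEnv()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```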


Call for Papers

We invite high-quality 2-page extended abstracts on embodied AI, especially in areas relevant to the themes of this year's workshop:

  • Embodied AI Solutions
  • Advances in Simulation
  • Generative Methods for Embodied AI
  • Foundation Models
as well as themes related to embodied AI in general:
  • Visual Navigation
  • Embodied Mobile Manipulation
  • Embodied Question Answering
  • Embodied Vision & Language
  • Language Model Planning
Accepted papers will be presented as posters or spotlight talks at the workshop. These papers will be made publicly available in a non-archival format, allowing future submission to archival journals or conferences. Paper submissions do not have to be anonymized. Per CVPR rules regarding workshop papers, at least one author must register for CVPR using an in-person registration.

The submission deadline CLOSED on May 23rd (Anywhere on Earth; for clarity, 2025/05/24 00:01 GMT as computed by OpenReview). Papers should be no longer than 2 pages (excluding references) and styled in the CVPR format; a minimal LaTeX skeleton is sketched below.
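
For reference, a CVPR-styled 2-page abstract might start from the skeleton below. It assumes the cvpr.sty and bibliography style shipped with the official CVPR author kit, whose exact names can change from year to year.

```latex
% Minimal 2-page extended abstract skeleton. Assumes cvpr.sty from the
% official CVPR author kit is on the TeX path; option and bibliography-style
% names follow recent kits and should be checked against the current one.
\documentclass[10pt,twocolumn,letterpaper]{article}
\usepackage[final]{cvpr}  % workshop submissions need not be anonymized
\usepackage{graphicx}

\title{Your Extended Abstract Title}
\author{First Author \and Second Author}

\begin{document}
\maketitle

\begin{abstract}
One-paragraph summary of the contribution.
\end{abstract}

\section{Introduction}
Keep the body within two pages; references are excluded from the limit.

{\small
\bibliographystyle{ieeenat_fullname}
\bibliography{main}
}
\end{document}
```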


What matters in ImageNav: architecture, pre-training, sim settings, pose
Gianluca Monaci, Philippe Weinzaepfel, Christian Wolf
State-of-the-art image goal navigation methods either rely on dedicated image-matching or pre-training of vision modules on relative pose estimation or image reconstruction.
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
Gunjan Chhablani, Xiaomeng Ye, Rynaa Grover, Muhammad Zubair Irshad, Zsolt Kira
Sim-to-real transfer and personalization remain core challenges in Embodied AI due to a trade-off between synthetic environments lacking realism and costly real-world captures.
View-Imagination: Enhancing Visuomotor Control with Adaptive View Synthesis
Dohyeok Lee, Munkyung Kim, Jung Min Lee, Seungyub Han, Jungwoo Lee
In robotic manipulation tasks, visuomotor control suffers from limited spatial understanding problems with limited camera installation and visual imperfections, such as occlusion.
Object Retrieval-Guided Vision Language Modeling for Embodied Interaction
Constantin Patsch, Yuankai Wu, Marsil Zakour, Eckehard Steinbach
Vision-language model (VLM)-based agents often struggle to name specific or unseen objects in hand-object interactions.
Benchmarking Arbitrary Natural Language Tasks in 3D Open Worlds
Sonny George, Chris Sypherd, Rocco Ahching, Dylan Cashman
3D-embodied autonomy toward arbitrary task outcomes is a long-standing goal in AI and Robotics.
LLM-Enhanced Rapid-Reflex Async-Reflect Framework for Real-Time Decision Making in Dynamically Changing Environments
Yangqing Zheng, Shunqi Mao, Dingxin Zhang, Weidong Cai
In the realm of embodied intelligence, the evolution of large language models (LLMs) has markedly enhanced agent decision making.
Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets
Jeremiah Coholich, Justin Wit, Zsolt Kira
Vision-based policies for robot manipulation have achieved significant recent success, but are still brittle to distribution shifts such as camera viewpoint variations.
Data Augmentation in Diffusion Inversion Space
Junfeng Wei, Rongsen Luo, Ziming Cheng, An Mo, Chao Ji
Visual imitation learning methods have demonstrated strong performance and potential, but their generalization ability to unseen environments remains limited.
BePo: Efficient Dual Representation for 3D Scene Understanding
Yunxiao Shi, Hong Cai, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Amin Ansari, Fatih Porikli
3D scene understanding fundamentally underlies autonomous systems that power a variety of important applications such as Autonomous Driving, Robotics, and AR/VR.
Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI
Aditya Chakravarty
Diffusion models have shown remarkable flexibility for solving inverse problems without task-specific retraining.
H^3 DP: Triply‑Hierarchical Diffusion Policy for Visuomotor Learning
Yiyang Lu, Yufeng Tian, Zhecheng Yuan, Xianbang Wang, Pu Hua, Zhengrong Xue, Huazhe Xu
We introduce Triply-Hierarchical Diffusion Policy (H^3DP), a novel visuomotor learning framework that explicitly incorporates hierarchical structures to strengthen the integration between visual features and action generation.
R-EQA: Retrieval-Augmented Generation for Embodied Question Answering
Hyobin Ong, Minsu Jang
Embodied Question Answering (EQA) is a task where an agent explores its environment, gathers visual information and responds to natural language questions based on that information.
Dynamics-Aligned Flow Matching Policy for Robot Learning
Dohyeok Lee, Jung Min Lee, Munkyung Kim, Seokhun Ju, Seungyub Han, Jin Woo Koo, Jungwoo Lee
Behavior cloning methods for robot learning suffer from poor generalization due to limited data support beyond expert demonstrations.
Uncertainty Modeling in Autonomous Vehicle Trajectory Prediction: A Comprehensive Survey
Siddharth Raina, Jeshwanth Challagundla, Mantek Singh
Agent Behavior prediction is a critical component in autonomous driving systems, requiring the modeling of inherent uncertainties in an agent's future motion.
MotIF: Motion Instruction Fine-tuning
Minyoung Hwang, Joey Hejna, Dorsa Sadigh, Yonatan Bisk
While success in many robotics tasks can be determined by only observing the final state and how it differs from the initial state -- e.g., if an apple is picked up -- many tasks require observing the full motion of the robot to correctly determine success.
On the use of Pre-trained Visual Representations in Visuo-Motor Robot Learning
Nikolaos Tsagkas, Andreas Sochopoulos, Duolikun Danier, Sethu Vijayakumar, Chris Xiaoxuan Lu, Oisin Mac Aodha
The use of pre-trained visual representations (PVRs) in visuo-motor robot learning offers an alternative to training encoders from scratch but we discover that it faces challenges such as temporal entanglement and poor generalisation to minor scene changes.
ThinkSafe++: A Semantic Risk Score Framework for Safety-Aware Long-Horizon Planning
Yejin Jo, Minsu Jang
ThinkSafe++ is a safety framework for long-horizon task planning in embodied agents.
EED: Embodied Environment Description through Robotic Visual Exploration
Kohei Matsumoto, Asako Kanezaki
The optimal way to convey information about a real environment to humans is through natural language descriptions.
Real-Time Multimodal Processing for Interpreting Embodied Actions
Hannah VanderHoeven, Videep Venkatesha, Abhijnan Nath, Nikhil Krishnaswamy
In this paper, we demonstrate how real-time integration of language with embodied gesture and action in a collaborative task enables the generation of AI agent interventions that result in "positive friction", or reflection, deliberation, and more mindful collaboration.


Sponsors

The Embodied AI 2025 Workshop is sponsored by the following organizations:

Logical Robotics, Microsoft, NVIDIA, Wayve


Organizers

The Embodied AI 2025 workshop is a joint effort by a large set of researchers from a variety of organizations. Each year, a set of lead organizers takes point coordinating with the CVPR conference, backed up by a large team of workshop organizers, challenge organizers, and scientific advisors.
Anthony Francis
Logical Robotics
Claudia Pérez D’Arpino
NVIDIA
Luca Weihs
Vercept
Angel X. Chang
SFU
Cem Gokmen
Stanford
Changan Chen
Stanford
Chengshu Li
Stanford
Chris Paxton
Meta AI
David Hall
CSIRO
Devon Hjelm
Apple
German Ros
NVIDIA
Jiaolong Yang
Microsoft
Joanne Truong
GaTech
Lamberto Ballan
U Padova
Lars Johannsmeier
NVIDIA
Mike Roberts
Adobe
Minyoung Hwang
MIT
Naoki Yokoyama
GaTech
Oleksandr Maksymets
Meta AI
Rachith Prakash
Intel
Ram Ramrakhya
GaTech
Ran Gong
UCLA
Vivan Amin
Microsoft
Yonatan Bisk
CMU
Angel X. Chang
SFU
Baoxiong Jia
BIGAI
Changan Chen
Stanford
Chris Paxton
Meta AI
Chuang Gan
IBM, MIT
Dhruv Batra
Yutori
Jiangyong Huang
Peking U
Luca Weihs
Vercept
Manolis Savva
SFU
Naoki Yokoyama
GaTech
Oleksandr Maksymets
Meta AI
Ram Ramrakhya
GaTech
Richard He Bai
Apple
Siyuan Huang
BIGAI
Xiaofeng Gao
Amazon
Yonatan Bisk
CMU
Ade Famoti
Microsoft
Alexander Toshev
Apple
Andrey Kolobov
Microsoft
Angel X. Chang
SFU
Dhruv Batra
Yutori
Joanne Truong
GaTech
Jose A. Iglesias-Guitian
UDC-CITIC
Jose M. Alvarez
NVIDIA
Manolis Savva
SFU
Roberto Martín-Martín
Stanford
Sören Pirk
Kiel University