Embodied AI Workshop
CVPR 2025 - Nashville


Attending

The Embodied AI workshop will be held in person at CVPR 2025 in Nashville, Tennessee, on June 12th from 9:00 AM to 5:00 PM CDT.

For late-breaking updates from CVPR, see the workshop's CVPR page.


Overview

Minds live in bodies, and bodies move through a changing world. The goal of embodied artificial intelligence is to create agents, such as robots, which learn to creatively solve challenging tasks requiring interaction with the environment. While this is a tall order, fantastic advances in deep learning and the increasing availability of large datasets like ImageNet have enabled superhuman performance on a variety of AI tasks previously thought intractable. Computer vision, speech recognition and natural language processing have experienced transformative revolutions at passive input-output tasks like language translation and image processing, and reinforcement learning has similarly achieved world-class performance at interactive tasks like games. These advances have supercharged embodied AI, enabling a growing collection of researchers to make rapid progress towards intelligent agents which can:

  • See: perceive their environment through vision or other senses.
  • Talk: hold a natural language dialog grounded in their environment.
  • Listen: understand and react to audio input anywhere in a scene.
  • Act: navigate and interact with their environment to accomplish goals.
  • Reason: consider and plan for the long-term consequences of their actions.

The goal of the Embodied AI workshop is to bring together researchers from computer vision, language, graphics, and robotics to share and discuss the latest advances in embodied intelligent agents. EAI 2025’s overarching theme is Real-World Applications: creating embodied AI solutions that are deployed in real-world environments, ideally in the service of real-world tasks. Embodied AI agents are maturing, and the community should promote work that transfers this research out of simulation and laboratory environments into real-world settings. This umbrella theme is divided into four topics:

  • Embodied AI Solutions As embodied AI solutions become more powerful, we should demand that they solve more complex problems, particularly real-world problems outside of simulation and the laboratory. While scientific advances are of interest, we are actively seeking work that applies embodied AI to real-world industry applications.
  • Advances in Simulation Advances in simulation have enabled many embodied AI algorithms. Procedural simulation, parameterized simulation, differentiable simulation and world models are of interest, as are simulations based on the increasing numbers of large embodied datasets.
  • Generative Methods for Embodied AI Generative AI is becoming increasingly important for embodied artificial intelligence research. Topics such as generative AI for simulation, generative AI for data generation, and generative AI for policies (e.g., diffusion policies and world models) are of great interest; an illustrative sketch of a diffusion-style action sampler appears after this list.
  • Foundation Models Large-scale pretrained models adaptable to new tasks first came to the forefront in the domains of language, speech, and vision, but foundation models are increasingly being developed in robotics domains including action, perception, problem solving, and simulation. We invite both language model planning research that adapts existing models to embodied problems and embodied foundation models that are trained directly on embodied problems.
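
To make the generative-policy topic above concrete, here is a minimal sketch of the core loop of a diffusion policy: a DDPM-style reverse process that denoises a random vector into an action, conditioned on an observation. The toy `eps_model`, the dimensions, and the noise schedule are illustrative assumptions, not any particular system's implementation.

```python
# Illustrative DDPM-style action sampling, the mechanism behind "diffusion
# policies". All names, shapes, and the linear beta schedule are assumptions
# chosen for clarity, not a real system's API.
import numpy as np

rng = np.random.default_rng(0)

ACTION_DIM = 7    # e.g., a 7-DoF arm command (assumed)
NUM_STEPS = 50    # number of denoising steps (assumed)
betas = np.linspace(1e-4, 0.02, NUM_STEPS)  # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(action_t, obs, t):
    """Stand-in for a trained noise-prediction network (hypothetical)."""
    return 0.1 * action_t + 0.01 * obs[:ACTION_DIM] * t / NUM_STEPS

def sample_action(obs):
    """Reverse diffusion: start from Gaussian noise, iteratively denoise."""
    a = rng.standard_normal(ACTION_DIM)
    for t in reversed(range(NUM_STEPS)):
        eps = eps_model(a, obs, t)  # predict the noise added at step t
        # Standard DDPM posterior-mean update.
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # inject noise on all but the final step
            a += np.sqrt(betas[t]) * rng.standard_normal(ACTION_DIM)
    return a

print(sample_action(obs=rng.standard_normal(64)).round(3))
```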
The Embodied AI 2025 workshop will be held in conjunction with CVPR 2025 in Nashville, Tennessee. It will feature a host of invited talks covering a variety of topics in Embodied AI, many exciting Embodied AI challenges, a poster session, and panel discussions. For more information on the Embodied AI Workshop series, see our Retrospectives paper on the first three years of the workshop. For the latest updates, follow the Embodied AI Medium blog at medium.com/embodied-artificial-intelligence.



Timeline

Workshop Announced
March 31st, 2025
Paper Submission Deadline
CLOSED - Friday May 23rd, 2025
Paper Notification Deadline
CLOSED - June 4th, 2025
Challenge Submission Deadlines
May-June 2025. Check each challenge for the specific date.
Camera Ready Copy Deadline
June 11th, 2025
Sixth Annual Embodied AI Workshop at CVPR
Nashville, Tennessee
June 12, 2025
Challenge Winners Announced
At the workshop. Check each challenge for specifics.


Workshop Schedule

The Embodied AI workshop will be hybrid, with in-person talks that are also streamed via Zoom.
  • Workshop Talks: 9:00AM-5:00PM CDT - Room 101D
  • Poster Session: 12:00PM-1:30PM CDT - ExHall D boards #140 to #169
CVPR attendees can find Zoom information on our official CVPR workshop page.
Remote and in-person attendees are welcome to ask questions via Slack.

  • Workshop Introduction: Embodied AI
    9:00 - 9:10 AM CDT
    Location: Room 101D
    Anthony Francis
    Logical Robotics
  • Challenge Presentations - Winning Methods
    (ARNOLD, HAZARD, ManiSkill-ViTac, SMM)
    9:10 - 10:00 AM CDT
    Location: Room 101D
    Moderator - David Hall
    CSIRO
  • Challenge Q&A
    10:00 - 10:30 AM CDT
    Location: Room 101D
  • Invited Talk - Embodied AI Applications
    Title: Learning from Humans with Vision and Touch
    10:30 - 11:00 AM CDT
    Location: Room 101D
    Lerrel Pinto
    NYU

    Bio: Lerrel Pinto is an Assistant Professor of Computer Science at NYU Courant and part of the CILVR group. Lerrel runs the General-purpose Robotics and AI Lab (GRAIL) with the goal of getting robots to generalize and adapt in the messy world we live in.

    Abstract: Despite rapid advances in robotics, robots still struggle to achieve the dexterity and adaptability of humans in real-world manipulation tasks. This talk explores how learning directly from humans—leveraging both vision and touch—can bridg…
  • Invited Talk - Foundation Models for Embodied AI
    Title: Towards Multimodal Embodied AI Agents that Can See, Talk and Act
    11:00 - 11:30 AM CDT
    Location: Room 101D
    Jianwei Yang
    Microsoft Research

    Bio: Jianwei Yang is a principal researcher in the Deep Learning Group at Microsoft Research, Redmond, led by Jianfeng Gao. His research interests span computer vision, multi-modality, and machine learning. He currently focuses on building next-generation vision and multi-modal foundation models.

    Abstract: The development of multimodal AI agents marks a pivotal step toward creating systems capable of understanding, reasoning, and interacting with the world in human-like ways. Building such agents requires models that not only comprehend multi-sensory o…
  • Invited Talk - Simulation for Embodied AI
    Title: Geometry and Physics Bias in Embodied AI
    11:30 AM - 12:00 PM CDT
    Location: Room 101D
    Jiayun (Peter) Wang
    Caltech

    Bio: Jiayun (Peter) Wang is a postdoctoral researcher at the California Institute of Technology, working with Prof. Anima Anandkumar. He received his PhD from UC Berkeley in 2023, advised by Prof. Stella Yu. His research develops novel machine learning and computer vision methodologies that address challenges of data scarcity and computational cost, with real-world applications like healthcare. More information can be found at his website: https://pwang.pw/.

    Abstract: Embodied AI demands agents that see the world with geometric fidelity and anticipate and interact with it with physical rigor. The talk will present a three-stage ladder—Perceive, Predict, Control—showing how carefully chosen geometry and phys…
  • Lunch / Accepted Papers Poster Session
    12:00 PM - 1:30 PM CDT
    Location: ExHall D
    • EAI's posters will be at boards #140 to #169.
  • Invited Talk - Robotics and Embodied AI
    Title: The Ingredients for Efficient Robot Learning and Exploration
    1:30 - 2:00 PM CDT
    Location: Room 101D
    Rika Antonova
    University of Cambridge

    Bio: Rika Antonova is an Associate Professor at the University of Cambridge. Her research interests include data-efficient reinforcement learning algorithms, robotics, and active learning & exploration. Earlier, Rika was a postdoctoral scholar at Stanford University after receiving the Computing Innovation Fellowship from the US National Science Foundation. Rika completed her PhD at KTH, and earlier she obtained a research Master's degree from the Robotics Institute at Carnegie Mellon University. Before that, Rika was a senior software engineer at Google.

    Abstract: In this talk, I will outline the ingredients for enabling efficient robot learning. First, I will demonstrate how large vision-language models can enhance scene understanding and generalization, allowing robots to learn general rules from s…
  • Invited Talk - Foundation Models for Embodied AI
    Title: Large Behavior Models for Dexterous Manipulation
    2:00 - 2:30 PM CDT
    Location: Room 101D
    Rareș Ambruș
    TRI

    Bio: Dr. Rareș Ambruș is a senior manager in the Large Behavior Models division at Toyota Research Institute (TRI). His research interests lie at the intersection of robotics, computer vision and machine learning with the aim of discovering visual representations for embodied applications in areas such as automated driving and robotics. Dr. Ambruș received his Ph.D. in 2017 from the Royal Institute of Technology (KTH), Sweden, focusing on self-supervised perception and mapping for mobile robots. He has more than 100 publications and patents at top AI venues covering fundamental topics in computer vision, machine learning and robotics.

    Abstract: Dexterous manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development…
  • Invited Talk - Generative AI for Embodied AI
    Title: World Models at Scale for Embodied Driving
    2:30 - 3:00 PM CDT
    Location: Room 101D
    Nikhil Mohan
    Wayve

    Bio: Nikhil Mohan is a Lead Scientist at Wayve, where he focuses on leveraging data-driven techniques for simulation in autonomous driving. His work spans Neural Radiance Fields (NeRFs), Gaussian Splatting, and generative models, emphasizing their application to improve and evaluate Wayve’s AI Driver performance. Before turning his attention to simulation, Nikhil led Wayve’s production driving team, where they shipped research prototypes into the production system. Prior to joining Wayve, he earned his Master’s degree at Carnegie Mellon University, concentrating in machine learning and signal processing.

    Abstract: Nikhil's talk will focus on using world models to produce data at scale for embodied AI in the context of self-driving.
  • Invited Talk - Generative AI for Embodied AI
    Title: Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models
    3:00 - 3:30 PM CDT
    Huan Ling
    NVIDIA

    Bio: Huan Ling is a Senior Research Scientist at NVIDIA’s Spatial Intelligence (Toronto AI) Lab. His research focuses on developing foundational generative models that enable realistic and controllable environments, spanning video synthesis and 3D/4D scene generation and reconstruction. His work aims to build scalable systems that support real-world applications. Huan’s research has been featured at top conferences such as CVPR and NeurIPS, and he actively collaborates across disciplines to advance the frontier of generative AI for real-world applications. He has contributed to the development and large-scale training of video foundation model products, including NVIDIA-COSMOS and COSMOS-Drive-Dreams, which enable high-fidelity, controllable video generation for physical-AI scenarios such as autonomous driving.

    Abstract: Collecting and annotating real-world data for safety-critical physical AI systems, such as Autonomous Vehicles (AVs), is time-consuming and costly. It is especially challenging to capture rare edge cases, which play a critical role in trainin…
  • Accepted Paper Highlights
    3:30 - 4:00 PM CDT
    • #2: On the use of Pre-trained Visual Representations in Visuo-Motor Robot Learning
    • #6: R-EQA: Retrieval-Augmented Generation for Embodied Question Answering
    • #7: Uncertainty Modeling in Autonomous Vehicle Trajectory Prediction: A Comprehensive Survey
    • #15: Benchmarking Arbitrary Natural Language Tasks in 3D Open Worlds
    • #19: What matters in ImageNav: architecture, pre-training, sim settings, pose
    • #23: MotIF: Motion Instruction Fine-tuning
    Moderator - David Hall
    CSIRO
  • Invited Speaker Panel
    4:00 - 4:30 PM CDT
    Moderator - Anthony Francis
    Logical Robotics
  • Debate on the Future of Embodied AI
    4:30 - 5:00 PM CDT
    Moderator - Anthony Francis
    Logical Robotics
  • Workshop Concludes
    5:00 PM CDT


Sponsor Events



Challenges

The Embodied AI 2025 workshop is hosting many exciting challenges covering a wide range of topics. More details regarding data, submission instructions, and timelines can be found on the individual challenge websites.

The workshop organizers will award each first-prize challenge winner a cash prize, sponsored by Logical Robotics and our other sponsors.

Challenge winners may be given the opportunity to present during their challenge's presentation at the workshop. Since many challenges can be grouped into similar tasks, we encourage participants to submit models to more than one challenge. The table below describes, compares, and links to each challenge.

| Challenge | Task | Simulation Platform | Scene Dataset | Observations | Action Space |
| --- | --- | --- | --- | --- | --- |
| ARNOLD | Language-Grounded Manipulation | Isaac Sim | Arnold Dataset | RGB-D, Proprioception | Continuous |
| HAZARD | Multi-Object Rescue | ThreeDWorld | HAZARD dataset | RGB-D, Temperature Sensors, Water Level | Discrete |
| ManiSkill-ViTac | Vision-Tactile Fusion Manipulation | SAPIEN | Customized Scenarios | RGB-D, Proprioception, Tactile Signals | Continuous |
| Social Mobile Manipulation (SMM) | Social Mobile Manipulation | Infinite World (based on Isaac Sim) | SMM Dataset | RGB-D | Continuous |
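
To ground the Observations and Action Space columns, the stub below shows how such an interface typically looks in Gymnasium-style code: a dict observation with RGB-D and proprioception, and a continuous action box. The class name, shapes, and bounds are hypothetical; see each challenge's website for its actual API.

```python
# Hypothetical Gymnasium-style stub illustrating the table's "Observations"
# and "Action Space" columns; shapes and bounds are assumptions, not any
# challenge's real interface.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ToyManipulationEnv(gym.Env):
    def __init__(self):
        self.observation_space = spaces.Dict({
            "rgb": spaces.Box(0, 255, shape=(224, 224, 3), dtype=np.uint8),
            "depth": spaces.Box(0.0, 10.0, shape=(224, 224, 1), dtype=np.float32),
            "proprioception": spaces.Box(-np.inf, np.inf, shape=(7,), dtype=np.float32),
        })
        # Continuous control (ARNOLD, ManiSkill-ViTac, SMM); a discrete task
        # like HAZARD would use spaces.Discrete(n) instead.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(7,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        obs = self.observation_space.sample()  # placeholder dynamics
        return obs, 0.0, False, False, {}      # obs, reward, terminated, truncated, info

env = ToyManipulationEnv()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```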


Call for Papers

We invite high-quality 2-page extended abstracts on embodied AI, especially in areas relevant to the themes of this year's workshop:

  • Embodied AI Solutions
  • Advances in Simulation
  • Generative Methods for Embodied AI
  • Foundation Models
as well as themes related to embodied AI in general:
  • Visual Navigation
  • Embodied Mobile Manipulation
  • Embodied Question Answering
  • Embodied Vision & Language
  • Language Model Planning
Accepted papers will be presented as posters or spotlight talks at the workshop. These papers will be made publicly available in a non-archival format, allowing future submission to archival journals or conferences. Paper submissions do not have to be anonymized. Per CVPR rules regarding workshop papers, at least one author must register for CVPR using an in-person registration.

The submission deadline CLOSED on May 23rd (Anywhere on Earth; for clarity, 2025/05/24 00:01 GMT as computed by OpenReview). Papers should be no longer than 2 pages (excluding references) and styled in the CVPR format; a minimal LaTeX skeleton is sketched below.
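
For reference, a CVPR-styled 2-page abstract might start from the skeleton below. It assumes the cvpr.sty and bibliography style shipped with the official CVPR author kit, whose exact names can change from year to year.

```latex
% Minimal 2-page extended abstract skeleton. Assumes cvpr.sty from the
% official CVPR author kit is on the TeX path; option and bibliography-style
% names follow recent kits and should be checked against the current one.
\documentclass[10pt,twocolumn,letterpaper]{article}
\usepackage[final]{cvpr}  % workshop submissions need not be anonymized
\usepackage{graphicx}

\title{Your Extended Abstract Title}
\author{First Author \and Second Author}

\begin{document}
\maketitle

\begin{abstract}
One-paragraph summary of the contribution.
\end{abstract}

\section{Introduction}
Keep the body within two pages; references are excluded from the limit.

{\small
\bibliographystyle{ieeenat_fullname}
\bibliography{main}
}
\end{document}
```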


What matters in ImageNav: architecture, pre-training, sim settings, pose
Gianluca Monaci, Philippe Weinzaepfel, Christian Wolf
State-of-the-art image goal navigation methods either rely on dedicated image-matching or pre-training of vision modules on relative pose estimation or image reconstruction.
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
Gunjan Chhablani, Xiaomeng Ye, Rynaa Grover, Muhammad Zubair Irshad, Zsolt Kira
Sim-to-real transfer and personalization remain core challenges in Embodied AI due to a trade-off between synthetic environments lacking realism and costly real-world captures.
View-Imagination: Enhancing Visuomotor Control with Adaptive View Synthesis
Dohyeok Lee, Munkyung Kim, Jung Min Lee, Seungyub Han, Jungwoo Lee
In robotic manipulation tasks, visuomotor control suffers from limited spatial understanding problems with limited camera installation and visual imperfections, such as occlusion.
Object Retrieval-Guided Vision Language Modeling for Embodied Interaction
Constantin Patsch, Yuankai Wu, Marsil Zakour, Eckehard Steinbach
Vision-language model (VLM)-based agents often struggle to name specific or unseen objects in hand-object interactions.
Benchmarking Arbitrary Natural Language Tasks in 3D Open Worlds
Sonny George, Chris Sypherd, Rocco Ahching, Dylan Cashman
3D-embodied autonomy toward arbitrary task outcomes is a long-standing goal in AI and Robotics.
LLM-Enhanced Rapid-Reflex Async-Reflect Framework for Real-Time Decision Making in Dynamically Changing Environments
Yangqing Zheng, Shunqi Mao, Dingxin Zhang, Weidong Cai
In the realm of embodied intelligence, the evolution of large language models (LLMs) has markedly enhanced agent decision making.
Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets
Jeremiah Coholich, Justin Wit, Zsolt Kira
Vision-based policies for robot manipulation have achieved significant recent success, but are still brittle to distribution shifts such as camera viewpoint variations.
Data Augmentation in Diffusion Inversion Space
Junfeng Wei, Rongsen Luo, Ziming Cheng, An Mo, Chao Ji
Visual imitation learning methods have demonstrated strong performance and potential, but their generalization ability to unseen environments remains limited.
BePo: Efficient Dual Representation for 3D Scene Understanding
Yunxiao Shi, Hong Cai, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Amin Ansari, Fatih Porikli
3D scene understanding fundamentally underlies autonomous systems that power a variety of important applications such as Autonomous Driving, Robotics, and AR/VR.
Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI
Aditya Chakravarty
Diffusion models have shown remarkable flexibility for solving inverse problems without task-specific retraining.
H^3 DP: Triply‑Hierarchical Diffusion Policy for Visuomotor Learning
Yiyang Lu, Yufeng Tian, Zhecheng Yuan, Xianbang Wang, Pu Hua, Zhengrong Xue, Huazhe Xu
We introduce Triply-Hierarchical Diffusion Policy (H^3DP), a novel visuomotor learning framework that explicitly incorporates hierarchical structures to strengthen the integration between visual features and action generation.
R-EQA: Retrieval-Augmented Generation for Embodied Question Answering
Hyobin Ong, Minsu Jang
Embodied Question Answering (EQA) is a task where an agent explores its environment, gathers visual information and responds to natural language questions based on that information.
Dynamics-Aligned Flow Matching Policy for Robot Learning
Dohyeok Lee, Jung Min Lee, Munkyung Kim, Seokhun Ju, Seungyub Han, Jin Woo Koo, Jungwoo Lee
Behavior cloning methods for robot learning suffer from poor generalization due to limited data support beyond expert demonstrations.
Uncertainty Modeling in Autonomous Vehicle Trajectory Prediction: A Comprehensive Survey
Siddharth Raina, Jeshwanth Challagundla, Mantek Singh
Agent Behavior prediction is a critical component in autonomous driving systems, requiring the modeling of inherent uncertainties in an agent's future motion.
MotIF: Motion Instruction Fine-tuning
Minyoung Hwang, Joey Hejna, Dorsa Sadigh, Yonatan Bisk
While success in many robotics tasks can be determined by only observing the final state and how it differs from the initial state -- e.g., if an apple is picked up -- many tasks require observing the full motion of the robot to correctly determine success.
On the use of Pre-trained Visual Representations in Visuo-Motor Robot Learning
Nikolaos Tsagkas, Andreas Sochopoulos, Duolikun Danier, Sethu Vijayakumar, Chris Xiaoxuan Lu, Oisin Mac Aodha
The use of pre-trained visual representations (PVRs) in visuo-motor robot learning offers an alternative to training encoders from scratch but we discover that it faces challenges such as temporal entanglement and poor generalisation to minor scene changes.
ThinkSafe++: A Semantic Risk Score Framework for Safety-Aware Long-Horizon Planning
Yejin Jo, Minsu Jang
ThinkSafe++ is a safety framework for long-horizon task planning in embodied agents.
EED: Embodied Environment Description through Robotic Visual Exploration
Kohei Matsumoto, Asako Kanezaki
The optimal way to convey information about a real environment to humans is through natural language descriptions.
Real-Time Multimodal Processing for Interpreting Embodied Actions
Hannah VanderHoeven, Videep Venkatesha, Abhijnan Nath, Nikhil Krishnaswamy
In this paper, we demonstrate how real-time integration of language with embodied gesture and action in a collaborative task enables the generation of AI agent interventions that result in "positive friction", or reflection, deliberation, and more mindful collaboration.


Sponsors

The Embodied AI 2025 Workshop is sponsored by the following organizations:

Logical Robotics, Microsoft, NVIDIA, Wayve


Organizers

The Embodied AI 2025 workshop is a joint effort by a large set of researchers from a variety of organizations. Each year, a set of lead organizers takes point coordinating with the CVPR conference, backed up by a large team of workshop organizers, challenge organizers, and scientific advisors.
Anthony Francis
Logical Robotics
Claudia Pérez D’Arpino
NVIDIA
Luca Weihs
Vercept
Angel X. Chang
SFU
Cem Gokmen
Stanford
Changan Chen
Stanford
Chengshu Li
Stanford
Chris Paxton
Meta AI
David Hall
CSIRO
Devon Hjelm
Apple
German Ros
NVIDIA
Jiaolong Yang
Microsoft
Joanne Truong
GaTech
Lamberto Ballan
U Padova
Lars Johannsmeier
NVIDIA
Mike Roberts
Adobe
Minyoung Hwang
MIT
Naoki Yokoyama
GaTech
Oleksandr Maksymets
Meta AI
Rachith Prakash
Intel
Ram Ramrakhya
GaTech
Ran Gong
UCLA
Vivan Amin
Microsoft
Yonatan Bisk
CMU
Angel X. Chang
SFU
Baoxiong Jia
BIGAI
Changan Chen
Stanford
Chris Paxton
Meta AI
Chuang Gan
IBM, MIT
Dhruv Batra
Yutori
Jiangyong Huang
Peking U
Luca Weihs
Vercept
Manolis Savva
SFU
Naoki Yokoyama
GaTech
Oleksandr Maksymets
Meta AI
Ram Ramrakhya
GaTech
Richard He Bai
Apple
Siyuan Huang
BIGAI
Xiaofeng Gao
Amazon
Yonatan Bisk
CMU
Ade Famoti
Microsoft
Alexander Toshev
Apple
Andrey Kolobov
Microsoft
Angel X. Chang
SFU
Dhruv Batra
Yutori
Joanne Truong
GaTech
Jose A. Iglesias-Guitian
UDC-CITIC
Jose M. Alvarez
NVIDIA
Manolis Savva
SFU
Roberto Martín-Martín
Stanford
Sören Pirk
Kiel University