Embodied AI Workshop
CVPR 2022


Overview

Within the last decade, advances in deep learning, coupled with the creation of large, freely available datasets (e.g., ImageNet), have resulted in remarkable progress in the computer vision, NLP, and broader AI communities. This progress has enabled models to begin to obtain superhuman performance on a wide variety of passive tasks. However, this progress has also enabled a paradigm shift that a growing number of researchers are pursuing: the creation of an embodied agent (e.g., a robot) that learns, through interaction and exploration, to creatively solve challenging tasks within its environment.

The goal of this workshop is to bring together researchers from the fields of computer vision, language, graphics, and robotics to share and discuss the current state of intelligent agents that can:

  • See: perceive their environment through vision or other senses.
  • Talk: hold a natural language dialog grounded in their environment.
  • Listen: understand and react to audio input anywhere in a scene.
  • Act: navigate and interact with their environment to accomplish goals.
  • Reason: consider and plan for the long-term consequences of their actions.

The Embodied AI 2022 workshop will be held in conjunction with CVPR 2022. It will feature a host of invited talks covering a variety of topics in Embodied AI, many exciting challenges, a poster session, and panel discussions.



Timeline

CVPR Workshop
Room 224, New Orleans Ernest N. Morial Convention Center
June 19, 2022
9:00 AM - 5:30 PM CT
Challenge Submission Deadlines
May 2022. Check each challenge for the specific date.
Paper Submission Deadline
May 16, 2022 (Anywhere on Earth)
Workshop Announced
Feb 14, 2022


Challenges

The Embodied AI 2022 workshop is hosting many exciting challenges covering a wide range of topics such as rearrangement, visual navigation, vision-and-language, and audio-visual navigation. More details regarding data, submission instructions, and timelines can be found on the individual challenge websites.

Challenge winners will be given the opportunity to present a talk at the workshop. Since many challenges involve similar tasks, we encourage participants to submit models to more than one challenge. The table below describes, compares, and links to each challenge.

Challenge | Task | Interactive Actions? | Simulation Platform | Scene Dataset | Observations | Stochastic Actuation? | Action Space
AI2-THOR Rearrangement | Rearrangement |  | AI2-THOR | iTHOR | RGB-D, Localization |  | Discrete
ALFRED | Vision-and-Language Interaction |  | AI2-THOR | iTHOR | RGB |  | Discrete
Habitat | ObjectNav |  | Habitat | Matterport3D | RGB-D, Localization |  | Discrete
iGibson | Interactive Navigation |  | iGibson | iGibson | RGB-D |  | Continuous
iGibson | Social Navigation |  | iGibson | iGibson | RGB-D |  | Continuous
MultiON | Multi-Object Navigation |  | Habitat | Matterport3D | RGB-D, Localization |  | Discrete
Robotic Vision Scene Understanding | Rearrangement (SCD) |  | Isaac Sim | Active Scene Understanding | RGB-D, Pose Data, Flatscan Laser |  | Discrete
Robotic Vision Scene Understanding | Semantic SLAM |  | Isaac Sim | Active Scene Understanding | RGB-D, Pose Data, Flatscan Laser | Partially | Discrete
RxR-Habitat | Vision-and-Language Navigation |  | Habitat | Matterport3D | RGB-D |  | Discrete
SoundSpaces | Audio Visual Navigation |  | Habitat | Matterport3D | RGB-D, Audio Waveform |  | Discrete
TDW-Transport | Rearrangement |  | TDW | TDW | RGB-D, Metadata |  | Discrete
TEACh | Vision-and-Dialogue Interaction |  | AI2-THOR | iTHOR | RGB |  | Discrete, Text Generation
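
Although each challenge has its own submission instructions, most share a similar agent contract: at every step the agent receives a dictionary of observations (e.g., RGB, depth, localization, or audio, per the table above) and returns an action from the challenge's discrete or continuous action space. The following sketch illustrates that contract with a toy random-action baseline; the class name, action strings, and observation keys are hypothetical placeholders rather than any challenge's official API.

    # Minimal sketch of the shared observation-in / action-out agent loop.
    # All names below are illustrative; consult each challenge website for
    # its actual interface and evaluation harness.
    from dataclasses import dataclass
    from typing import Dict, Union

    import numpy as np

    # Hypothetical discrete action set for a navigation-style challenge.
    DISCRETE_ACTIONS = ["MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT", "STOP"]


    @dataclass
    class RandomNavAgent:
        """Toy baseline that ignores its observations and acts randomly."""

        seed: int = 0

        def __post_init__(self) -> None:
            self.rng = np.random.default_rng(self.seed)

        def reset(self) -> None:
            """Called at the start of each episode; nothing to reset here."""

        def act(self, observations: Dict[str, np.ndarray]) -> Union[str, np.ndarray]:
            # `observations` would typically hold keys such as "rgb", "depth",
            # or "gps", depending on the challenge (see the table above).
            return str(self.rng.choice(DISCRETE_ACTIONS))


    if __name__ == "__main__":
        agent = RandomNavAgent(seed=42)
        agent.reset()
        fake_obs = {
            "rgb": np.zeros((480, 640, 3), dtype=np.uint8),
            "depth": np.zeros((480, 640, 1), dtype=np.float32),
        }
        print(agent.act(fake_obs))  # e.g., "TURN_LEFT"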


Call for Papers

We invite high-quality 2-page extended abstracts in relevant areas, such as:

  • Simulation Environments
  • Visual Navigation
  • Rearrangement
  • Embodied Question Answering
  • Simulation-to-Real Transfer
  • Embodied Vision & Language
Accepted papers will be presented as posters or spotlight talks at the workshop. These papers will be made publicly available in a non-archival format, allowing future submission to archival journals or conferences. Paper submissions do not have to be anonymized. Per CVPR rules regarding workshop papers, at least one author must register for CVPR using an in-person registration.

The submission deadline is May 16th (Anywhere on Earth). Papers should be no longer than 2 pages (excluding references) and styled in the CVPR format. Paper submissions are now closed.


FedVLN: Privacy-preserving Federated Vision-and-Language Navigation
Kaiwen Zhou, Xin Eric Wang
Data privacy is a central problem for embodied agents that can perceive the environment, communicate with humans, and act in the real world.
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh N Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason M Baldridge, Peter Anderson
We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes.
Simple and Effective Synthesis of Indoor 3D Scenes
Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason M Baldridge, Peter Anderson
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Housekeep: Tidying Virtual Households using Commonsense Reasoning
Yash Mukund, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot, Harsh Agrawal
We introduce Housekeep, a benchmark to evaluate commonsense reasoning in the home for embodied AI.
BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents
Ziang Liu, Roberto Martin-Martin, Fei Xia, Jiajun Wu, Li Fei-Fei
Robots excel in performing repetitive and precision-sensitive tasks in controlled environments such as warehouses and factories, but have not been yet extended to embodied AI agents providing assistance in household tasks.
Learning Value Functions from Undirected State-only Experience
Matthew Chang, Arjun Gupta, Saurabh Gupta
This paper tackles the problem of learning value functions from undirected state-only experience (state transitions without action labels, i.e. ...
Towards Generalisable Audio Representations for Audio-Visual Navigation
Shunqi Mao, Chaoyi Zhang, Heng Wang, Weidong Cai
In audio-visual navigation (AVN), an intelligent agent needs to navigate to a constantly sound-making object in complex 3D environments based on its audio and visual perceptions.
ET tu, CLIP? Addressing Common Object Errors for Unseen Environments
Jimin Sun, Ye Won Byun, Shahriar Noroozizadeh, Rosanna M Vitiello, Cathy L Jiao
We introduce a simple method that employs pre-trained CLIP encoders to enhance model generalization in the ALFRED task.
Learning to navigate in interactive Environments with the transformer-based memory
Weiyuan Li, Ruoxin Hong, Jiwei Shen, Yue Lu
Substantial progress has been achieved in embodied visual navigation based on reinforcement learning (RL).
Modality-invariant Visual Odometry for Indoor Navigation
Marius Memmel, Amir Zamir
Successful indoor navigation is a crucial skill for many robots.
Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation
Nico Messikommer, Daniel Gehrig, Mathias Gehrig, Davide Scaramuzza
Event cameras are novel sensors with outstanding properties such as high temporal resolution and high dynamic range.
A Planning based Neural-Symbolic Approach for Embodied Instruction Following
Xiaotian Liu, Hector Palacios, Christian Muise
The ALFRED environment features an embodied agent following instructions and accomplishing tasks in simulated home environments.
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Ram Ramrakhya, Eric Undersander, Dhruv Batra, Abhishek Das
We present a large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments – (1) ObjectGoal Navigation (e.g. ...
IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents
Artem Zholus, Alexey Skrynnik, Shrestha Mohanty, Zoya Volovikova, Julia Kiseleva, Arthur Szlam, Marc-Alexandre Côté, Aleksandr Panov
We present the IGLU Gridworld: a reinforcement learning environment for building and evaluating language conditioned embodied agents in a scalable way.
VLMbench: A Benchmark for Vision-and-Language Manipulation
Kaizhi Zheng, Xiaotong Chen, Odest Chadwicke Jenkins, Xin Eric Wang
One crucial ability of embodied agents is to finish tasks by following language instructions.
Language Guided Meta-Control for Embodied Instruction Following
Divyam Goel, Kunal Pratap Singh, Jonghyun Choi
Embodied Instruction Following (EIF) is a challenging problem requiring an agent to infer a sequence of actions to achieve a goal environment state from complex language and visual inputs.
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
Ankit Goyal, Arsalan Mousavian, Chris Paxton, Yu-Wei Chao, Brian Okorn, Jia Deng, Dieter Fox
Accurate object rearrangement from vision is a crucial problem for a wide variety of real-world robotics applications in unstructured environments.
Benchmarking Augmentation Methods for Learning Robust Navigation Agents: The Winning Entry of the 2021 iGibson Challenge
Naoki Yokoyama, Qian Luo, Dhruv Batra, Sehoon Ha
While impressive progress has been made for teaching embodied agents to navigate static environments using vision, much less progress has been made on more dynamic environments that may include moving pedestrians or movable obstacles.
ABCDE: An Agent-Based Cognitive Development Environment
Jieyi Ye, Jiafei Duan, Samson Yu, Bihan Wen, Cheston Tan
Children’s cognitive abilities are sometimes cited as AI benchmarks.
SAMPLE-HD: Simultaneous Action and Motion Planning Learning Environment
Michal Nazarczuk, Tony Ng, Krystian Mikolajczyk
Humans exhibit incredibly high levels of multi-modal understanding - combining visual cues with read, or heard knowledge comes easy to us and allows for very accurate interaction with the surrounding environment.
Role of reward shaping in object-goal navigation
Srirangan Madhavan, Anwesan Pal, Henrik Christensen
Deep reinforcement learning approaches have been a popular method for visual navigation tasks in the computer vision and robotics community of late.
Human Instruction Following: Graph Neural Network Guided Object Navigation
Hongyi Chen, Letian Wang, Yuhang Yao, Ye Zhao, Patricio Vela
Home-assistant robots (e.g., mobile manipulator) following human instruction is a long-standing topic of research whose main challenge comes from the interpretation of diverse instructions and dynamically-changing environments.


Organizers

The Embodied AI 2022 workshop is a joint effort by a large set of researchers from a variety of organizations. They are listed below in alphabetical order.
Andrew Szot
GaTech
Anthony Francis
Google
Chengshu Li
Stanford
Claudia Pérez D’Arpino
NVIDIA
Devendra Singh Chaplot
Meta AI
German Ros
Intel
Joanne Truong
GaTech
Luca Weihs
AI2
Matt Deitke
AI2, UW
Mike Roberts
Intel
Oleksandr Maksymets
Meta AI
Sören Pirk
Google
Aaron Gokaslan
Cornell
Alex Ku
Google
Alexander Clegg
Meta AI
Alexander Toshev
Google
Angel X. Chang
SFU
Aniruddha Kembhavi
AI2, UW
Anthony Francis
Google
Anthony Liang
USC
Ben Talbot
QUT
Changan Chen
UT Austin
Chengshu Li
Stanford
Chuang Gan
IBM, MIT
Claudia Pérez D’Arpino
NVIDIA
David Hall
QUT
Devendra Singh Chaplot
Meta AI
Dhruv Batra
GaTech, Meta AI
Eric Kolve
AI2
Eric Undersander
Meta AI
Erik Wijmans
GaTech
Fei Xia
Google
Ishika Singh
USC
Jacob Krantz
Oregon State
Jesse Thomason
USC
Josh Tenenbaum
MIT
Karmesh Yadav
Meta AI
Kristen Grauman
UT Austin
Luca Weihs
AI2
Manolis Savva
SFU
Matt Deitke
AI2, UW
Mohit Shridhar
UW
Niko Sünderhauf
QUT
Oleksandr Maksymets
Meta AI
Peter Anderson
Google
Ram Ramrakhya
GaTech
Rishabh Jain
eBay
Roberto Martín-Martín
Stanford
Roozbeh Mottaghi
AI2, UW
Sagnik Majumder
UT Austin
Santhosh Kumar Ramakrishnan
UT Austin
Sonia Raychaudhuri
SFU
Stefan Lee
Oregon State
Tiffany Min
CMU
Tommaso Campari
SFU, UNIPD
Unnat Jain
UIUC
Yonatan Bisk
CMU
Alexander Toshev
Google
Ali Farhadi
Apple, UW
Aniruddha Kembhavi
AI2, UW
Antonio M. Lopez
UAB-CVC
Devi Parikh
GaTech, Meta AI
Dhruv Batra
GaTech, Meta AI
Fei-Fei Li
Stanford
Jie Tan
Google
Jose A. Iglesias-Guitian
UDC-CITIC
Jose M. Alvarez
NVIDIA
Manolis Savva
SFU
Roberto Martín-Martín
Stanford
Roozbeh Mottaghi
AI2, UW
Silvio Savarese
Salesforce, Stanford
Sonia Chernova
GaTech