Embodied AI Workshop
CVPR 2021

Overview

Within the last decade, advances in deep learning, coupled with the creation of large, freely available datasets (e.g., ImageNet), have resulted in remarkable progress in the computer vision, NLP, and broader AI communities. This progress has enabled models to begin to obtain superhuman performance on a wide variety of passive tasks. It has also motivated a paradigm shift that a growing collection of researchers are pursuing: the creation of embodied agents (e.g., robots) that learn, through interaction and exploration, to creatively solve challenging tasks within their environments.

The goal of this workshop is to bring together researchers from the fields of computer vision, language, graphics, and robotics to share and discuss the current state of intelligent agents that can do the following (sketched as a toy interface after the list):

  • See: perceive their environment through vision or other senses.
  • Talk: hold a natural language dialog grounded in their environment.
  • Listen: understand and react to audio input anywhere in a scene.
  • Act: navigate and interact with their environment to accomplish goals.
  • Reason: consider and plan for the long-term consequences of their actions.
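
To make these capabilities concrete, here is a minimal sketch of how an agent might expose them programmatically. It is purely illustrative: the class and method names are hypothetical and do not correspond to any specific simulator or challenge API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class EmbodiedAgent(ABC):
    """Illustrative only: a hypothetical interface, not any challenge's API."""

    @abstractmethod
    def see(self, observation: Dict[str, Any]) -> None:
        """Update internal state from visual or other sensory input."""

    @abstractmethod
    def talk(self, utterance: str) -> str:
        """Hold a natural-language dialog grounded in the environment."""

    @abstractmethod
    def listen(self, waveform: List[float]) -> None:
        """React to audio input from anywhere in the scene."""

    @abstractmethod
    def act(self) -> str:
        """Choose the next navigation or interaction action."""

    @abstractmethod
    def reason(self) -> List[str]:
        """Plan for the long-term consequences of future actions."""
```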

The Embodied AI 2021 workshop will be held virtually in conjunction with CVPR 2021. It will feature a host of invited talks covering a variety of topics in Embodied AI, many exciting challenges, a poster session, and panel discussions.

Timeline

  • CVPR Workshop: June 20, 2021 (tentative schedule).
  • Challenge Submission Deadlines: May 2021; check each challenge for the specific date.
  • Paper Submission Deadline: May 14, 2021 (Anywhere on Earth).
  • Workshop Announced: February 17, 2021.

Panel Sessions

Questions for each panel can be asked on Slack, anonymously if desired.

Panel 1: Invited Speakers

Date. June 20th, 11 AM PT.

Panel. The panel consists of invited speakers at this workshop.

Moderator. Erik Wijmans.

Topics. The topics are driven by audience questions and will likely include cognitive development in humans, progress in embodied AI tasks, sim-to-real transfer, robotics, embodied AI for all, and more.

Panel 2: Navigation Challenge Organizers

Date. June 20th, 3 PM PT.

Panel. The panel consists of organizers of the navigation challenges.

Moderator. Luca Weihs.

Topics. The topics are driven by audience questions and will likely include navigation benchmarks and tasks, the "reality" gap, robotics, simulation platforms, and more.

Panel 3: Interaction Challenge Organizers

Date. June 20th, 5 PM PT.

Panel. The panel consists of organizers of the interaction challenges.

Moderator. Chengshu (Eric) Li.

Topics. The topics are driven by audience questions and will likely include interaction benchmarks and tasks, vision-and-language, rearrangement, leveraging audio, the "reality" gap, robotics, simulation platforms, and more.

Invited Speakers

Hyowon Gweon (Stanford)
Peter Anderson (Google)
Aleksandra Faust (Google)
Anca Dragan (UC Berkeley)
Chelsea Finn (Stanford, Google)
Akshara Rai (Facebook AI Research)
Sanja Fidler (University of Toronto, NVIDIA)
Konstantinos Bousmalis (DeepMind)

Challenges

The Embodied AI 2021 workshop is hosting many exciting challenges covering a wide range of topics, such as rearrangement, visual navigation, vision-and-language, and audio-visual navigation. More details regarding data, submission instructions, and timelines can be found on the individual challenge websites.

Challenge winners will be given the opportunity to present a talk at the workshop. Since many challenges pose similar tasks, we encourage participants to submit models to more than one challenge.
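
Most of these challenges share the same agent-environment loop: the agent receives observations (for example, RGB-D frames), issues actions from a discrete or continuous action space, and is scored on task success. Below is a minimal sketch of that loop, assuming a Gym-style environment; `run_episode`, `random_policy`, and the action names are hypothetical placeholders rather than any challenge's actual API.

```python
import random

# Hypothetical discrete action set; each challenge defines its own action space.
ACTIONS = ["move_ahead", "turn_left", "turn_right", "stop"]

def random_policy(observation):
    """Placeholder policy: a real submission maps observations to actions."""
    return random.choice(ACTIONS)

def run_episode(env, policy, max_steps=500):
    """Run one episode in a Gym-style environment (the env API is an assumption)."""
    observation = env.reset()              # e.g., {"rgb": ..., "depth": ...}
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        if done:                           # success, failure, or time limit
            break
    return info                            # per-challenge metrics, e.g., SPL

```

The list below summarizes each challenge's task, video spotlights of winning entries, simulation platform, scene dataset, observations, and action space.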

  • AI2-THOR ObjectNav. Video spotlight: 1st place. Platform: AI2-THOR. Scenes: RoboTHOR. Observations: RGB-D. Actions: Discrete.
  • AI2-THOR Rearrangement. Platform: AI2-THOR. Scenes: iTHOR. Observations: RGB-D, Localization. Actions: Discrete.
  • ALFRED (Vision-and-Language Interaction). Platform: AI2-THOR. Scenes: iTHOR. Observations: RGB. Actions: Discrete.
  • Habitat ObjectNav. Video spotlights: 1st and 2nd place. Platform: Habitat. Scenes: Matterport3D. Observations: RGB-D, Localization. Actions: Discrete.
  • Habitat PointNav v2. Video spotlights: 1st and 2nd place. Platform: Habitat. Scenes: Gibson. Observations: Noisy RGB-D. Actions: Discrete.
  • iGibson Interactive Navigation. Video spotlights: 1st, 2nd, and 4th place. Platform: iGibson. Scenes: iGibson. Observations: RGB-D. Actions: Continuous.
  • iGibson Social Navigation. Video spotlights: 1st, 3rd, 4th, and 5th place. Platform: iGibson. Scenes: iGibson. Observations: RGB-D. Actions: Continuous.
  • MultiON (Multi-Object Navigation). Video spotlights: 1st, 2nd, and 3rd place. Platform: Habitat. Scenes: Matterport3D. Observations: RGB-D, Localization. Actions: Discrete.
  • Robotic Vision Scene Understanding (Rearrangement, SCD). Platform: Isaac Sim. Scenes: Active Scene Understanding. Observations: RGB-D, Pose Data, Flatscan Laser. Actions: Discrete.
  • Robotic Vision Scene Understanding (Semantic SLAM). Platform: Isaac Sim. Scenes: Active Scene Understanding. Observations: RGB-D, Pose Data, Flatscan Laser. Stochastic actuation: Partially. Actions: Discrete.
  • RxR-Habitat (Vision-and-Language Navigation). Platform: Habitat. Scenes: Matterport3D. Observations: RGB-D. Actions: Discrete.
  • SoundSpaces (Audio-Visual Navigation). Platform: Habitat. Scenes: Matterport3D. Observations: RGB-D, Audio Waveform. Actions: Discrete.
  • TDW-Transport (Rearrangement). Platform: TDW. Scenes: TDW. Observations: RGB-D, Metadata. Actions: Discrete.
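
The observation and action-space distinctions in the list above map naturally onto Gym-style space definitions. The snippet below sketches the contrast, assuming the `gym` package is available; the specific dimensions and the Gaussian noise model are illustrative stand-ins, not any challenge's actual specification.

```python
import numpy as np
from gym import spaces

# Discrete actions (AI2-THOR, Habitat, MultiON, ...): a fixed set of primitives.
discrete_actions = spaces.Discrete(4)  # e.g., move_ahead, turn_left, turn_right, stop

# Continuous actions (iGibson challenges): e.g., linear and angular velocity.
continuous_actions = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

def noisy_actuation(intended, rng, sigma=0.05):
    """Stochastic actuation (illustrative Gaussian noise): the executed
    command deviates from the intended one, as in noisy PointNav settings."""
    return intended + rng.normal(0.0, sigma, size=np.shape(intended))

# Example: perturb a commanded velocity.
rng = np.random.default_rng(0)
executed = noisy_actuation(np.array([0.5, 0.0]), rng)
```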

Call for Papers

We invite high-quality 2-page extended abstracts in relevant areas, such as:

  • Simulation Environments
  • Visual Navigation
  • Rearrangement
  • Embodied Question Answering
  • Simulation-to-Real Transfer
  • Embodied Vision & Language

Accepted papers will be presented as posters. These papers will be made publicly available in a non-archival format, allowing future submission to archival journals or conferences.

The submission deadline is May 14th (Anywhere on Earth). Papers should be no longer than 2 pages (excluding references) and styled in the CVPR format. Paper submissions are now closed.

Note. Papers are listed in no particular order.

PixelEDL: Unsupervised Skill Discovery and Learning from Pixels
Roger Creus Castanyer, Juan José Nieto, Xavier Giro-i-Nieto
We tackle embodied visual navigation in a task-agnostic set-up by putting the focus on the unsupervised discovery of skills (or options) that provide a good coverage of states.
LegoTron: An Environment for Interactive Structural Understanding
Aaron T Walsman, Muru Zhang, Adam Fishman, Karthik Desingh, Dieter Fox, Ali Farhadi
Visual reasoning about geometric structures with detailed spatial relationships is a fundamental component of human intelligence.
Agent with the Big Picture: Perceiving Surroundings for Interactive Instruction Following
Byeonghwi Kim, Suvaansh Bhambri, Kunal Pratap Singh, Roozbeh Mottaghi, Jonghyun Choi
Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for an AI agent.
Massively Parallel Robot Simulations with the HBP Neurorobotics Platform
Florian Walter, Mahmoud Akl, Fabrice O. Morin, Alois Knoll
The success of deep learning in robotics hinges on the availability of physically accurate virtual training environments and simulation tools that accelerate learning by scaling to many parallel instances.
Learning to Explore, Navigate and Interact for Visual Room Rearrangement
Ue-Hwan Kim, Youngho Kim, Jin-Man Park, Hwansoo Choi, Jong-Hwan Kim
Intelligent agents for visual room rearrangement aim to reach a goal room configuration from a cluttered room configuration via a sequence of interactions.
PGDrive: Procedural Generation of Driving Environments for Generalization
Quanyi Li, Zhenghao Peng, Qihang Zhang, Chunxiao Liu, Bolei Zhou
To better evaluate and improve the generalization of end-to-end driving, we introduce an open-ended and highly configurable driving simulator called PGDrive, following a key feature of procedural generation.
Pathdreamer: A World Model for Indoor Navigation
Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.
BEyond observation: an approach for ObjectNav
Daniel V Ruiz, Eduardo Todt
With the rise of automation, unmanned vehicles became a hot topic both as commercial products and as a scientific research topic.
HexaJungle: a MARL Simulator to Study the Emergence of Language
Kiran Ikram, Esther Mondragon, Eduardo Alonso, Michaël Garcia Ortiz
Multi-agent reinforcement learning in mixed-motive settings allows for the study of complex dynamics of agent interactions.
PiCoEDL: Discovery and Learning of Minecraft Navigation Goals from Pixels and Coordinates
Juan José Nieto, Roger Creus Castanyer, Xavier Giró-i-Nieto
Defining a reward function in Reinforcement Learning (RL) is not always possible, or can be very costly.
A Neural-Symbolic Approach for Object Navigation
Xiaotian Liu, Christian Muise
Object navigation refers to the task of discovering and locating objects in an unknown environment.
Modular Framework for Visuomotor Language Grounding
Kolby T Nottingham, Litian Liang, Daeyun Shin, Charless Fowlkes, Roy Fox, Sameer Singh
Natural language instruction following tasks serve as a valuable test-bed for grounded language and robotics research.
URoboSim: A Simulation-Based Predictive Modelling Engine for Cognition-Enabled Robot Manipulation
Michael Neumann, Michael Beetz, Andrei Haidu
In a nutshell, robot simulators are fully developed software systems that provide simulations as a substitute for real-world activity.
Success-Aware Visual Navigation Agent
Mahdi Kazemi Moghaddam, Ehsan M Abbasnejad, Qi Wu, Javen Qinfeng Shi, Anton van den Hengel
This work presents a method to improve the efficiency and robustness of previous model-free Reinforcement Learning (RL) algorithms for the task of object-target visual navigation.
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
Prithvijit Chattopadhyay, Judy Hoffman, Roozbeh Mottaghi, Aniruddha Kembhavi
As an attempt towards assessing the robustness of embodied navigation agents, we propose RobustNav, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual corruptions (affecting RGB inputs) and dynamics corruptions (affecting transition dynamics).

Organizers

The Embodied AI 2021 workshop is a joint effort by a large set of researchers from a variety of organizations. They are listed below alphabetically within each group.
Anthony Francis (Google)
Chengshu Li (Stanford)
German Ros (Intel)
Joanne Truong (Georgia Tech)
Luca Weihs (AI2)
Erik Wijmans (Georgia Tech)

Peter Anderson (Google)
Dhruv Batra (FAIR, Georgia Tech)
Yonatan Bisk (CMU)
Suman Bista (QUT, ACRV, QCR)
Angel X. Chang (SFU)
Changan Chen (UT Austin, FAIR)
Claudia D'Arpino (Stanford)
Feras Dayoub (QUT, ACRV, QCR)
Matt Deitke (AI2, UW)
Anthony Francis (Google)
Chuang Gan (MIT-IBM)
Aaron Gokaslan (FAIR)
Kristen Grauman (UT Austin, FAIR)
David Hall (QUT, ACRV, QCR)
Winson Han (AI2)
Rishabh Jain (Georgia Tech)
Unnat Jain (UIUC)
Jaewoo Jang (Stanford)
Aniruddha Kembhavi (AI2, UW)
Apoorv Khandelwal (AI2)
Eric Kolve (AI2)
Jacob Krantz (Oregon State)
Alex Ku (Google)
Stefan Lee (Oregon State)
Chengshu Li (Stanford)
Oleksandr Maksymets (FAIR)
Roberto Martín-Martín (Stanford)
Roozbeh Mottaghi (AI2, UW)
Shivansh Patel (IIT Kanpur, SFU)
Manolis Savva (SFU)
Mohit Shridhar (UW)
Rohan Smith (QUT, ACRV, QCR)
Niko Sünderhauf (QUT, ACRV, QCR)
Ben Talbot (QUT, ACRV, QCR)
Josh Tenenbaum (MIT)
Jesse Thomason (USC)
Alexander Toshev (Google)
Saim Wani (IIT Kanpur)
Luca Weihs (AI2)
Andrew Westbury (FAIR)
Erik Wijmans (Georgia Tech)
Fei Xia (Stanford)
Haoyang Zhang (QUT, ACRV, QCR)

Jose M. Alvarez (NVIDIA)
Dhruv Batra (FAIR, Georgia Tech)
Sonia Chernova (Georgia Tech)
Ali Farhadi (UW)
Jose A. Iglesias-Guitian (UDC-CITIC)
Aniruddha Kembhavi (AI2, UW)
Vladlen Koltun (Intel)
Fei-Fei Li (Stanford)
Antonio M. Lopez (UAB-CVC)
Oleksandr Maksymets (FAIR)
Jitendra Malik (FAIR, UC Berkeley)
Roberto Martín-Martín (Stanford)
Roozbeh Mottaghi (AI2, UW)
Devi Parikh (FAIR, Georgia Tech)
Silvio Savarese (Stanford)
Manolis Savva (SFU)
Alexander Toshev (Google)