Embodied AI Workshop
CVPR 2021

Overview

Within the last decade, advances in deep learning, coupled with the creation of large, freely available datasets (e.g., ImageNet), have resulted in remarkable progress in the computer vision, NLP, and broader AI communities. This progress has enabled models to begin to obtain superhuman performance on a wide variety of passive tasks. It has also motivated a paradigm shift that a growing collection of researchers are pursuing: the creation of embodied agents (e.g., robots) that learn, through interaction and exploration, to creatively solve challenging tasks within their environments.

The goal of this workshop is to bring together researchers from the fields of computer vision, language, graphics, and robotics to share and discuss the current state of intelligent agents that can do the following (sketched as a toy interface after the list):

  • See: perceive their environment through vision or other senses.
  • Talk: hold a natural language dialog grounded in their environment.
  • Listen: understand and react to audio input anywhere in a scene.
  • Act: navigate and interact with their environment to accomplish goals.
  • Reason: consider and plan for the long-term consequences of their actions.
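
To make these capabilities concrete, here is a minimal sketch of how an agent might expose them programmatically. It is purely illustrative: the class and method names are hypothetical and do not correspond to any specific simulator or challenge API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class EmbodiedAgent(ABC):
    """Illustrative only: a hypothetical interface, not any challenge's API."""

    @abstractmethod
    def see(self, observation: Dict[str, Any]) -> None:
        """Update internal state from visual or other sensory input."""

    @abstractmethod
    def talk(self, utterance: str) -> str:
        """Hold a natural-language dialog grounded in the environment."""

    @abstractmethod
    def listen(self, waveform: List[float]) -> None:
        """React to audio input from anywhere in the scene."""

    @abstractmethod
    def act(self) -> str:
        """Choose the next navigation or interaction action."""

    @abstractmethod
    def reason(self) -> List[str]:
        """Plan for the long-term consequences of future actions."""
```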

The Embodied AI 2021 workshop will be held virtually in conjunction with CVPR 2021. It will feature a host of invited talks covering a variety of topics in Embodied AI, many exciting challenges, a poster session, and panel discussions.

Timeline

  • CVPR Workshop: June 20, 2021 (tentative schedule).
  • Challenge Submission Deadlines: May 2021; check each challenge for the specific date.
  • Paper Submission Deadline: May 14, 2021 (Anywhere on Earth).
  • Workshop Announced: February 17, 2021.

Panel Sessions

Questions for each panel can be asked on Slack, anonymously if desired.

Panel 1: Invited Speakers

Date. June 20th, 11 AM PT.

Panel. The panel consists of invited speakers at this workshop.

Moderator. Erik Wijmans.

Topics. The topics are driven by audience questions and will likely include cognitive development in humans, progress in embodied AI tasks, sim-to-real transfer, robotics, embodied AI for all, and more.

Panel 2: Navigation Challenge Organizers

Date. June 20th, 3 PM PT.

Panel. The panel consists of organizers of the navigation challenges.

Moderator. Luca Weihs.

Topics. The topics are driven by audience questions and will likely include navigation benchmarks and tasks, the "reality" gap, robotics, simulation platforms, and more.

Panel 3: Interaction Challenge Organizers

Date. June 20th, 5 PM PT.

Panel. The panel consists of organizers of the interaction challenges.

Moderator. Chengshu (Eric) Li.

Topics. The topics are driven by audience questions and will likely include interaction benchmarks and tasks, vision-and-language, rearrangement, leveraging audio, the "reality" gap, robotics, simulation platforms, and more.

Invited Speakers

Hyowon Gweon (Stanford)
Peter Anderson (Google)
Aleksandra Faust (Google)
Anca Dragan (UC Berkeley)
Chelsea Finn (Stanford, Google)
Akshara Rai (Facebook AI Research)
Sanja Fidler (University of Toronto, NVIDIA)
Konstantinos Bousmalis (DeepMind)

Challenges

The Embodied AI 2021 workshop is hosting many exciting challenges covering a wide range of topics, such as rearrangement, visual navigation, vision-and-language, and audio-visual navigation. More details regarding data, submission instructions, and timelines can be found on the individual challenge websites.

Challenge winners will be given the opportunity to present a talk at the workshop. Since many challenges pose similar tasks, we encourage participants to submit models to more than one challenge.
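
Most of these challenges share the same agent-environment loop: the agent receives observations (for example, RGB-D frames), issues actions from a discrete or continuous action space, and is scored on task success. Below is a minimal sketch of that loop, assuming a Gym-style environment; `run_episode`, `random_policy`, and the action names are hypothetical placeholders rather than any challenge's actual API.

```python
import random

# Hypothetical discrete action set; each challenge defines its own action space.
ACTIONS = ["move_ahead", "turn_left", "turn_right", "stop"]

def random_policy(observation):
    """Placeholder policy: a real submission maps observations to actions."""
    return random.choice(ACTIONS)

def run_episode(env, policy, max_steps=500):
    """Run one episode in a Gym-style environment (the env API is an assumption)."""
    observation = env.reset()              # e.g., {"rgb": ..., "depth": ...}
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        if done:                           # success, failure, or time limit
            break
    return info                            # per-challenge metrics, e.g., SPL

```

The list below summarizes each challenge's task, video spotlights of winning entries, simulation platform, scene dataset, observations, and action space.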

  • AI2-THOR ObjectNav. Video spotlight: 1st place. Platform: AI2-THOR. Scenes: RoboTHOR. Observations: RGB-D. Actions: Discrete.
  • AI2-THOR Rearrangement. Platform: AI2-THOR. Scenes: iTHOR. Observations: RGB-D, Localization. Actions: Discrete.
  • ALFRED (Vision-and-Language Interaction). Platform: AI2-THOR. Scenes: iTHOR. Observations: RGB. Actions: Discrete.
  • Habitat ObjectNav. Video spotlights: 1st and 2nd place. Platform: Habitat. Scenes: Matterport3D. Observations: RGB-D, Localization. Actions: Discrete.
  • Habitat PointNav v2. Video spotlights: 1st and 2nd place. Platform: Habitat. Scenes: Gibson. Observations: Noisy RGB-D. Actions: Discrete.
  • iGibson Interactive Navigation. Video spotlights: 1st, 2nd, and 4th place. Platform: iGibson. Scenes: iGibson. Observations: RGB-D. Actions: Continuous.
  • iGibson Social Navigation. Video spotlights: 1st, 3rd, 4th, and 5th place. Platform: iGibson. Scenes: iGibson. Observations: RGB-D. Actions: Continuous.
  • MultiON (Multi-Object Navigation). Video spotlights: 1st, 2nd, and 3rd place. Platform: Habitat. Scenes: Matterport3D. Observations: RGB-D, Localization. Actions: Discrete.
  • Robotic Vision Scene Understanding (Rearrangement, SCD). Platform: Isaac Sim. Scenes: Active Scene Understanding. Observations: RGB-D, Pose Data, Flatscan Laser. Actions: Discrete.
  • Robotic Vision Scene Understanding (Semantic SLAM). Platform: Isaac Sim. Scenes: Active Scene Understanding. Observations: RGB-D, Pose Data, Flatscan Laser. Stochastic actuation: Partially. Actions: Discrete.
  • RxR-Habitat (Vision-and-Language Navigation). Platform: Habitat. Scenes: Matterport3D. Observations: RGB-D. Actions: Discrete.
  • SoundSpaces (Audio-Visual Navigation). Platform: Habitat. Scenes: Matterport3D. Observations: RGB-D, Audio Waveform. Actions: Discrete.
  • TDW-Transport (Rearrangement). Platform: TDW. Scenes: TDW. Observations: RGB-D, Metadata. Actions: Discrete.
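
The observation and action-space distinctions in the list above map naturally onto Gym-style space definitions. The snippet below sketches the contrast, assuming the `gym` package is available; the specific dimensions and the Gaussian noise model are illustrative stand-ins, not any challenge's actual specification.

```python
import numpy as np
from gym import spaces

# Discrete actions (AI2-THOR, Habitat, MultiON, ...): a fixed set of primitives.
discrete_actions = spaces.Discrete(4)  # e.g., move_ahead, turn_left, turn_right, stop

# Continuous actions (iGibson challenges): e.g., linear and angular velocity.
continuous_actions = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

def noisy_actuation(intended, rng, sigma=0.05):
    """Stochastic actuation (illustrative Gaussian noise): the executed
    command deviates from the intended one, as in noisy PointNav settings."""
    return intended + rng.normal(0.0, sigma, size=np.shape(intended))

# Example: perturb a commanded velocity.
rng = np.random.default_rng(0)
executed = noisy_actuation(np.array([0.5, 0.0]), rng)
```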

Call for Papers

We invite high-quality 2-page extended abstracts in relevant areas, such as:

  • Simulation Environments
  • Visual Navigation
  • Rearrangement
  • Embodied Question Answering
  • Simulation-to-Real Transfer
  • Embodied Vision & Language

Accepted papers will be presented as posters. These papers will be made publicly available in a non-archival format, allowing future submission to archival journals or conferences.

The submission deadline is May 14th (Anywhere on Earth). Papers should be no longer than 2 pages (excluding references) and styled in the CVPR format. Paper submissions are now closed.

Note. Papers are listed in no particular order.

PixelEDL: Unsupervised Skill Discovery and Learning from Pixels
Roger Creus Castanyer, Juan José Nieto, Xavier Giro-i-Nieto
We tackle embodied visual navigation in a task-agnostic set-up by putting the focus on the unsupervised discovery of skills (or options) that provide a good coverage of states.
LegoTron: An Environment for Interactive Structural Understanding
Aaron T Walsman, Muru Zhang, Adam Fishman, Karthik Desingh, Dieter Fox, Ali Farhadi
Visual reasoning about geometric structures with detailed spatial relationships is a fundamental component of human intelligence.
Agent with the Big Picture: Perceiving Surroundings for Interactive Instruction Following
Byeonghwi Kim, Suvaansh Bhambri, Kunal Pratap Singh, Roozbeh Mottaghi, Jonghyun Choi
Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for an AI agent.
Massively Parallel Robot Simulations with the HBP Neurorobotics Platform
Florian Walter, Mahmoud Akl, Fabrice O. Morin, Alois Knoll
The success of deep learning in robotics hinges on the availability of physically accurate virtual training environments and simulation tools that accelerate learning by scaling to many parallel instances.
Learning to Explore, Navigate and Interact for Visual Room Rearrangement
Ue-Hwan Kim, Youngho Kim, Jin-Man Park, Hwansoo Choi, Jong-Hwan Kim
Intelligent agents for visual room rearrangement aim to reach a goal room configuration from a cluttered room configuration via a sequence of interactions.
PGDrive: Procedural Generation of Driving Environments for Generalization
Quanyi Li, Zhenghao Peng, Qihang Zhang, Chunxiao Liu, Bolei Zhou
To better evaluate and improve the generalization of end-to-end driving, we introduce an open-ended and highly configurable driving simulator called PGDrive, following a key feature of procedural generation.
Pathdreamer: A World Model for Indoor Navigation
Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.
BEyond observation: an approach for ObjectNav
Daniel V Ruiz, Eduardo Todt
With the rise of automation, unmanned vehicles became a hot topic both as commercial products and as a scientific research topic.
HexaJungle: a MARL Simulator to Study the Emergence of Language
Kiran Ikram, Esther Mondragon, Eduardo Alonso, Michaël Garcia Ortiz
Multi-agent reinforcement learning in mixed-motive settings allows for the study of complex dynamics of agent interactions.
PiCoEDL: Discovery and Learning of Minecraft Navigation Goals from Pixels and Coordinates
Juan José Nieto, Roger Creus Castanyer, Xavier Giró-i-Nieto
Defining a reward function in Reinforcement Learning (RL) is not always possible, or can be very costly.
A Neural-Symbolic Approach for Object Navigation
Xiaotian Liu, Christian Muise
Object navigation refers to the task of discovering and locating objects in an unknown environment.
Modular Framework for Visuomotor Language Grounding
Kolby T Nottingham, Litian Liang, Daeyun Shin, Charless Fowlkes, Roy Fox, Sameer Singh
Natural language instruction following tasks serve as a valuable test-bed for grounded language and robotics research.
URoboSim: A Simulation-Based Predictive Modelling Engine for Cognition-Enabled Robot Manipulation
Michael Neumann, Michael Beetz, Andrei Haidu
In a nutshell, robot simulators are fully developed software systems that provide simulations as a substitute for real-world activity.
Success-Aware Visual Navigation Agent
Mahdi Kazemi Moghaddam, Ehsan M Abbasnejad, Qi Wu, Javen Qinfeng Shi, Anton van den Hengel
This work presents a method to improve the efficiency and robustness of previous model-free Reinforcement Learning (RL) algorithms for the task of object-target visual navigation.
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
Prithvijit Chattopadhyay, Judy Hoffman, Roozbeh Mottaghi, Aniruddha Kembhavi
As an attempt towards assessing the robustness of embodied navigation agents, we propose RobustNav, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual corruptions (affecting RGB inputs) and dynamics corruptions (affecting transition dynamics).

Organizers

The Embodied AI 2021 workshop is a joint effort by a large set of researchers from a variety of organizations. They are listed below alphabetically within each group.
Anthony Francis (Google)
Chengshu Li (Stanford)
German Ros (Intel)
Joanne Truong (Georgia Tech)
Luca Weihs (AI2)
Erik Wijmans (Georgia Tech)

Peter Anderson (Google)
Dhruv Batra (FAIR, Georgia Tech)
Yonatan Bisk (CMU)
Suman Bista (QUT, ACRV, QCR)
Angel X. Chang (SFU)
Changan Chen (UT Austin, FAIR)
Claudia D'Arpino (Stanford)
Feras Dayoub (QUT, ACRV, QCR)
Matt Deitke (AI2, UW)
Anthony Francis (Google)
Chuang Gan (MIT-IBM)
Aaron Gokaslan (FAIR)
Kristen Grauman (UT Austin, FAIR)
David Hall (QUT, ACRV, QCR)
Winson Han (AI2)
Rishabh Jain (Georgia Tech)
Unnat Jain (UIUC)
Jaewoo Jang (Stanford)
Aniruddha Kembhavi (AI2, UW)
Apoorv Khandelwal (AI2)
Eric Kolve (AI2)
Jacob Krantz (Oregon State)
Alex Ku (Google)
Stefan Lee (Oregon State)
Chengshu Li (Stanford)
Oleksandr Maksymets (FAIR)
Roberto Martín-Martín (Stanford)
Roozbeh Mottaghi (AI2, UW)
Shivansh Patel (IIT Kanpur, SFU)
Manolis Savva (SFU)
Mohit Shridhar (UW)
Rohan Smith (QUT, ACRV, QCR)
Niko Sünderhauf (QUT, ACRV, QCR)
Ben Talbot (QUT, ACRV, QCR)
Josh Tenenbaum (MIT)
Jesse Thomason (USC)
Alexander Toshev (Google)
Saim Wani (IIT Kanpur)
Luca Weihs (AI2)
Andrew Westbury (FAIR)
Erik Wijmans (Georgia Tech)
Fei Xia (Stanford)
Haoyang Zhang (QUT, ACRV, QCR)

Jose M. Alvarez (NVIDIA)
Dhruv Batra (FAIR, Georgia Tech)
Sonia Chernova (Georgia Tech)
Ali Farhadi (UW)
Jose A. Iglesias-Guitian (UDC-CITIC)
Aniruddha Kembhavi (AI2, UW)
Vladlen Koltun (Intel)
Fei-Fei Li (Stanford)
Antonio M. Lopez (UAB-CVC)
Oleksandr Maksymets (FAIR)
Jitendra Malik (FAIR, UC Berkeley)
Roberto Martín-Martín (Stanford)
Roozbeh Mottaghi (AI2, UW)
Devi Parikh (FAIR, Georgia Tech)
Silvio Savarese (Stanford)
Manolis Savva (SFU)
Alexander Toshev (Google)