Minds live in bodies, and bodies move through a changing world. The goal of embodied artificial intelligence is to create agents, such as robots, that learn to creatively solve challenging tasks requiring interaction with the environment. While this is a tall order, fantastic advances in deep learning and the increasing availability of large datasets like ImageNet have enabled superhuman performance on a variety of AI tasks previously thought intractable. Computer vision, speech recognition, and natural language processing have undergone transformative revolutions on passive input-output tasks like language translation and image processing, and reinforcement learning has similarly achieved world-class performance on interactive tasks like games. These advances have supercharged embodied AI, enabling a growing community of researchers to make rapid progress toward intelligent agents that can:
The goal of the Embodied AI workshop is to bring together researchers from computer vision, language, graphics, and robotics to share and discuss the latest advances in embodied intelligent agents. This year's workshop will focus on the three themes of:
Saurabh Gupta is an Assistant Professor in the ECE Department at UIUC. Before starting at UIUC in 2019, he received his Ph.D. from UC Berkeley in 2018 and spent the following year as a Research Scientist at Facebook AI Research in Pittsburgh. His research interests span computer vision, robotics, and machine learning, with a focus on building agents that can intelligently interact with the physical world around them. He received the President's Gold Medal at IIT Delhi in 2011, the Google Fellowship in Computer Vision in 2015, an Amazon Research Award in 2020, and an NSF CAREER Award in 2022. He has also won many challenges at leading computer vision conferences.
Fei Xia is a Research Scientist on the Robotics team at Google Research. He received his PhD from the Department of Electrical Engineering at Stanford University, where he was co-advised by Silvio Savarese in SVL and Leonidas Guibas. His mission is to build intelligent embodied agents that can interact with complex and unstructured real-world environments, with applications to home robotics. He approaches this problem from three aspects: (1) large-scale and transferable simulation for robotics, (2) learning algorithms for long-horizon tasks, and (3) combining geometric and semantic representations of environments. Most recently, he has been exploring the use of foundation models for robot decision making.
Russ Salakhutdinov is a UPMC Professor of Computer Science in the Department of Machine Learning at CMU. He received his PhD in computer science from the University of Toronto. After spending two post-doctoral years at MIT, he joined the University of Toronto and later moved to CMU. Russ's primary interests lie in deep learning, machine learning, and large-scale optimization. He is an action editor of the Journal of Machine Learning Research, has served as director of AI research at Apple, has served on the senior program committees of several top-tier learning conferences including NeurIPS and ICML, was a program co-chair for ICML 2019, and will serve as a general chair for ICML 2024. He is an Alfred P. Sloan Research Fellow, a Microsoft Research Faculty Fellow, and a recipient of the Early Researcher Award, a Google Faculty Award, and Nvidia's Pioneers of AI Award.
Dieter Fox received his PhD degree from the University of Bonn, Germany. He is a professor in the Allen School of Computer Science & Engineering at the University of Washington, where he heads the UW Robotics and State Estimation Lab. He is also Senior Director of Robotics Research at NVIDIA. His research is in robotics and artificial intelligence, with a focus on learning and estimation applied to problems such as robot manipulation, planning, language grounding, and activity recognition. He has published more than 300 technical papers and is co-author of the textbook "Probabilistic Robotics". Dieter is a Fellow of the IEEE, ACM, and AAAI, and recipient of the IEEE RAS Pioneer Award and the IJCAI John McCarthy Award.
Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Director at Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on video, visual recognition, and action for perception or embodied AI. Before joining UT-Austin in 2007, she received her Ph.D. from MIT. She is an IEEE Fellow, AAAI Fellow, Sloan Fellow, and Microsoft Research New Faculty Fellow, and a recipient of the NSF CAREER and ONR Young Investigator awards, the PAMI Young Researcher Award in 2013, the 2013 Computers and Thought Award from the International Joint Conference on Artificial Intelligence (IJCAI), and the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2013. She was inducted into the UT Academy of Distinguished Teachers in 2017. She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test of time award). She served for six years as an Associate Editor-in-Chief for the Transactions on Pattern Analysis and Machine Intelligence (PAMI) and for ten years as an Editorial Board member for the International Journal of Computer Vision (IJCV). She also served as a Program Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015 and a Program Chair of Neural Information Processing Systems (NeurIPS) 2018, and will serve as a Program Chair of the IEEE International Conference on Computer Vision (ICCV) 2023.
In association with the Embodied AI Workshop, Meta AI will present a demo of LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place, in which a Boston Dynamics Spot robot will follow voice commands for object rearrangement such as "Find the plush on the table and place it in the case." The demo times for LSC include:
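To give a flavor of what "language-guided skill coordination" means in practice, here is a minimal, hypothetical sketch of turning a pick-and-place command into a sequence of robot skill calls. Everything in it (the `plan_skills` function, the skill vocabulary, and the naive string parser) is an illustrative assumption for exposition, not Meta AI's actual LSC implementation, which coordinates learned skills on a real robot.

```python
# Hypothetical sketch: mapping a language command to a skill sequence.
# Skill names and the toy parser are assumptions, not the LSC API.
from dataclasses import dataclass


@dataclass
class SkillCall:
    skill: str   # e.g., "navigate", "pick", "place"
    target: str  # open-vocabulary object or receptacle name


def plan_skills(command: str) -> list[SkillCall]:
    """Toy planner for commands shaped like
    'Find the X on the Y and place it in the Z.'
    A real system would use a learned model, not string matching."""
    words = command.rstrip(".").lower().split()
    obj = words[words.index("the") + 1]  # word after the first "the"
    receptacle = words[-1]               # trailing receptacle name
    return [
        SkillCall("navigate", obj),
        SkillCall("pick", obj),
        SkillCall("navigate", receptacle),
        SkillCall("place", receptacle),
    ]


if __name__ == "__main__":
    cmd = "Find the plush on the table and place it in the case."
    for call in plan_skills(cmd):
        print(call)
```

The point of the sketch is the decomposition: a single open-vocabulary command expands into a fixed repertoire of reusable skills (navigate, pick, place), each parameterized by a target named in the command.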
The Embodied AI 2023 workshop is hosting many exciting challenges covering a wide range of topics such as rearrangement, visual navigation, vision-and-language, and audio-visual navigation. More details regarding data, submission instructions, and timelines can be found on the individual challenge websites.
The workshop organizers are awarding each first-place challenge winner $300, sponsored by Apple, Hello Robot, and Logical Robotics.
Challenge winners will be given the opportunity to present a talk at the workshop. Since many of the challenges address similar tasks, we encourage participants to submit models to more than one challenge. The table below describes, compares, and links each challenge.
We invite high-quality 2-page extended abstracts on embodied AI, especially in areas relevant to the themes of this year's workshop:
The submission deadline is May 26th (Anywhere on Earth). Papers should be no longer than 2 pages (excluding references) and styled in the CVPR format.
Note: The order of the papers is randomized each time the page is refreshed.