Look Wide and Interpret Twice: Improving Performance on Interactive Instruction-following Tasks

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Citations (Scopus)

Abstract

There is a growing interest in the community in making an embodied AI agent perform a complicated task while interacting with an environment following natural language directives. Recent studies have tackled the problem using ALFRED, a well-designed dataset for the task, but achieved only very low accuracy. This paper proposes a new method, which outperforms the previous methods by a large margin. It is based on a combination of several new ideas. One is a two-stage interpretation of the provided instructions. The method first selects and interprets an instruction without using visual information, yielding a tentative action sequence prediction. It then integrates the prediction with the visual information etc., yielding the final prediction of an action and an object. As the object's class to interact is identified in the first stage, it can accurately select the correct object from the input image. Moreover, our method considers multiple egocentric views of the environment and extracts essential information by applying hierarchical attention conditioned on the current instruction. This contributes to the accurate prediction of actions for navigation. A preliminary version of the method won the ALFRED Challenge 2020. The current version achieves the unseen environment's success rate of 4.45% with a single view, which is further improved to 8.37% with multiple views.

Original languageEnglish
Title of host publicationProceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
EditorsZhi-Hua Zhou
PublisherInternational Joint Conferences on Artificial Intelligence
Pages923-930
Number of pages8
ISBN (Electronic)9780999241196
Publication statusPublished - 2021
Event30th International Joint Conference on Artificial Intelligence, IJCAI 2021 - Virtual, Online, Canada
Duration: 2021 Aug 192021 Aug 27

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)1045-0823

Conference

Conference30th International Joint Conference on Artificial Intelligence, IJCAI 2021
Country/TerritoryCanada
CityVirtual, Online
Period21/8/1921/8/27

ASJC Scopus subject areas

  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Look Wide and Interpret Twice: Improving Performance on Interactive Instruction-following Tasks'. Together they form a unique fingerprint.

Cite this