Multimodal Dialogue Response Timing Estimation Using Dialogue Context Encoder

Ryota Yahagi, Yuya Chiba, Takashi Nose, Akinori Ito

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Spoken dialogue systems need to determine when to respond to a user in addition to what to respond. Various cues, such as prosody, gaze, and facial expression, are known to affect response timing. Recent studies have revealed that using the representation of the system response improves the performance of response timing prediction. However, it is difficult to use a future response directly in dialogue systems that require the entire user utterance to generate a response. This study proposes a neural-based response timing estimation model that uses past utterances to alleviate this problem. The proposed model is expected to consider the intention of the system response implicitly.
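The abstract does not specify the model's architecture, but the core idea of scoring response timing from encoded past utterances can be illustrated with a toy sketch. Everything below (the averaging encoder, the logistic scorer, all names and dimensions) is an illustrative assumption, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_context(utterance_embeddings):
    """Toy dialogue context encoder: average the embeddings of past
    utterances into one fixed-size context vector (a stand-in for a
    learned neural encoder)."""
    return np.mean(utterance_embeddings, axis=0)

def timing_score(context_vec, frame_features, w_ctx, w_frame, bias):
    """Score the probability that the system should respond now, given
    the encoded dialogue context and the current multimodal frame
    features (e.g. prosody); logistic output in (0, 1)."""
    z = context_vec @ w_ctx + frame_features @ w_frame + bias
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative dimensions and random, untrained weights.
dim_ctx, dim_frame = 8, 4
past_utterances = rng.normal(size=(3, dim_ctx))  # 3 past utterance embeddings
frame = rng.normal(size=dim_frame)               # current multimodal frame
w_ctx = rng.normal(size=dim_ctx)
w_frame = rng.normal(size=dim_frame)

ctx = encode_context(past_utterances)
p_respond = timing_score(ctx, frame, w_ctx, w_frame, bias=0.0)
print(round(float(p_respond), 3))
```

In a trained system the context vector would come from a learned encoder over past user and system utterances, letting the timing decision reflect the likely intention of the upcoming system response without generating it first.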

Original language: English
Title of host publication: Conversational AI for Natural Human-Centric Interaction - 12th International Workshop on Spoken Dialogue System Technology, IWSDS 2021
Editors: Svetlana Stoyanchev, Stefan Ultes, Haizhou Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 133-141
Number of pages: 9
ISBN (Print): 9789811955372
Publication status: Published - 2022
Event: 12th International Workshop on Spoken Dialogue System Technology, IWSDS 2021 - Virtual, Online
Duration: 2021 Nov 15 - 2021 Nov 17

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 943
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 12th International Workshop on Spoken Dialogue System Technology, IWSDS 2021
City: Virtual, Online
Period: 21/11/15 - 21/11/17

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering
