Estimating the user's state before exchanging utterances using intermediate acoustic features for spoken dialog systems

Yuya Chiba, Takashi Nose, Masashi Ito, Akinori Ito

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)


A spoken dialog system (SDS) is a type of speech interface and has been built into many devices to help users operate them. An SDS benefits the user because it does not restrict the style of the user's input utterances, but this freedom sometimes makes it difficult for the user to speak to the system. Conventional systems cannot offer appropriate help to a user who makes no explicit input utterance, since they must recognize and parse a user's utterance in order to decide the next prompt. The system should therefore estimate the state of a user who has encountered a problem, so that it can start the dialog and provide appropriate help before the user abandons the interaction. Based on this assumption, we aim to construct a system that responds to a user who does not speak to it. In this research, we defined two basic states of a user who does not speak to the system: the user is embarrassed by the prompt, or the user is thinking about how to answer it. We discriminated between these states using intermediate acoustic features and the facial orientation of the user. Our previous approach relied on several manually determined intermediate acoustic features, so the user's state could not be discriminated automatically. The present paper therefore examines a method for extracting intermediate acoustic features from low-level features such as MFCC, log F0, and zero cross counting (ZCC). We introduce a new annotation rule and compare the discrimination performance with that of the previous feature set. Finally, the user's state was discriminated using a combination of intermediate acoustic features and facial orientation.
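The paper's own extraction pipeline is not reproduced here, but two of the low-level features it names are simple to illustrate. The following is a minimal numpy sketch, under assumed framing parameters (25 ms frames, 10 ms hop at 16 kHz), of computing a per-frame zero-crossing count and log energy; MFCC and log F0 extraction, and the mapping to intermediate features, are omitted.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # Slice a 1-D signal into overlapping frames
    # (400 samples = 25 ms, 160 samples = 10 ms hop at 16 kHz).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def zero_crossing_count(frames):
    # Count sign changes per frame, a crude voicing/noisiness cue.
    signs = np.sign(frames)
    return np.sum(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def log_energy(frames, eps=1e-10):
    # Log of per-frame signal energy; eps guards against log(0) on silence.
    return np.log(np.sum(frames ** 2, axis=1) + eps)

# Toy input: one second of a 100 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 100 * t)

frames = frame_signal(x)
zcc = zero_crossing_count(frames)      # roughly 5 crossings per 25 ms frame
energy = log_energy(frames)
```

Frame-level vectors like these would then be pooled over the silent interval before the user's (absent) utterance and fed to a classifier; the framing constants above are illustrative, not the paper's.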

Original language: English
Pages (from-to): 1-9
Number of pages: 9
Journal: IAENG International Journal of Computer Science
Issue number: 1
Publication status: Published - 2016 Feb 1


Keywords
  • Multi-modal information
  • Spoken dialog system
  • User's state


