A rapid model adaptation technique for emotional speech recognition with style estimation based on multiple-regression HMM

Yusuke Ijima, Takashi Nose, Makoto Tachibana, Takao Kobayashi

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

In this paper, we propose a rapid model adaptation technique for emotional speech recognition that enables us to extract paralinguistic information as well as the linguistic information contained in speech signals. The technique is based on style estimation and style adaptation using a multiple-regression HMM (MRHMM). In the MRHMM, the mean parameters of the output probability density functions are controlled by a low-dimensional parameter vector, called a style vector, which corresponds to the set of explanatory variables of the multiple regression. The recognition process consists of two stages. In the first stage, the style vector, which represents the emotional expression category and the intensity of its expressiveness for the input speech, is estimated on a sentence-by-sentence basis. In the second stage, the acoustic models are adapted using the estimated style vector, and standard HMM-based speech recognition is then performed. We assess the performance of the proposed technique in the recognition of simulated emotional speech uttered by both professional narrators and non-professional speakers.
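The two-stage idea described above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a single Gaussian state with identity covariance, a 2-dimensional style vector, and a known regression matrix split into a bias vector and a slope matrix, so that the state mean is `bias + slope * style`. The abstract does not spell out the estimation criterion, so ordinary least squares is used here purely for illustration. All names (`adapted_mean`, `estimate_style`, `bias`, `slope`) are hypothetical.

```python
# Toy sketch only: one Gaussian state, identity covariance, 2-D style vector.
# bias (length D) and slope (D x 2) stand in for a trained regression matrix.

def adapted_mean(bias, slope, style):
    """Mean as a linear function of the style vector: mu = bias + slope @ style."""
    return [b + sum(a * s for a, s in zip(row, style))
            for b, row in zip(bias, slope)]

def estimate_style(frames, bias, slope):
    """Stage 1 (illustrative): least-squares estimate of a 2-D style vector.

    Minimises sum_t ||o_t - bias - slope @ v||^2, which reduces to the normal
    equations (slope^T slope) v = slope^T (mean_frame - bias).
    """
    dim, n = len(bias), len(frames)
    resid = [sum(f[d] for f in frames) / n - bias[d] for d in range(dim)]
    # Entries of the symmetric 2x2 matrix G = slope^T slope.
    g00 = sum(row[0] * row[0] for row in slope)
    g01 = sum(row[0] * row[1] for row in slope)
    g11 = sum(row[1] * row[1] for row in slope)
    # Right-hand side r = slope^T resid.
    r0 = sum(row[0] * e for row, e in zip(slope, resid))
    r1 = sum(row[1] * e for row, e in zip(slope, resid))
    det = g00 * g11 - g01 * g01
    if det == 0:
        raise ValueError("slope columns are linearly dependent")
    # Closed-form inverse of the 2x2 system.
    return [(g11 * r0 - g01 * r1) / det, (g00 * r1 - g01 * r0) / det]

# Made-up example values (not from the paper).
bias = [0.0, 1.0, -1.0]
slope = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
true_style = [0.5, -0.25]
true_mean = adapted_mean(bias, slope, true_style)

# Two synthetic frames placed symmetrically around the true mean.
frames = [[m + 0.1 for m in true_mean], [m - 0.1 for m in true_mean]]

# Stage 1: estimate the style vector for this "utterance".
style_hat = estimate_style(frames, bias, slope)
# Stage 2: adapt the mean; a standard HMM decoder would then use it.
mean_hat = adapted_mean(bias, slope, style_hat)
```

In a full system the estimate would be pooled over all states and mixture components of the aligned HMMs, weighted by their covariances and occupancies. The sketch keeps only the linear dependence of the mean on the style vector.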

Original language: English
Pages (from-to): 107-115
Number of pages: 9
Journal: IEICE Transactions on Information and Systems
Volume: E93-D
Issue number: 1
Publication status: Published - 2010
Externally published: Yes

Keywords

  • Emotional speech
  • Multiple-regression HMM
  • Speaker adaptation
  • Speaking style
  • Style adaptation
  • Style estimation

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence
