Robust estimation of multiple-regression HMM parameters for dimension-based expressive dialogue speech synthesis

Tomohiro Nagata, Hiroki Mori, Takashi Nose

Research output: Contribution to journal › Conference article › peer-review

2 Citations (Scopus)

Abstract

This paper describes spontaneous dialogue speech synthesis based on multiple-regression hidden semi-Markov model (MRHSMM), which enables users to specify paralinguistic information of synthesized speech with a dimensional representation. Paralinguistic aspects of synthesized speech are controlled by multiple regression models whose explanatory variables are abstract dimensions such as pleasant-unpleasant and aroused-sleepy. For robust estimation of the regression matrices of the MRHSMM with unbalanced spontaneous dialogue speech samples, the re-estimation formulae were derived in the framework of the maximum a posteriori (MAP) estimation. The result of a perceptual experiment confirmed that the naturalness of synthesized speech was improved by applying the MAP estimation for regression matrices. In addition, a high correlation (R ≃ 0.7) was observed between given and perceived paralinguistic information, which implies that the proposed method could successfully reflect intended paralinguistic messages on the synthesized speech.
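
The idea of a mean vector driven by a regression matrix, and of a prior that stabilizes that matrix when samples are unbalanced, can be illustrated with a small sketch. This is not the paper's re-estimation formulae: it assumes a single Gaussian state with no HSMM state alignment or duration model, a Gaussian prior on the regression matrix whose strength is a hypothetical scalar `tau`, and hypothetical function names (`map_regression_matrix`, `predict_mean`). Under those assumptions the MAP estimate takes a ridge-like closed form, and `tau = 0` reduces to the ordinary least-squares (maximum likelihood) estimate.

```python
import numpy as np


def map_regression_matrix(obs, styles, prior_mean, tau):
    """Ridge-like MAP estimate of a regression matrix H, where mean = H @ [1, style].

    obs:        (T, D) acoustic feature vectors (toy stand-in for state observations)
    styles:     (T, L) dimensional values, e.g. pleasantness and arousal
    prior_mean: (D, L + 1) prior mean of H (assumed Gaussian prior)
    tau:        prior strength, tau >= 0 (hypothetical hyperparameter)
    """
    # Augment each style vector with a leading 1 for the bias column of H.
    xi = np.hstack([np.ones((styles.shape[0], 1)), styles])
    # Data statistics pulled toward the prior in proportion to tau.
    a = obs.T @ xi + tau * prior_mean
    b = xi.T @ xi + tau * np.eye(xi.shape[1])
    # H = a @ inv(b); b is symmetric, so solve the transposed system instead.
    return np.linalg.solve(b, a.T).T


def predict_mean(h, style):
    """Mean vector for a requested point in the dimensional space."""
    return h @ np.concatenate([[1.0], np.asarray(style, dtype=float)])
```

With few or skewed samples, `xi.T @ xi` can be poorly conditioned; the `tau` term keeps the system solvable and shrinks the estimate toward `prior_mean`, which is the general motivation for a MAP estimate in this setting.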

Original language: English
Pages (from-to): 1549-1553
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2013 Jan 1
Externally published: Yes
Event: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France
Duration: 2013 Aug 25 → 2013 Aug 29

Keywords

  • HMM-based speech synthesis
  • MAP estimation
  • MRHSMM
  • Paralinguistic information
  • Spontaneous speech
  • UU database

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
