Abstract
This paper describes spontaneous dialogue speech synthesis based on a multiple-regression hidden semi-Markov model (MRHSMM), which enables users to specify the paralinguistic information of synthesized speech with a dimensional representation. Paralinguistic aspects of synthesized speech are controlled by multiple regression models whose explanatory variables are abstract dimensions such as pleasant-unpleasant and aroused-sleepy. For robust estimation of the regression matrices of the MRHSMM from unbalanced spontaneous dialogue speech samples, the re-estimation formulae were derived in the framework of maximum a posteriori (MAP) estimation. The result of a perceptual experiment confirmed that the naturalness of synthesized speech was improved by applying MAP estimation to the regression matrices. In addition, a high correlation (R ≃ 0.7) was observed between given and perceived paralinguistic information, which implies that the proposed method could successfully reflect intended paralinguistic messages in the synthesized speech.
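The regression-based control described in the abstract can be illustrated with a minimal sketch. It assumes the common multiple-regression formulation in which a state's mean vector is a regression matrix applied to a bias term followed by the dimensional coordinates (here pleasant-unpleasant and aroused-sleepy). The function name `regress_mean`, the matrix `H_example`, and all coefficient values are hypothetical and chosen only for illustration; the sketch covers neither the HSMM state structure nor the MAP re-estimation formulae of the paper.

```python
from typing import List, Sequence


def regress_mean(H: Sequence[Sequence[float]], style: Sequence[float]) -> List[float]:
    """Return mu = H * [1, style...] for a single state (illustrative only)."""
    # Explanatory vector: a bias term followed by the dimensional coordinates.
    xi = [1.0, *style]
    if any(len(row) != len(xi) for row in H):
        raise ValueError("each row of H needs 1 + len(style) columns")
    # One dot product per acoustic feature dimension.
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


# Hypothetical regression matrix for a 2-dimensional acoustic feature.
# Columns: bias, pleasant-unpleasant, aroused-sleepy (made-up values).
H_example = [
    [5.0, 0.5, 1.0],
    [1.0, -0.25, 0.0],
]

neutral = regress_mean(H_example, [0.0, 0.0])
pleasant_aroused = regress_mean(H_example, [1.0, 1.0])
print(neutral, pleasant_aroused)
```

With the coordinates set to zero, the mean reduces to the bias column; moving along either dimension shifts the mean linearly, which is what lets a user steer the synthesized speech by choosing a point in the dimensional space.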
Original language | English |
---|---|
Pages (from-to) | 1549-1553 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 2013 Jan 1 |
Externally published | Yes |
Event | 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France Duration: 2013 Aug 25 → 2013 Aug 29 |
Keywords
- HMM-based speech synthesis
- MAP estimation
- MRHSMM
- Paralinguistic information
- Spontaneous speech
- UU database
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation