TY - JOUR
T1 - An intuitive style control technique in HMM-based expressive speech synthesis using subjective style intensity and multiple-regression global variance model
AU - Nose, Takashi
AU - Kobayashi, Takao
N1 - Funding Information:
Part of this work was supported by JSPS Grant-in-Aid for Scientific Research 23700195 and 24300071.
PY - 2013/2
Y1 - 2013/2
AB - To control intuitively the intensities of emotional expressions and speaking styles for synthetic speech, we introduce subjective style intensities and multiple-regression global variance (MRGV) models into hidden Markov model (HMM)-based expressive speech synthesis. A problem in the conventional parametric style modeling and style control techniques is that the intensities of styles appearing in synthetic speech strongly depend on the training data. To alleviate this problem, the proposed technique explicitly takes into account subjective style intensities perceived for respective training utterances using multiple-regression hidden semi-Markov models (MRHSMMs). As a result, synthetic speech becomes less sensitive to the variation of style expressivity existing in the training data. Another problem is that the synthetic speech generally suffers from the over-smoothing effect of model parameters in the model training, so the variance of the generated speech parameter trajectory becomes smaller than that of the natural speech. To alleviate this problem for the case of style control, we extend the conventional variance compensation method based on a GV model for a single-style speech to the case of multiple styles with variable style intensities by deriving the MRGV modeling. The objective and subjective experimental results show that these two techniques significantly enhance the intuitive style control of synthetic speech, which is essential for the speech synthesis system to communicate para-linguistic information correctly to the listeners.
KW - HMM-based expressive speech synthesis
KW - Multiple-regression global variance model
KW - Multiple-regression HSMM
KW - Style control
KW - Style intensity
UR - http://www.scopus.com/inward/record.url?scp=84870246600&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870246600&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2012.09.003
DO - 10.1016/j.specom.2012.09.003
M3 - Article
AN - SCOPUS:84870246600
SN - 0167-6393
VL - 55
SP - 347
EP - 357
JO - Speech Communication
JF - Speech Communication
IS - 2
ER -