TY - JOUR
T1 - A technique for estimating intensity of emotional expressions and speaking styles in speech based on multiple-regression HSMM
AU - Nose, Takashi
AU - Kobayashi, Takao
PY - 2010
Y1 - 2010
N2 - In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea builds on a style control technique for speech synthesis that uses a multiple-regression hidden semi-Markov model (MRHSMM); the proposed technique can be viewed as the inverse of style control. The acoustic features of spectrum, power, fundamental frequency, and duration are modeled simultaneously using the MRHSMM. Based on a maximum likelihood criterion, we derive an algorithm for estimating the explanatory variables of the MRHSMM, each of which represents the degree or intensity of an emotional expression or speaking style appearing in the acoustic features of speech. We demonstrate the ability of the proposed technique in experiments on two types of speech data: simulated emotional speech and spontaneous speech with different speaking styles. The estimated values are found to correlate with human perception.
KW - Emotion recognition
KW - Emotional expression
KW - Hidden semi-Markov model (HSMM)
KW - Intensity of style
KW - Multiple-regression HSMM (MRHSMM)
KW - Speaking style
UR - http://www.scopus.com/inward/record.url?scp=77950198774&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77950198774&partnerID=8YFLogxK
U2 - 10.1587/transinf.E93.D.116
DO - 10.1587/transinf.E93.D.116
M3 - Article
AN - SCOPUS:77950198774
SN - 0916-8532
VL - E93-D
SP - 116
EP - 124
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 1
ER -