TY - GEN
T1 - Speaker-independent style conversion for HMM-based expressive speech synthesis
AU - Kanagawa, Hiroki
AU - Nose, Takashi
AU - Kobayashi, Takao
PY - 2013/10/18
Y1 - 2013/10/18
N2 - This paper proposes a technique for creating a target speaker's expressive-style model from the target speaker's neutral-style speech in HMM-based speech synthesis. The technique is based on style adaptation using linear transforms, where speaker-independent transformation matrices are estimated in advance using pairs of neutral- and target-style speech data from multiple speakers. By applying the obtained transformation matrices to a new speaker's neutral-style model, we can convert the style expressivity of the acoustic model to the target style without preparing any target-style speech of the speaker. In addition, we introduce a speaker adaptive training (SAT) framework into the transform estimation to reduce the acoustic difference among speakers. We subjectively evaluate the performance of the style conversion in terms of naturalness, speaker similarity, and style reproducibility.
AB - This paper proposes a technique for creating a target speaker's expressive-style model from the target speaker's neutral-style speech in HMM-based speech synthesis. The technique is based on style adaptation using linear transforms, where speaker-independent transformation matrices are estimated in advance using pairs of neutral- and target-style speech data from multiple speakers. By applying the obtained transformation matrices to a new speaker's neutral-style model, we can convert the style expressivity of the acoustic model to the target style without preparing any target-style speech of the speaker. In addition, we introduce a speaker adaptive training (SAT) framework into the transform estimation to reduce the acoustic difference among speakers. We subjectively evaluate the performance of the style conversion in terms of naturalness, speaker similarity, and style reproducibility.
KW - HMM-based expressive speech synthesis
KW - linear transform
KW - speaker adaptive training
KW - style adaptation
KW - style conversion
UR - http://www.scopus.com/inward/record.url?scp=84890473200&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84890473200&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2013.6639195
DO - 10.1109/ICASSP.2013.6639195
M3 - Conference contribution
AN - SCOPUS:84890473200
SN - 9781479903566
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7864
EP - 7868
BT - 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
T2 - 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Y2 - 26 May 2013 through 31 May 2013
ER -