TY - GEN
T1 - HMM-based speech synthesis with unsupervised labeling of accentual context based on F0 quantization and average voice model
AU - Nose, Takashi
AU - Ooki, Koujirou
AU - Kobayashi, Takao
PY - 2010
Y1 - 2010
AB - This paper proposes an HMM-based speech synthesis technique without any manual labeling of accent information for a target speaker's training data. To appropriately model the fundamental frequency (F0) feature of speech, the proposed technique uses coarsely quantized F0 symbols instead of accent types for the context-dependent labeling. By using F0 quantization, we can automatically conduct the labeling of F0 contexts for training data. When synthesizing speech, an average voice model trained in advance using manually labeled multiple speakers' speech data including accent information is used to create the label sequence for synthesis. Specifically, the input text is converted to a full context label sequence, and an F0 contour is generated from the label sequence and the average voice model. Then, a label sequence including the quantized F0 symbols is created from the generated F0 contour. We conduct objective and subjective evaluation tests, and discuss the results.
KW - Average voice model
KW - F0 modeling
KW - F0 quantization
KW - HMM-based speech synthesis
KW - Unsupervised training
UR - http://www.scopus.com/inward/record.url?scp=78049364097&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78049364097&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2010.5495548
DO - 10.1109/ICASSP.2010.5495548
M3 - Conference contribution
AN - SCOPUS:78049364097
SN - 9781424442966
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4622
EP - 4625
BT - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Y2 - 14 March 2010 through 19 March 2010
ER -