HMM-based expressive speech synthesis based on phrase-level F0 context labeling

Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

3 Citations (Scopus)

Abstract

This paper proposes a technique for adding more prosodic variation to synthetic speech in HMM-based expressive speech synthesis. We create novel phrase-level F0 context labels from the residual F0 information between the original and synthetic speech of the training data. Specifically, we classify the difference in average log F0 between the original and synthetic speech into three perceptually meaningful classes of relative pitch at the phrase level: high, neutral, and low. We evaluate both an ideal and a practical case using appealing and fairy-tale speech recorded under realistic conditions. In the ideal case, we examine the potential of the technique to modify F0 patterns when the original F0 contours of the test sentences are known. In the practical case, we show how users can intuitively modify the pitch by changing the initial F0 context labels obtained from the input text.
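The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the convention that unvoiced frames carry a non-positive F0 value, and the symmetric threshold of 0.1 in the log domain are all assumptions made for illustration, since the abstract does not state how the class boundaries are chosen.

```python
import math


def mean_log_f0(f0_hz):
    """Average log F0 over voiced frames.

    Assumption: unvoiced frames are marked with a value <= 0 and are skipped.
    Returns None when the phrase has no voiced frames.
    """
    voiced = [math.log(f) for f in f0_hz if f > 0]
    if not voiced:
        return None
    return sum(voiced) / len(voiced)


def phrase_f0_label(original_f0, synthetic_f0, threshold=0.1):
    """Label one phrase as 'high', 'neutral', or 'low'.

    The label is based on the difference of average log F0 between the
    original and the synthetic speech. The threshold is a hypothetical
    value, not one taken from the paper.
    """
    orig = mean_log_f0(original_f0)
    synth = mean_log_f0(synthetic_f0)
    if orig is None or synth is None:
        # No voiced frames to compare: fall back to the neutral class.
        return "neutral"
    diff = orig - synth
    if diff > threshold:
        return "high"
    if diff < -threshold:
        return "low"
    return "neutral"
```

Calling `phrase_f0_label` once per phrase of the training data would yield one label per phrase, which could then be attached to that phrase's context labels. In the practical case described above, a user would change such a label by hand instead of deriving it from original speech.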

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 7859-7863
Number of pages: 5
Publication status: Published - 18 Oct 2013
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 26 May 2013 to 31 May 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 26 May 2013 to 31 May 2013
