Abstract
This paper proposes a sentence selection method using a maximum entropy criterion to construct recording scripts for speech synthesis. In conventional corpus design for speech synthesis, a greedy algorithm that maximizes phonetic coverage is often used. However, for statistical parametric speech synthesis, phonetic and prosodic contextual balance is as important as coverage. To account for both phonetic and prosodic contextual balance in sentence selection, we introduce and maximize the entropy of phonetic and prosodic contexts such as biphone, triphone, accent, and sentence length. The objective experimental results show that the proposed method achieves better coverage and balance of contexts and reduces spectral and F0 distortions compared to the random and coverage-based sentence selection methods.
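As a rough illustration of the idea in the abstract, the sketch below greedily adds, at each step, the candidate sentence that gives the selected set the highest entropy over its context counts. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names are hypothetical, character bigrams stand in for biphones, and only one context type is used, whereas the abstract combines several (biphone, triphone, accent, sentence length).

```python
import math
from collections import Counter


def contexts(sentence):
    """Hypothetical context extractor: character bigrams stand in for biphones."""
    s = sentence.replace(" ", "")
    return [s[i:i + 2] for i in range(len(s) - 1)]


def entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_sentences(candidates, n):
    """Greedily pick up to n sentences, each time adding the one whose
    contexts give the selected set the highest entropy (assumed criterion)."""
    selected, counts = [], Counter()
    pool = list(candidates)
    for _ in range(min(n, len(pool))):
        best = max(pool, key=lambda s: entropy(counts + Counter(contexts(s))))
        selected.append(best)
        counts += Counter(contexts(best))
        pool.remove(best)
    return selected
```

A coverage-based selector would instead score each candidate by how many unseen contexts it adds. Scoring by entropy also rewards evening out the frequencies of contexts that are already covered, which is the balance property the abstract emphasizes.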
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publisher | International Speech and Communication Association |
Pages | 3491-3495 |
Number of pages | 5 |
Volume | 2015-January |
Publication status | Published - 2015 |
Event | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany Duration: 6 Sep 2015 → 10 Sep 2015 |
Other
Other | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 |
---|---|
Country/Territory | Germany |
City | Dresden |
Period | 6 Sep 2015 → 10 Sep 2015 |
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation