Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts

Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito

Research output: Conference contribution

6 Citations (Scopus)

Abstract

This paper proposes a sentence selection method using a maximum entropy criterion to construct recording scripts for speech synthesis. In the conventional corpus design of speech synthesis, a greedy algorithm that maximizes phonetic coverage is often used. However, for statistical parametric speech synthesis, phonetic and prosodic contextual balance is important as well as the coverage. To take account of both of the phonetic and prosodic contextual balance in the sentence selection, we introduce and maximize the entropy of the phonetic and prosodic contexts, such as biphone, triphone, accent, and sentence length. The objective experimental results show that the proposed method achieves better coverage and balance of contexts and reduces spectral and F0 distortions compared to the random and coverage-based sentence selection methods.
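The idea of choosing sentences so that the entropy of their contexts is maximized can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a greedy loop that adds one sentence at a time, and it uses only biphone counts as the context, whereas the abstract also names triphone, accent, and sentence length. The function names `entropy`, `biphones`, and `select_sentences` are hypothetical and introduced here for illustration only.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the distribution implied by a Counter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def biphones(phones):
    """Adjacent phone pairs of one sentence (an assumed context definition)."""
    return list(zip(phones, phones[1:]))


def select_sentences(candidates, n_select):
    """Greedily pick sentences that maximize biphone entropy of the selection.

    candidates: list of phone sequences (each a list of strings).
    Returns the indices of the selected sentences in selection order.
    Ties are broken in favor of the lowest index.
    """
    feats = [Counter(biphones(p)) for p in candidates]
    remaining = set(range(len(candidates)))
    counts = Counter()  # biphone counts of the sentences selected so far
    selected = []
    for _ in range(min(n_select, len(candidates))):
        best_idx, best_h = None, -1.0
        for i in sorted(remaining):
            h = entropy(counts + feats[i])  # entropy if sentence i is added
            if h > best_h:
                best_idx, best_h = i, h
        selected.append(best_idx)
        counts += feats[best_idx]
        remaining.remove(best_idx)
    return selected


if __name__ == "__main__":
    toy = [
        ["a", "b", "a", "b", "a"],  # repetitive: only two distinct biphones
        ["a", "b", "c", "d"],
        ["c", "d", "e", "f"],
    ]
    print(select_sentences(toy, 2))
```

In this toy run the repetitive first sentence is skipped, because adding it would skew the biphone distribution. A coverage-only criterion would instead count how many new biphones a sentence contributes and ignore how evenly they are distributed.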

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech and Communication Association
Pages: 3491-3495
Number of pages: 5
2015-January
Publication status: Published - 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 6 Sep 2015 to 10 Sep 2015

Other

Other: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Country/Territory: Germany
City: Dresden
Period: 15/9/6 to 15/9/10

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
