Unit selection speech synthesis using multiple speech units at non-adjacent segments for prosody and waveform generation

Masatsune Tamura, Norbert Braunschweiler, Takehiko Kagoshima, Masami Akamine

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Citations (Scopus)

Abstract

In this paper, we propose a speech synthesis method that combines speech synthesis based on natural waveform concatenation with our baseline plural unit selection and fusion method. The two main features of the proposed method are (i) prosody regeneration from the selected speech units and (ii) the use of multiple speech units at non-adjacent segments. A non-adjacent segment is a segment whose preceding or following speech unit in the optimum speech unit sequence is not adjacent to it in the database. Using the prosody of the selected speech units retains the original prosodic expression and sound of the recorded speech, while using multiple speech units at non-adjacent segments reduces discontinuities. Mean opinion score (MOS) evaluations showed that the proposed method provides a clear improvement over both the conventional unit selection method and our baseline method.
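The definition of a non-adjacent segment in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Unit` record, its `utterance_id` and `position` fields, and the helper names are hypothetical, and the sketch assumes that two units are adjacent in the database when they come from the same recorded utterance at consecutive positions. It also assumes that the first and last segments of a sequence have no missing neighbour to penalise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical database entry for one selected speech unit."""
    utterance_id: int  # recorded utterance the unit was cut from (assumed field)
    position: int      # index of the unit within that utterance (assumed field)


def is_adjacent(left: Unit, right: Unit) -> bool:
    """Assumed adjacency test: same utterance, consecutive positions."""
    return (left.utterance_id == right.utterance_id
            and right.position == left.position + 1)


def non_adjacent_segments(sequence: list[Unit]) -> list[int]:
    """Return indices of segments whose previous or following unit in the
    selected sequence is not adjacent to them in the database."""
    flagged = []
    last = len(sequence) - 1
    for i, unit in enumerate(sequence):
        # Sequence ends are treated as having a continuous neighbour (assumption).
        prev_ok = i == 0 or is_adjacent(sequence[i - 1], unit)
        next_ok = i == last or is_adjacent(unit, sequence[i + 1])
        if not (prev_ok and next_ok):
            flagged.append(i)
    return flagged


# Three units from utterance 3 followed by two units from utterance 7.
example = [Unit(3, 10), Unit(3, 11), Unit(3, 12), Unit(7, 4), Unit(7, 5)]
flagged = non_adjacent_segments(example)
print(flagged)
```

In this example the join falls between the third and fourth segments, so both sides of the join are flagged. Under the abstract's description, these are the segments where multiple speech units would be used, while the remaining segments keep the natural recorded waveform.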

Original language: English
Title of host publication: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4802-4805
Number of pages: 4
ISBN (Print): 9781424442966
Publication status: Published - 2010
Event: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Dallas, TX, United States
Duration: 2010 Mar 14 to 2010 Mar 19

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Country/Territory: United States
City: Dallas, TX
Period: 10/3/14 to 10/3/19

Keywords

  • Concatenative speech synthesis
  • Prosody generation
  • Unit fusion
  • Unit selection
