Usages of an external duration model for HMM-based speech synthesis

Javier Latorre, Sabine Buchholz, Masami Akamine

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

4 Citations (Scopus)

Abstract

In this paper we analyze three different approaches to improving the quality of an HMM-based speech synthesizer by means of an external duration model. The first approach uses the external duration model in the standard way, to define the phone durations during synthesis. The second is a novel approach that uses the phone duration to create additional context features for the decision-tree clustering. The third is a combination of the previous two approaches. A subjective evaluation showed a quality improvement with respect to the baseline for all three approaches, although for differing reasons. The standard approach improves the duration estimation. The second approach degrades the duration estimation but improves the logF0 and aperiodicity by better modeling their dependence on duration. Finally, the combined approach benefits from the improvements of the other two and yields the best result, a preference ca. 16% higher than the baseline among native English speakers.

Original language: English
Title of host publication: 5th International Conference on Speech Prosody 2010
Publisher: International Speech Communications Association
ISBN (Electronic): 9780000000002
Publication status: Published - 2010
Event: 5th International Conference on Speech Prosody: Every Language, Every Style, SP 2010 - Chicago, United States
Duration: 2010 May 10 to 2010 May 14

Publication series

Name: Proceedings of the International Conference on Speech Prosody
ISSN (Print): 2333-2042

Conference

Conference: 5th International Conference on Speech Prosody: Every Language, Every Style, SP 2010
Country/Territory: United States
City: Chicago
Period: 10/5/10 to 10/5/14

Keywords

  • Duration
  • External duration model
  • HMM-based
  • Prosody
  • Speech synthesis
