Tonal context labeling using quantized F0 symbols for improving tone correctness in average-voice-based speech synthesis

Vataya Chunwijitra, Takashi Nose, Takao Kobayashi

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Citations (Scopus)

Abstract

This paper proposes a technique for improving tone correctness in Thai speech synthesis based on an average voice model trained with nonprofessional speech corpus. The proposed technique utilizes quantized F0 symbols as the tonal context in order to obtain an appropriate F0 model. With this technique, the prosodic context can be extracted from real speech directly and this leads to prevent the inconsistency between speech data and F0 labels generated from transcription, which affects the naturalness and tone correctness in synthetic speech. We examine two types of tonal context labeling using the quantized F0 symbols based on phone and sub-phone boundaries. Experimental results of both objective and subjective tests show that the proposed technique can improve not only the naturalness but also the tone correctness of synthetic speech under condition of using a small amount speech data of nonprofessional target speakers.

Original languageEnglish
Title of host publication2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages4708-4711
Number of pages4
DOIs
Publication statusPublished - 2011
Event36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 2011 May 222011 May 27

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)1520-6149

Conference

Conference36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country/TerritoryCzech Republic
CityPrague
Period11/5/2211/5/27

Keywords

  • average voice model
  • F0 modeling
  • F0 quantization
  • HMM-based speech synthesis

Fingerprint

Dive into the research topics of 'Tonal context labeling using quantized F0 symbols for improving tone correctness in average-voice-based speech synthesis'. Together they form a unique fingerprint.

Cite this