Speaker and style adaptation using average voice model for style control in HMM-based speech synthesis

Makoto Tachibana, Shinsuke Izawa, Takashi Nose, Takao Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 Citations (Scopus)

Abstract

We propose a technique for synthesizing speech with the desired style expressivity in an arbitrary target speaker's voice. In MLLR-based speaker adaptation for the multiple regression hidden semi-Markov model (MRHSMM), the quality of the synthesized speech depends crucially on the initial MRHSMM, which is trained on a particular source speaker's data, and it is not always possible to synthesize natural-sounding speech in a given target speaker's voice. To overcome this problem, we perform simultaneous adaptation of speaker and style from an average voice model. Experimental results show that the proposed technique provides more natural-sounding speech than the conventional technique, which uses speaker adaptation only.

Original language: English
Title of host publication: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Pages: 4633-4636
Number of pages: 4
Publication status: Published - 2008
Event: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP - Las Vegas, NV, United States
Duration: 2008 Mar 31 to 2008 Apr 4

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Country/Territory: United States
City: Las Vegas, NV
Period: 08/3/31 to 08/4/4

Keywords

  • Average voice model
  • Expressive speech synthesis
  • Hidden Markov model
  • Speaker adaptation
  • Style control
