Speaker-independent style conversion for HMM-based expressive speech synthesis

Hiroki Kanagawa, Takashi Nose, Takao Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Citations (Scopus)

Abstract

This paper proposes a technique for creating a target speaker's expressive-style model from that speaker's neutral-style speech in HMM-based speech synthesis. The technique is based on style adaptation using linear transforms, where speaker-independent transformation matrices are estimated in advance using pairs of neutral- and target-style speech data from multiple speakers. By applying the obtained transformation matrices to a new speaker's neutral-style model, we can convert the style expressivity of the acoustic model to the target style without preparing any target-style speech from that speaker. In addition, we introduce a speaker adaptive training (SAT) framework into the transform estimation to reduce the acoustic differences among speakers. We subjectively evaluate the performance of the style conversion in terms of naturalness, speaker similarity, and style reproducibility.
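The core idea described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration only: it fits a single speaker-independent affine transform (mean vectors only, by least squares) from pooled neutral/expressive pairs of several training speakers, then applies it to an unseen speaker's neutral-style means. The paper itself estimates linear transforms within an HMM framework with SAT, which this sketch does not reproduce; all function names and the synthetic data are assumptions.

```python
import numpy as np

def estimate_style_transform(neutral_means, style_means):
    """Least-squares fit of an affine map mu_style ~ A @ mu_neutral + b.

    neutral_means, style_means: arrays of shape (n_components, dim),
    pooled over all training speakers (a stand-in for HMM mean vectors).
    Returns W of shape (dim + 1, dim) stacking A and b.
    """
    X = np.hstack([neutral_means, np.ones((neutral_means.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, style_means, rcond=None)
    return W

def convert_to_style(neutral_means, W):
    """Apply the shared speaker-independent transform to a new
    speaker's neutral-style mean vectors."""
    X = np.hstack([neutral_means, np.ones((neutral_means.shape[0], 1))])
    return X @ W

# Toy usage with synthetic data in place of real acoustic-model means.
rng = np.random.default_rng(0)
dim = 4
A_true = np.eye(dim) + 0.1 * rng.standard_normal((dim, dim))
b_true = rng.standard_normal(dim)
neutral = rng.standard_normal((50, dim))      # pooled training speakers
style = neutral @ A_true.T + b_true           # simulated expressive means

W = estimate_style_transform(neutral, style)
new_neutral = rng.standard_normal((5, dim))   # unseen speaker, neutral style
converted = convert_to_style(new_neutral, W)
```

Because no target-style data from the unseen speaker is used at conversion time, this mirrors the paper's key property: style conversion for a new speaker from neutral-style data alone.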

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 7864-7868
Number of pages: 5
DOIs
Publication status: Published - 2013 Oct 18
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 2013 May 26 - 2013 May 31

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 13/5/26 - 13/5/31

Keywords

  • HMM-based expressive speech synthesis
  • linear transform
  • speaker adaptive training
  • style adaptation
  • style conversion
