Statistical nonparametric speech synthesis using sparse Gaussian processes

Tomoki Koriyama, Takashi Nose, Takao Kobayashi

Research output: Contribution to journal; Conference article; peer-review

8 Citations (Scopus)

Abstract

This paper proposes a statistical nonparametric speech synthesis technique based on a sparse Gaussian process regression (GPR). In our previous study, we proposed GPR-based speech synthesis where each frame of synthesis units is modeled by a regression of Gaussian processes. Preliminary experiments of synthesizing several phones including both vowels and consonants showed a potential of the technique. In this paper, the previous work is extended to full-sentence speech synthesis using sparse GPs and context modification. Specifically, cluster-based sparse Gaussian processes such as local GPs and partially independent conditional (PIC) approximation are examined as a computationally feasible approach. Moreover, frame-level context is extended to include not only a position context from a current phone but also adjacent phones to generate smoothly changing speech parameters. Objective and subjective evaluation results show that the proposed technique outperforms the HMM-based speech synthesis with minimum generation error training.
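
The cluster-based idea mentioned in the abstract can be illustrated with a minimal sketch of local GP regression. This is not the authors' implementation: the squared-exponential kernel, the plain k-means clustering, the hard nearest-cluster assignment at prediction time, the fixed hyperparameters, and all function names (rbf_kernel, kmeans, fit_local_gps, predict_local_gps) are assumptions made for illustration only. The PIC approximation and the frame-level context features are not shown.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and b (assumed kernel)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


def assign(x, centers):
    """Index of the nearest center for every row of x."""
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(dist, axis=1)


def kmeans(x, k, n_iter=20, seed=0):
    """Plain k-means; stands in for whatever clustering is actually used."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(x, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers, assign(x, centers)


def fit_local_gps(x, y, k=4, noise=1e-2):
    """Fit one independent GP per non-empty cluster.

    Each solve costs O(m^3) for a cluster of m points instead of O(n^3)
    for the full data set, which is the source of the computational saving.
    """
    centers, labels = kmeans(x, k)
    models = []
    for j in range(k):
        mask = labels == j
        if not np.any(mask):
            continue  # skip empty clusters
        xj, yj = x[mask], y[mask]
        gram = rbf_kernel(xj, xj) + noise * np.eye(len(xj))
        alpha = np.linalg.solve(gram, yj)  # (K + noise * I)^{-1} y
        models.append((centers[j], xj, alpha))
    return models


def predict_local_gps(models, x_new):
    """Predictive mean using only the GP of the nearest cluster."""
    centers = np.stack([m[0] for m in models])
    nearest = assign(x_new, centers)
    mean = np.empty(len(x_new))
    for i, j in enumerate(nearest):
        _, xj, alpha = models[j]
        mean[i] = (rbf_kernel(x_new[i:i + 1], xj) @ alpha)[0]
    return mean


# Toy usage on a 1-D curve (synthetic data, not speech parameters).
x_train = np.linspace(0.0, 6.0, 200)[:, None]
y_train = np.sin(x_train[:, 0])
local_models = fit_local_gps(x_train, y_train, k=4)
y_pred = predict_local_gps(local_models, x_train)
print("mean absolute error:", np.mean(np.abs(y_pred - y_train)))
```

Because each local GP ignores data outside its own cluster, predictions can be discontinuous at cluster boundaries; approximations such as PIC are designed to reduce that effect by adding a shared set of inducing points.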

Original language: English
Pages (from-to): 1072-1076
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2013
Event: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France
Duration: 2013 Aug 25 to 2013 Aug 29

Keywords

  • Gaussian process regression
  • Non-parametric Bayesian model
  • Statistical speech synthesis
