Statistical parametric speech synthesis based on Gaussian process regression

Tomoki Koriyama, Takashi Nose, Takao Kobayashi

Research output: Contribution to journal › Article › peer-review

35 Citations (Scopus)

Abstract

This paper proposes a statistical parametric speech synthesis technique based on Gaussian process regression (GPR). The GPR model is designed for directly predicting frame-level acoustic features from corresponding information on frame context that is obtained from linguistic information. The frame context includes the relative position of the current frame within the phone and articulatory information and is used as the explanatory variable in GPR. Here, we introduce cluster-based sparse Gaussian processes (GPs), i.e., local GPs and partially independent conditional (PIC) approximation, to reduce the computational cost. The experimental results for both isolated phone synthesis and full-sentence continuous speech synthesis revealed that the proposed GPR-based technique without dynamic features slightly outperformed the conventional hidden Markov model (HMM)-based speech synthesis using minimum generation error training with dynamic features.
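The local-GP idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the squared-exponential kernel, the nearest-center cluster assignment, the toy one-dimensional "frame position" input, and all function and parameter names (`rbf_kernel`, `gp_predict_mean`, `local_gp_predict_mean`, `centers`, `length_scale`, `noise_var`) are assumptions chosen for illustration, and the PIC approximation is not covered. The sketch only shows why clustering reduces cost: each cluster solves its own small linear system instead of one system over all training frames.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and b (assumed kernel)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict_mean(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """GP regression posterior mean: K_*^T (K + noise_var * I)^{-1} y."""
    k_train = rbf_kernel(x_train, x_train, **kernel_args)
    k_cross = rbf_kernel(x_test, x_train, **kernel_args)
    chol = np.linalg.cholesky(k_train + noise_var * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return k_cross @ alpha


def assign_clusters(x, centers):
    """Index of the nearest center for each row of x (stand-in for real clustering)."""
    dist = np.sum((x[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.argmin(dist, axis=1)


def local_gp_predict_mean(x_train, y_train, x_test, centers, **gp_args):
    """Local GPs: each test point is predicted from its own cluster's data only."""
    train_ids = assign_clusters(x_train, centers)
    test_ids = assign_clusters(x_test, centers)
    # Zero is the prior mean; it is kept for clusters without training data.
    out = np.zeros((len(x_test),) + y_train.shape[1:])
    for c in range(len(centers)):
        tr, te = train_ids == c, test_ids == c
        if tr.any() and te.any():
            out[te] = gp_predict_mean(x_train[tr], y_train[tr], x_test[te], **gp_args)
    return out


# Toy data: input is a relative frame position in [0, 1], output is one
# synthetic "acoustic feature" value per frame.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(200, 1))
y_train = np.sin(2 * np.pi * x_train[:, 0]) + 0.05 * rng.standard_normal(200)
x_test = np.linspace(0.05, 0.95, 19)[:, None]
centers = np.array([[0.125], [0.375], [0.625], [0.875]])

pred = local_gp_predict_mean(
    x_train, y_train, x_test, centers, length_scale=0.2, noise_var=1e-2
)
max_err = np.max(np.abs(pred - np.sin(2 * np.pi * x_test[:, 0])))
```

With N training frames split evenly into C clusters, the Cholesky cost drops from roughly O(N^3) to O(N^3 / C^2) in total. The price is that predictions ignore correlations across cluster boundaries, which is the limitation that approximations such as PIC are meant to address.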

Original language: English
Article number: 6609068
Pages (from-to): 173-183
Number of pages: 11
Journal: IEEE Journal of Selected Topics in Signal Processing
Volume: 8
Issue number: 2
DOIs
Publication status: Published - 2014 Apr

Keywords

  • Gaussian process regression
  • nonparametric Bayesian model
  • partially independent conditional (PIC) approximation
  • sparse Gaussian processes
  • statistical speech synthesis
