Parametric speech synthesis using local and global sparse Gaussian processes

Tomoki Koriyama, Takashi Nose, Takao Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes an application of Gaussian process regression (GPR) to parametric speech synthesis. Because GPR is a nonparametric Bayesian regression method, it predicts synthetic speech parameters directly from exemplars of the training speech data, without compressing the acoustic features of the training data into too small a number of model parameters. However, GPR inherently requires high computational cost and resources. To alleviate this problem, we incorporate local and global sparse Gaussian process approximation into the statistical speech synthesis framework and investigate the trade-off between computational cost and speech synthesis performance through experiments. Moreover, we examine how to choose the pseudo data set used for the sparse GP approximation.
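The cost trade-off mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy one-dimensional regression task, a squared-exponential kernel, and the subset-of-regressors form of a global pseudo-data approximation, and it omits the local component of the PIC approximation listed in the keywords. All function names, the evenly spaced pseudo inputs, and the hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def exact_gp_mean(x_train, y_train, x_test, noise_var=0.01):
    """Exact GPR predictive mean; solves an N x N system (cubic in N)."""
    k_nn = rbf_kernel(x_train, x_train)
    k_sn = rbf_kernel(x_test, x_train)
    alpha = np.linalg.solve(k_nn + noise_var * np.eye(len(x_train)), y_train)
    return k_sn @ alpha

def sparse_gp_mean(x_train, y_train, x_test, x_pseudo,
                   noise_var=0.01, jitter=1e-6):
    """Predictive mean from M pseudo inputs (subset-of-regressors form).

    Only an M x M system is solved, so the cost grows linearly in N.
    """
    k_mm = rbf_kernel(x_pseudo, x_pseudo)
    k_mn = rbf_kernel(x_pseudo, x_train)
    k_sm = rbf_kernel(x_test, x_pseudo)
    a = noise_var * k_mm + k_mn @ k_mn.T + jitter * np.eye(len(x_pseudo))
    return k_sm @ np.linalg.solve(a, k_mn @ y_train)

# Toy data (assumption): a noisy sine wave stands in for acoustic features.
rng = np.random.default_rng(0)
x_train = rng.uniform(-3.0, 3.0, size=(200, 1))
y_train = np.sin(x_train[:, 0]) + 0.1 * rng.standard_normal(200)
x_test = np.linspace(-2.5, 2.5, 50)[:, None]

# Pseudo data set (assumption): 10 evenly spaced points instead of 200 exemplars.
x_pseudo = np.linspace(-3.0, 3.0, 10)[:, None]

mean_exact = exact_gp_mean(x_train, y_train, x_test)
mean_sparse = sparse_gp_mean(x_train, y_train, x_test, x_pseudo)
print("max |exact - sparse|:", float(np.max(np.abs(mean_exact - mean_sparse))))
```

Varying the number and placement of points in `x_pseudo` gives a rough feel for how the choice of pseudo data set affects both the cost and the closeness of the sparse prediction to the exact one.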

Original language: English
Title of host publication: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
Editors: Tulay Adali, Jan Larsen, Mamadou Mboup, Eric Moreau
Publisher: IEEE Computer Society
ISBN (Electronic): 9781479936946
Publication status: Published - 2014 Nov 14
Event: 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014 - Reims, France
Duration: 2014 Sept 21 - 2014 Sept 24

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Conference

Conference: 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014
Country/Territory: France
City: Reims
Period: 14/9/21 - 14/9/24

Keywords

  • Gaussian process regression
  • parametric speech synthesis
  • partially independent conditional (PIC) approximation

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
