A study on tailor-made speech synthesis based on deep neural networks

Shuhei Yamada, Takashi Nose, Akinori Ito

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

We propose “tailor-made speech synthesis,” a speech synthesis technique that enables users to control synthetic speech naturally and intuitively. As a first step toward realizing tailor-made speech synthesis, we introduce F0 context into speaker model training for speech synthesis based on deep neural networks (DNNs). F0 context represents the relative log F0 at the mora level or the accent-phrase level of the training data. In contrast to the conventional F0 context used in HMM-based techniques, it allows users to control the F0 of synthetic speech steplessly. Experiments showed that F0 context was effective for controlling F0: the F0 of the synthetic voice followed the value of the F0 context.
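
As an illustration only, the following minimal Python sketch shows one way a relative log-F0 value per mora could be computed from a frame-level F0 contour. The abstract does not state what the log F0 is measured relative to; this sketch assumes the mean log F0 over the voiced frames of the whole utterance. The function names, the `(start, end)` frame segmentation, and the choice of 0.0 for fully unvoiced segments are hypothetical and are not taken from the paper.

```python
import math


def mean_log_f0(f0_hz):
    """Mean of log F0 over voiced frames (F0 > 0); None if no frame is voiced."""
    voiced = [math.log(f) for f in f0_hz if f > 0]
    return sum(voiced) / len(voiced) if voiced else None


def relative_log_f0_context(f0_hz, segments):
    """Return one relative log-F0 value per segment (e.g. per mora).

    f0_hz:    per-frame F0 in Hz, with 0 marking unvoiced frames
    segments: list of (start_frame, end_frame) pairs, end exclusive

    Assumption: the reference is the utterance-level mean log F0.
    Segments with no voiced frame get 0.0 (a placeholder choice).
    """
    reference = mean_log_f0(f0_hz)
    if reference is None:
        return [0.0] * len(segments)
    context = []
    for start, end in segments:
        seg_mean = mean_log_f0(f0_hz[start:end])
        context.append(0.0 if seg_mean is None else seg_mean - reference)
    return context


# Toy contour: a low mora, a high mora, and an unvoiced mora.
f0 = [0, 100, 100, 200, 200, 0, 0, 0]
moras = [(0, 3), (3, 5), (5, 8)]
ctx = relative_log_f0_context(f0, moras)
```

Because the values are continuous rather than drawn from a fixed set of classes, a context of this kind could in principle be varied by arbitrary amounts at synthesis time, which is consistent with the stepless control described in the abstract.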

Original language: English
Title of host publication: Advances in Intelligent Information Hiding and Multimedia Signal Processing - Proceeding of the 12th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2016
Editors: Hsiang-Cheh Huang, Jeng-Shyang Pan, Pei-Wei Tsai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 159-166
Number of pages: 8
ISBN (Print): 9783319502083
Publication status: Published - 2017
Event: 12th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2016 - Kaohsiung, Taiwan, Province of China
Duration: 2016 Nov 21 to 2016 Nov 23

Publication series

Name: Smart Innovation, Systems and Technologies
Volume: 63
ISSN (Print): 2190-3018
ISSN (Electronic): 2190-3026

Conference

Conference: 12th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2016
Country/Territory: Taiwan, Province of China
City: Kaohsiung
Period: 16/11/21 to 16/11/23

Keywords

  • Context label
  • DNN-based speech synthesis
  • F0 context
  • Model training
  • Prosody control
  • Unsupervised labeling
