Integration of accent sandhi and prosodic features estimation for Japanese text-to-speech synthesis

Daisuke Fujimaki, Takashi Nose, Akinori Ito

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In recent years, Japanese text-to-speech (TTS) synthesis methods have been actively researched. Generating high-quality synthetic speech requires estimating appropriate prosodic information. However, manual annotation is costly, and automatic annotation introduces estimation errors. This paper examines the integration of accent sandhi and prosodic feature estimation into the acoustic modeling for Japanese TTS to overcome these problems. The proposed method achieves total optimization of the F0 model by using the linguistic features from a dictionary. Objective and subjective evaluations confirmed that the cost of creating accent labels was reduced and that the accuracy of prosodic feature estimation was improved.

Original language: English
Title of host publication: 2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 358-359
Number of pages: 2
ISBN (Electronic): 9781728198026
Publication status: Published - 2020 Oct 13
Event: 9th IEEE Global Conference on Consumer Electronics, GCCE 2020 - Kobe, Japan
Duration: 2020 Oct 13 to 2020 Oct 16

Publication series

Name: 2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020

Conference

Conference: 9th IEEE Global Conference on Consumer Electronics, GCCE 2020
Country/Territory: Japan
City: Kobe
Period: 20/10/13 to 20/10/16

Keywords

  • accent sandhi
  • Japanese text-to-speech
  • speech synthesis
