Automatic generation of synthesis units by unit selection based on closed-loop training

Takehiko Kagoshima, Masami Akamine

Research output: Contribution to journal › Article › peer-review

Abstract

This paper proposes a method for automatically generating synthesis units from a speech corpus so that the distortion introduced into synthesized speech by modifying pitch period and duration is minimized. In the proposed method, a large number of speech segments extracted from the speech corpus are treated as candidates for the synthesis unit. The distortion in the synthetic speech is calculated by modifying the pitch period and duration of each candidate and comparing the result to natural speech. The speech segment that minimizes the sum of distortions over synthetic speech with various pitch patterns is selected as the synthesis unit. The method is called closed-loop training because the distortion in the synthesized speech is evaluated and the result is fed back into the selection of the synthesis unit. Generation of synthesis units by closed-loop training can be applied to various synthesizers, regardless of the kind of synthesis unit or the scheme of the synthesizer. In an experiment, synthesis units were generated with a PSOLA-based synthesizer using the proposed method, and the quality of the synthesized speech improved compared with the conventional method, which does not consider distortion from prosodic modification.
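
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `resample_to_length`, `squared_error`, and `select_unit` are hypothetical, linear-interpolation resampling stands in for the PSOLA-based prosodic modification, and a plain squared error stands in for whatever distortion measure the paper uses. Only the closed-loop structure (modify each candidate, compare with natural speech, sum the distortions, keep the minimizer) follows the abstract.

```python
from typing import Callable, Sequence

Segment = Sequence[float]


def resample_to_length(segment: Segment, length: int) -> list[float]:
    """Toy stand-in for prosodic modification (NOT PSOLA):
    linearly interpolate `segment` to `length` samples."""
    if length == 1 or len(segment) == 1:
        return [float(segment[0])] * length
    scale = (len(segment) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


def squared_error(a: Segment, b: Segment) -> float:
    """Assumed distortion measure: sum of squared sample differences."""
    return float(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_unit(
    candidates: Sequence[Segment],
    targets: Sequence[Segment],
    modify: Callable[[Segment, int], Segment] = resample_to_length,
    distortion: Callable[[Segment, Segment], float] = squared_error,
) -> tuple[int, float]:
    """Return (index, total distortion) of the candidate whose modified
    versions are closest, in total, to the natural target segments."""
    best_index, best_cost = -1, float("inf")
    for index, candidate in enumerate(candidates):
        # Closed loop: synthesize toward every target, then measure the result.
        cost = sum(distortion(modify(candidate, len(t)), t) for t in targets)
        if cost < best_cost:
            best_index, best_cost = index, cost
    return best_index, best_cost


# Toy data: three candidate segments, two "natural" targets of different durations.
candidates = [[0, 1, 2], [0, 2, 4], [5, 5, 5]]
targets = [[0, 2, 4], [0, 1, 2, 3, 4]]
best_index, best_cost = select_unit(candidates, targets)
```

Passing `modify` and `distortion` as parameters reflects the abstract's remark that the approach does not depend on the kind of synthesis unit or synthesizer; a real system would supply its own synthesizer and distortion measure here.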

Original language: English
Pages (from-to): 1-7
Number of pages: 7
Journal: Systems and Computers in Japan
Volume: 30
Issue number: 9
Publication status: Published - 1999 Aug
