Abstract
This paper proposes a novel transform mapping technique based on shared decision tree context clustering (STC) for HMM-based cross-lingual speech synthesis. In the conventional cross-lingual speaker adaptation based on state mapping, the adaptation performance is not always satisfactory when there are mismatches of languages and speakers between the average voice models of input and output languages. In the proposed technique, we alleviate the effect of the mismatches on the transform mapping by introducing a language-independent decision tree constructed by STC, and represent the average voice models using language-independent and dependent tree structures. We also use a bilingual speech corpus for keeping speaker characteristics between the average voice models of different languages. The experimental results show that the proposed technique decreases both spectral and prosodic distortions between original and generated parameter trajectories and significantly improves the naturalness of synthetic speech while keeping the speaker similarity compared to the state mapping.
Original language | English |
---|---|
Pages (from-to) | 770-774 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 2014 |
Event | 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore. Duration: 2014 Sept 14 → 2014 Sept 18 |
Keywords
- Bilingual speech corpus
- Cross-lingual TTS
- HMM-based speech synthesis
- Shared decision tree context clustering