Abstract
The Totally Speaker Driven Text-to-Speech System produces high-quality, natural speech that resembles the acoustic and prosodic characteristics of the original speech corpus. In the F0 contour control of this system, the F0 contour of a whole sentence is produced by concatenating segmental F0 contours, each generated by modifying a vector that represents a typical F0 contour. The representative vectors are selected from an F0 contour codebook, which is designed to minimize the approximation error between the F0 contours generated by the proposed model and the real F0 contours extracted from a speech corpus. Experiments with a Japanese speech corpus confirmed that F0 contours can be modeled with small approximation errors using only 48 representative vectors, and that the synthetic speech sounded very natural and resembled the prosodic characteristics of the original speaker.
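The codebook idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how a representative vector is modified, so the sketch assumes two operations: time-scaling by linear interpolation and an additive shift on a log-F0 axis. All function names (`resample`, `modify`, `best_fit`, `synthesize`) and the two-entry toy codebook are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def resample(vec: Sequence[float], length: int) -> List[float]:
    """Linearly interpolate vec to `length` points (assumed time-scaling)."""
    if length == 1:
        return [vec[0]]
    out = []
    for i in range(length):
        pos = i * (len(vec) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(vec) - 1)
        frac = pos - lo
        out.append(vec[lo] * (1.0 - frac) + vec[hi] * frac)
    return out


def modify(vec: Sequence[float], length: int, offset: float) -> List[float]:
    """Assumed modification: stretch to `length` frames, then shift by `offset`."""
    return [v + offset for v in resample(vec, length)]


def best_fit(segment: Sequence[float],
             codebook: Sequence[Sequence[float]]) -> Tuple[int, float, float]:
    """Return (index, offset, squared error) of the closest codebook vector."""
    if not codebook:
        raise ValueError("codebook must not be empty")
    best = None
    for idx, vec in enumerate(codebook):
        shape = resample(vec, len(segment))
        # The least-squares optimal shift is the mean residual.
        offset = sum(s - r for s, r in zip(segment, shape)) / len(segment)
        err = sum((s - (r + offset)) ** 2 for s, r in zip(segment, shape))
        if best is None or err < best[2]:
            best = (idx, offset, err)
    return best


def synthesize(plan: Sequence[Tuple[int, int, float]],
               codebook: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate modified vectors; plan items are (index, length, offset)."""
    contour: List[float] = []
    for idx, length, offset in plan:
        contour.extend(modify(codebook[idx], length, offset))
    return contour


# Toy codebook: one rising and one falling shape (illustrative values only).
toy_codebook = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
choice = best_fit([5.0, 5.25, 5.5, 5.75, 6.0], toy_codebook)
sentence = synthesize([(0, 5, 5.0), (1, 3, 5.0)], toy_codebook)
```

In this sketch, `best_fit` measures the approximation error for one segment. Designing a codebook in the sense of the abstract would mean choosing the vectors that minimize the sum of such errors over a corpus, a step the sketch leaves out.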
Original language | English |
---|---|
Publication status | Published - 1998 |
Externally published | Yes |
Event | 5th International Conference on Spoken Language Processing, ICSLP 1998, Sydney, Australia (duration: 1998 Nov 30 → 1998 Dec 4) |
Conference
Conference | 5th International Conference on Spoken Language Processing, ICSLP 1998 |
---|---|
Country/Territory | Australia |
City | Sydney |
Period | 1998 Nov 30 → 1998 Dec 4 |
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language