TY - JOUR
T1 - A tone-modeling technique using a quantized F0 context to improve tone correctness in average-voice-based speech synthesis
AU - Chunwijitra, Vataya
AU - Nose, Takashi
AU - Kobayashi, Takao
N1 - Funding Information:
This work was supported in part by the JSPS Grant-in-Aid for Scientific Research 21300063. The first author was supported by a Science and Technology Scholarship from the Thai government. We would like to thank NECTEC, Thailand, for providing us with the LOTUS and the TSynC-1 speech corpora.
PY - 2012/2
Y1 - 2012/2
AB - This paper proposes a technique for improving tone correctness in speech synthesis of a tonal language based on an average-voice model trained on a speech corpus of nonprofessional speakers. We focused on reducing tone disagreements in speech data acquired from nonprofessional speakers without manually modifying the labels. To reduce the tonal distortion caused by inconsistent tonal labeling, quantized F0 symbols were used as the context for F0 to obtain an appropriate F0 model. With this technique, the tonal context could be extracted directly from the original speech, preventing inconsistency between the speech data and the F0 labels generated from transcriptions, an inconsistency that degrades both the naturalness and the tone correctness of synthetic speech. We examined two types of labeling for the tonal context, using phone-based and sub-phone-based quantized F0 symbols. Subjective and objective evaluations of the synthetic voice were carried out in terms of tone intelligibility and naturalness. The results of both the objective and subjective tests revealed that the proposed technique could improve not only the naturalness but also the tone correctness of synthetic speech when only a small amount of speech data from nonprofessional target speakers was used.
KW - Average voice model
KW - F0 modeling
KW - F0 quantization
KW - HMM-based speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=80055064844&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80055064844&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2011.08.006
DO - 10.1016/j.specom.2011.08.006
M3 - Article
AN - SCOPUS:80055064844
SN - 0167-6393
VL - 54
SP - 245
EP - 255
JO - Speech Communication
JF - Speech Communication
IS - 2
ER -