TY - JOUR
T1 - Improving human scoring of prosody using parametric speech synthesis
AU - Prafianto, Hafiyan
AU - Nose, Takashi
AU - Chiba, Yuya
AU - Ito, Akinori
N1 - Funding Information:
Part of this work is supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (B) Grant Number JP16K13253.
Publisher Copyright:
© 2019 Elsevier B.V.
PY - 2019/8
Y1 - 2019/8
N2 - This paper proposes a method that uses parametric speech synthesis to improve human scoring of non-native speakers' utterances. Rather than assessing each prosodic feature by listening to the original utterance, the features not under assessment are substituted with those of native speakers so that the rater can focus only on the target prosodic feature. The substitute features are generated by parametric speech synthesis, specifically HMM-based speech synthesis from an average voice model of native speakers. Experimental results show that the proposed method improves scoring reliability, as confirmed by an increase in inter-rater correlation. We also built automatic pronunciation evaluation systems trained on non-native speech databases scored with either the conventional or the proposed method, and compared their performance. The predicted pronunciation scores agreed more closely with the human ratings under the proposed method: the human-machine correlation was 0.87, compared with 0.74 for the conventional scoring method.
AB - This paper proposes a method that uses parametric speech synthesis to improve human scoring of non-native speakers' utterances. Rather than assessing each prosodic feature by listening to the original utterance, the features not under assessment are substituted with those of native speakers so that the rater can focus only on the target prosodic feature. The substitute features are generated by parametric speech synthesis, specifically HMM-based speech synthesis from an average voice model of native speakers. Experimental results show that the proposed method improves scoring reliability, as confirmed by an increase in inter-rater correlation. We also built automatic pronunciation evaluation systems trained on non-native speech databases scored with either the conventional or the proposed method, and compared their performance. The predicted pronunciation scores agreed more closely with the human ratings under the proposed method: the human-machine correlation was 0.87, compared with 0.74 for the conventional scoring method.
KW - Automatic pronunciation evaluation system
KW - Average voice model
KW - Computer assisted language learning (CALL)
KW - Computer assisted pronunciation training (CAPT)
KW - Parametric speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=85066937695&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066937695&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2019.06.001
DO - 10.1016/j.specom.2019.06.001
M3 - Article
AN - SCOPUS:85066937695
SN - 0167-6393
VL - 111
SP - 14
EP - 21
JO - Speech Communication
JF - Speech Communication
ER -