Speech-based computer-assisted language learning (CALL) systems should recognize the utterances of the learner with high accuracy and evaluate the language proficiency of the specific speaker with appropriate methods. In this paper, we discuss the automatic assessment of the second language (L2) for non-native speakers. There are many existing works on pronunciation evaluation by applying the goodness of pronunciation (GOP) method. This paper introduces an automatic proficiency evaluation system that combines various kinds of non-native acoustic models and native ones, such as Gaussian mixture model (GMM)-hidden Markov model (HMM) and deep neural network (DNN)-HMM. Most of existing works assume that we know the transcription of an utterance (the reference sentence) when evaluating the utterance, especially in reading and repeating tasks. To realize a reference-free proficiency evaluation, we propose a novel machine score named as the reference-free error rate (RER) to evaluate English proficiency. In our experiments, the DNN-based non-native acoustic models outperformed the traditional acoustic models on non-native speech recognition. Thus, we calculated the RER by regarding the recognition result from the DNN-based non-native acoustic model as “reference” and the result from the native acoustic model as “recognition result”. The proposed RER has high correlation with human proficiency scores, which indicates the effectiveness of RER for automatically estimating the proficiency. By combining the RER with other machine scores such as the log-likelihood scores, we obtained high correlation (reading aloud task: [Formula presented]; constrained interactive dialogue task: [Formula presented]; spontaneous English conversation task: [Formula presented]) to the human scores.
- Acoustic models
- Automatic proficiency assessment
- Computer-assisted language learning (CALL)
- Deep neural network (DNN)
- Japanese learners
- Non-native speech
- Speech recognition