TY - GEN
T1 - Aspect-model-based reference speaker weighting
AU - Hahm, Seongjun
AU - Ohkawa, Yuichi
AU - Ito, Masashi
AU - Suzuki, Motoyuki
AU - Ito, Akinori
AU - Makino, Shozo
PY - 2010
Y1 - 2010
N2 - We propose aspect-model-based reference speaker weighting. The main idea of the approach is that the adapted model is a linear combination of a set of reference speakers, as in reference speaker weighting (RSW) and eigenvoices. The aspect model is a mixture model of speaker-dependent (SD) models. In this paper, aspect model weighting (AMW) is proposed for finding an optimal weighting of a set of reference speakers. Unlike RSW, AMW uses an aspect model, a kind of cluster model, that is trained by likelihood maximization with respect to the training data. The aspect model approach can also reduce the number of adaptation parameters. For evaluation, we carried out an isolated word recognition experiment on a Korean database (KLE452). The results were compared with those of conventional MAP, MLLR, RSW, and eigenvoice adaptation. Even with only 0.5 s of adaptation data, a 27.24% relative error rate reduction over the speaker-independent (SI) baseline was achieved.
AB - We propose aspect-model-based reference speaker weighting. The main idea of the approach is that the adapted model is a linear combination of a set of reference speakers, as in reference speaker weighting (RSW) and eigenvoices. The aspect model is a mixture model of speaker-dependent (SD) models. In this paper, aspect model weighting (AMW) is proposed for finding an optimal weighting of a set of reference speakers. Unlike RSW, AMW uses an aspect model, a kind of cluster model, that is trained by likelihood maximization with respect to the training data. The aspect model approach can also reduce the number of adaptation parameters. For evaluation, we carried out an isolated word recognition experiment on a Korean database (KLE452). The results were compared with those of conventional MAP, MLLR, RSW, and eigenvoice adaptation. Even with only 0.5 s of adaptation data, a 27.24% relative error rate reduction over the speaker-independent (SI) baseline was achieved.
KW - Aspect model weighting
KW - Reference speaker weighting
KW - Speaker adaptation
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=78049413907&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78049413907&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2010.5495672
DO - 10.1109/ICASSP.2010.5495672
M3 - Conference contribution
AN - SCOPUS:78049413907
SN - 9781424442966
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4302
EP - 4305
BT - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Y2 - 14 March 2010 through 19 March 2010
ER -