TY - JOUR
T1 - Speech recognition under multiple noise environment based on multi-mixture HMM and weight optimization by the aspect model
AU - Hahm, Seong Jun
AU - Ohkawa, Yuichi
AU - Ito, Masashi
AU - Suzuki, Motoyuki
AU - Ito, Akinori
AU - Makino, Shozo
PY - 2010/9
Y1 - 2010/9
N2 - In this paper, we propose an acoustic model that is robust to multiple noise environments, as well as a method for adapting the acoustic model to a particular environment to improve its performance. The model, called the "multi-mixture model," is based on a mixture of different HMMs, each of which is trained using speech under different noise conditions. Speech recognition experiments showed that the proposed model performs better than the conventional multi-condition model. The adaptation method is based on the aspect model, which is a "mixture-of-mixture" model. To realize adaptation using an extremely small amount of adaptation data (i.e., a few seconds), we train a small number of mixture models, which can be interpreted as models for "clusters" of noise environments. The models are then mixed using weights that are determined from the adaptation data. The experimental results showed that adaptation based on the aspect model improved word accuracy in a heavy noise environment and caused no performance deterioration under any noise condition, whereas the conventional methods either did not improve performance or showed both improvement and degradation of recognition performance depending on the noise condition.
KW - Aspect model
KW - Multi-mixture HMM
KW - Noise-independent acoustic model
KW - Speech recognition in noisy environment
UR - http://www.scopus.com/inward/record.url?scp=77956847273&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77956847273&partnerID=8YFLogxK
U2 - 10.1587/transinf.E93.D.2407
DO - 10.1587/transinf.E93.D.2407
M3 - Article
AN - SCOPUS:77956847273
SN - 0916-8532
VL - E93-D
SP - 2407
EP - 2416
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 9
ER -