TY - GEN
T1 - Feature enhancement by speaker-normalized SPLICE for robust speech recognition
AU - Shinohara, Yusuke
AU - Masuko, Takashi
AU - Akamine, Masami
PY - 2008
Y1 - 2008
AB - The SPLICE method of feature enhancement is known for its powerful performance. It learns a mapping from noisy to clean feature vectors given a set of stereo training data. However, feature vector variation caused by speaker changes conceals the noise-induced variation that SPLICE training is meant to capture. In this paper, an improvement of SPLICE by means of speaker normalization is proposed. The training data is first normalized with respect to speaker variation, and a mapping is learned afterward. CMLLR with a GMM as its target is utilized for the speaker normalization, where the GMM representing a standard speaker is learned via a novel variant of speaker adaptive training. The proposed method was evaluated on Aurora2 and achieved a relative word error rate reduction of 38% over conventional SPLICE.
KW - Feature enhancement
KW - Robust speech recognition
KW - Speaker adaptive training
KW - Speaker normalization
KW - SPLICE
UR - http://www.scopus.com/inward/record.url?scp=51449095024&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=51449095024&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2008.4518751
DO - 10.1109/ICASSP.2008.4518751
M3 - Conference contribution
AN - SCOPUS:51449095024
SN - 1424414849
SN - 9781424414840
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4881
EP - 4884
BT - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
T2 - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Y2 - 31 March 2008 through 4 April 2008
ER -