TY - GEN
T1 - An HMM-based segment quantizer and its application to low bit rate speech coding
AU - Suzuki, Motoyuki
AU - Adachi, Masashi
AU - Kohata, Minoru
AU - Ito, Akinori
AU - Makino, Shozo
AU - Ren, Fuji
PY - 2010
Y1 - 2010
N2 - Several speech coding systems employ a segment quantizer instead of a vector quantizer. One of the most important problems in such systems is how to construct the segment codebook. This paper proposes a new speech coder based on ML-BEATS, an HMM-based segment quantizer. ML-BEATS first splits a vector sequence into several sub-sequences and then clusters these sub-sequences to construct a codebook. Each cluster center is represented by a left-to-right HMM. In the encoding process, the input speech is matched against the HMMs in the codebook, and the HMM index and duration information are sent to the decoder. In the decoding process, a decoded sequence is generated from the HMM parameters using HMM-based speech synthesis. In experiments, the HMM-based speech coder achieved a spectral distortion of 1.13 dB at 5.83 bits/frame. This is 0.11 dB higher than the spectral distortion of the G.729 coder, while requiring only 32% of its bit rate. To address the problem of shifts across LSP dimensions, we also propose a new codebook construction method. Many training vectors are extracted from the training samples by shifting their dimensions, and all of these vectors are used to construct a universal codebook. The universal codebook can handle any shifted vector because all possible shifts are included in the training data. Experimental results show that the shifted-vector method encodes input speech at a very low bit rate, but with higher spectral distortion.
UR - http://www.scopus.com/inward/record.url?scp=84869134650&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84869134650&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84869134650
SN - 9781617827457
T3 - 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
SP - 3877
EP - 3880
BT - 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
T2 - 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating the 2010 Annual Conference of the Australian Acoustical Society
Y2 - 23 August 2010 through 27 August 2010
ER -