Abstract
Topic-based stochastic models such as probabilistic latent semantic analysis (PLSA) are useful tools for adapting a language model to a specific domain using a global-context constraint. A probability given by a topic model is combined with an n-gram probability using the unigram rescaling scheme. One practical problem in applying PLSA to speech recognition is that calculating probabilities with PLSA is computationally expensive, which prevents the topic-based language model from being incorporated into the decoding process. In this paper, we propose an algorithm that calculates a back-off n-gram probability with unigram rescaling quickly and without any approximation. The algorithm drastically reduces the cost of calculating the normalizing factor, since it only requires the probabilities of words that appear in the current context. Experimental results showed that the proposed algorithm was more than 6000 times faster than the naive calculation method.
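To make the idea concrete, the following is a minimal sketch, assuming a standard back-off model (explicit entries P*(w|h) plus a back-off weight alpha(h)) and the common rescaling form P(w|h,d) = P(w|h) r(w) / Z(h) with r(w) = (P(w|d)/P(w))^beta. It is an illustration of how the normalizer Z(h) can be computed exactly from Z of the shorter history while touching only the words explicitly stored for h; it is not the paper's code, and the class names, the `beta` parameter, and the toy probabilities are all hypothetical.

```python
# Hypothetical sketch: exact unigram-rescaling normalizer for a back-off
# n-gram model, summing only over explicit entries of each history.

class BackoffLM:
    """Back-off n-gram model; histories are tuples, most recent word last."""

    def __init__(self, unigram, ngrams, backoff):
        self.unigram = unigram  # word -> P(w)
        self.ngrams = ngrams    # history -> {word: P*(w|h)} (explicit entries)
        self.backoff = backoff  # history -> back-off weight alpha(h)

    def prob(self, w, h):
        if not h:
            return self.unigram[w]
        seen = self.ngrams.get(h, {})
        if w in seen:
            return seen[w]
        # Word not stored for h: back off to the shorter history h[1:].
        return self.backoff.get(h, 1.0) * self.prob(w, h[1:])


class UnigramRescaledLM:
    """P(w|h,d) = P(w|h) * r(w) / Z(h), with r(w) = (P(w|d) / P(w)) ** beta."""

    def __init__(self, lm, topic, beta=1.0):
        self.lm = lm
        self.r = {w: (topic[w] / p) ** beta for w, p in lm.unigram.items()}
        # The only O(|V|) sum: Z for the empty history, computed once per
        # topic distribution (it equals 1 when beta == 1).
        self._z = {(): sum(p * self.r[w] for w, p in lm.unigram.items())}

    def z(self, h):
        """Z(h) = sum_w P(w|h) r(w), derived recursively from Z(h[1:])."""
        if h not in self._z:
            seen = self.lm.ngrams.get(h, {})
            alpha = self.lm.backoff.get(h, 1.0)
            explicit = sum(p * self.r[w] for w, p in seen.items())
            # Part of Z(h[1:]) belonging to words that h stores explicitly;
            # it is subtracted because those words do not back off.
            overlap = sum(self.lm.prob(w, h[1:]) * self.r[w] for w in seen)
            self._z[h] = explicit + alpha * (self.z(h[1:]) - overlap)
        return self._z[h]

    def z_naive(self, h):
        # Reference implementation: O(|V|) sum over the whole vocabulary.
        return sum(self.lm.prob(w, h) * self.r[w] for w in self.lm.unigram)

    def prob(self, w, h):
        return self.lm.prob(w, h) * self.r[w] / self.z(h)


lm = BackoffLM(
    unigram={"a": 0.5, "b": 0.3, "c": 0.2},
    ngrams={("a",): {"b": 0.6}, ("b", "a"): {"c": 0.5}},
    backoff={("a",): 0.4 / 0.7, ("b", "a"): 0.6},
)
model = UnigramRescaledLM(lm, topic={"a": 0.2, "b": 0.5, "c": 0.3})
for h in [("a",), ("b", "a")]:
    print(h, model.z(h), model.z_naive(h))
```

The recursion rests on the identity Z(h) = sum over explicit w of P*(w|h) r(w) + alpha(h) [Z(h') - sum over explicit w of P(w|h') r(w)], where h' is the shortened history, so each history costs time proportional to its number of explicit entries rather than to the vocabulary size. Memoized values must be discarded whenever the topic distribution P(w|d) changes.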
Original language | English |
---|---|
Journal | IAENG International Journal of Computer Science |
Volume | 36 |
Issue number | 4 |
Publication status | Published - 2009 Nov |
Keywords
- Back-off smoothing
- N-gram
- Probabilistic latent semantic analysis
- Unigram rescaling