This paper describes a powerful new statistical language model, based on the N-gram model, for Japanese speech recognition. In English, a sentence is written word by word, with the words separated by spaces. A Japanese sentence, by contrast, has no word-boundary character, so it must be segmented into words by morphological analysis before a word N-gram model can be constructed. We propose an N-gram-based language model that requires no word segmentation. The model uses character string patterns as its N-gram units; these patterns are selected from the training text according to a statistical criterion. We carried out several experiments comparing the perplexity of the proposed model with that of conventional models, and the results showed the advantage of the proposed model. Because it may interest many readers, we also applied the method to English text. In a preliminary experiment, the proposed method performed better than a conventional word trigram model.
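To make the idea of modeling unsegmented text and comparing perplexities concrete, the following is a minimal sketch rather than the paper's method. It assumes fixed-length units (single characters, giving a character bigram model) and add-one smoothing, whereas the proposed model uses character string patterns chosen by a statistical criterion that the abstract does not specify. The function names `train_char_bigram` and `perplexity`, the start marker, and the sample sentences are all hypothetical.

```python
import math
from collections import Counter

START = "\x02"  # hypothetical start-of-text marker, assumed absent from the data


def train_char_bigram(text):
    """Count character contexts and bigrams directly on raw, unsegmented text."""
    padded = START + text
    contexts = Counter(padded[:-1])             # how often each character is a context
    bigrams = Counter(zip(padded, padded[1:]))  # (previous, current) character pairs
    vocab = set(padded)
    return contexts, bigrams, vocab


def perplexity(text, contexts, bigrams, vocab):
    """Per-character perplexity under an add-one smoothed character bigram model."""
    if not text:
        raise ValueError("text must be non-empty")
    v = len(vocab) + 1  # one extra slot reserved for any unseen character
    padded = START + text
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one (Laplace) smoothing: an assumption of this sketch.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        log_prob += math.log2(p)
    return 2 ** (-log_prob / len(text))


# Toy data: no word segmentation is performed at any point.
train_text = "これはペンです。これは本です。"
model = train_char_bigram(train_text)
in_domain = perplexity("これは本です。", *model)
out_of_domain = perplexity("xyzxyz", *model)
print(in_domain, out_of_domain)
```

Lower perplexity means the model assigns higher probability to the test text. Note that perplexities measured per character and per word are not directly comparable, so a comparison against a word N-gram model would need to be normalized to a common unit.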
|Number of pages||4|
|Publication status||Published - 1996|
|Event||Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4) - Philadelphia, PA, USA|
Duration: 1996 Oct 3 → 1996 Oct 6