Language modeling by string pattern N-gram for Japanese speech recognition

Akinori Ito, Masaki Kohda

Research output: Contribution to conference › Paper › peer-review

4 Citations (Scopus)

Abstract

This paper describes a new statistical language model, based on the N-gram model, for Japanese speech recognition. English sentences are written word by word, whereas Japanese sentences have no word-boundary characters. Building a word N-gram for Japanese therefore first requires word segmentation by morphological analysis. We propose an N-gram-based language model that requires no word segmentation. The model uses character string patterns as its N-gram units, and the string patterns are chosen from the training text according to a statistical criterion. Several experiments comparing the perplexities of the proposed and conventional models showed the advantage of our model. Because the method may also interest readers working on other languages, we applied it to English text. In a preliminary experiment, the proposed method performed better than a conventional word trigram.
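The idea of using character string patterns as N-gram units can be illustrated with a minimal sketch. The abstract does not say which statistical criterion selects the patterns, so the code below assumes a plain frequency threshold as a stand-in; greedy longest-match segmentation, the add-one smoothed bigram, and all function names and parameters (`choose_patterns`, `segment`, `perplexity_per_char`, `max_len`, `min_count`) are likewise hypothetical choices for illustration, not the authors' method. Perplexity is normalised per character so that models with different unit inventories can be compared.

```python
import math
from collections import Counter


def choose_patterns(text, max_len=3, min_count=2):
    """Select substrings of length 2..max_len that occur at least min_count times.

    Assumption: raw frequency stands in for the unspecified criterion.
    """
    counts = Counter(
        text[i:i + n]
        for n in range(2, max_len + 1)
        for i in range(len(text) - n + 1)
    )
    return {s for s, c in counts.items() if c >= min_count}


def segment(text, patterns, max_len=3):
    """Greedy longest-match split into patterns, falling back to single characters."""
    units, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in patterns:
                units.append(piece)
                i += n
                break
    return units


def perplexity_per_char(train_units, test_units):
    """Add-one smoothed bigram over units; perplexity per predicted character."""
    vocab = set(train_units) | set(test_units)
    context = Counter(train_units[:-1])
    bigram = Counter(zip(train_units, train_units[1:]))
    log_prob = 0.0
    for prev, cur in zip(test_units, test_units[1:]):
        p = (bigram[(prev, cur)] + 1) / (context[prev] + len(vocab))
        log_prob += math.log2(p)
    # The first unit has no context, so only later units are predicted.
    n_chars = sum(len(u) for u in test_units[1:])
    return 2 ** (-log_prob / n_chars)
```

With this sketch, one would call `choose_patterns` on the training text, pass the result to `segment` for both training and test text, and compare the value of `perplexity_per_char` against the same computation with an empty pattern set (a pure character bigram).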

Original language: English
Pages: 490-493
Number of pages: 4
Publication status: Published - 1996
Event: Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4) - Philadelphia, PA, USA
Duration: 1996 Oct 3 → 1996 Oct 6

Conference

Conference: Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4)
City: Philadelphia, PA, USA
Period: 96/10/3 → 96/10/6
