Abstract
Sequitur due to Nevill-Manning and Witten. [19] is a powerful program to infer a phrase hierarchy from the input text, that also provides extremely effective compression of large quantities of semi-structured text [18]. In this paper, we address the problem of searching in Sequitur compressed text directly. We show a compressed pattern matching algorithm that finds a pattern in compressed text without explicit decompression. We show that our algorithm is approximately 1.27 times faster than a decompression followed by an ordinal search.
Original language | English |
---|---|
Pages (from-to) | 469-478 |
Number of pages | 10 |
Journal | Proceedings of the Data Compression Conference |
Publication status | Published - 2001 |
Event | Data Compression Conference - Snowbird, UT, United States Duration: 2001 Mar 27 → 2001 Mar 29 |