A boyer-moore type algorithm for compressed pattern matching

Yusuke Shibata, Tetsuya Matsumoto, Masayuki Takeda, Ayumi Shinohara, Setsuo Arikawa

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

45 Citations (Scopus)

Abstract

We apply the Boyer-Moore technique to compressed pat-tern matching for text string described in terms of collage system, which is a formal framework that captures various dictionary-based compres-sion methods. For a subclass of collage systems that contain no trun-cation, our new algorithm runs in O(‖D‖ + n m + m2 + r) time using O(‖D‖ + m2) space, where ‖D‖ is the size of dictionary D, n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. For a general collage system, the time complexity is O(height(D) (‖D‖+n)+n m+m2+r), where height(D) is the maximum dependency of tokens in D. We showed that the algorithm specialized for the so-called byte pair encoding (BPE) is very fast in practice. In fact it runs about 1:2 ~ 3:0 times faster than the exact match routine of the software package agrep, known as the fastest pattern matching tool.

Original languageEnglish
Title of host publicationCombinatorial Pattern Matching - 11th Annual Symposium, CPM 2000, Proceedings
EditorsRaffaele Giancarlo, David Sankoff
PublisherSpringer Verlag
Pages181-194
Number of pages14
ISBN (Electronic)3540676333, 9783540676331
DOIs
Publication statusPublished - 2000
Event11th Annual Symposium on Combinatorial Pattern Matching, CPM 2000 - Montreal, Canada
Duration: 2000 Jun 212000 Jun 23

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume1848
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference11th Annual Symposium on Combinatorial Pattern Matching, CPM 2000
Country/TerritoryCanada
CityMontreal
Period00/6/2100/6/23

Fingerprint

Dive into the research topics of 'A boyer-moore type algorithm for compressed pattern matching'. Together they form a unique fingerprint.

Cite this