TY - GEN
T1 - A boyer-moore type algorithm for compressed pattern matching
AU - Shibata, Yusuke
AU - Matsumoto, Tetsuya
AU - Takeda, Masayuki
AU - Shinohara, Ayumi
AU - Arikawa, Setsuo
N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2000.
PY - 2000
Y1 - 2000
N2 - We apply the Boyer-Moore technique to compressed pat-tern matching for text string described in terms of collage system, which is a formal framework that captures various dictionary-based compres-sion methods. For a subclass of collage systems that contain no trun-cation, our new algorithm runs in O(‖D‖ + n m + m2 + r) time using O(‖D‖ + m2) space, where ‖D‖ is the size of dictionary D, n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. For a general collage system, the time complexity is O(height(D) (‖D‖+n)+n m+m2+r), where height(D) is the maximum dependency of tokens in D. We showed that the algorithm specialized for the so-called byte pair encoding (BPE) is very fast in practice. In fact it runs about 1:2 ~ 3:0 times faster than the exact match routine of the software package agrep, known as the fastest pattern matching tool.
AB - We apply the Boyer-Moore technique to compressed pat-tern matching for text string described in terms of collage system, which is a formal framework that captures various dictionary-based compres-sion methods. For a subclass of collage systems that contain no trun-cation, our new algorithm runs in O(‖D‖ + n m + m2 + r) time using O(‖D‖ + m2) space, where ‖D‖ is the size of dictionary D, n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. For a general collage system, the time complexity is O(height(D) (‖D‖+n)+n m+m2+r), where height(D) is the maximum dependency of tokens in D. We showed that the algorithm specialized for the so-called byte pair encoding (BPE) is very fast in practice. In fact it runs about 1:2 ~ 3:0 times faster than the exact match routine of the software package agrep, known as the fastest pattern matching tool.
UR - http://www.scopus.com/inward/record.url?scp=84937407089&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937407089&partnerID=8YFLogxK
U2 - 10.1007/3-540-45123-4_17
DO - 10.1007/3-540-45123-4_17
M3 - Conference contribution
AN - SCOPUS:84937407089
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 181
EP - 194
BT - Combinatorial Pattern Matching - 11th Annual Symposium, CPM 2000, Proceedings
A2 - Giancarlo, Raffaele
A2 - Sankoff, David
PB - Springer Verlag
T2 - 11th Annual Symposium on Combinatorial Pattern Matching, CPM 2000
Y2 - 21 June 2000 through 23 June 2000
ER -