TY - GEN
T1 - Discovering best variable-length-don’t-care patterns
AU - Inenaga, Shunsuke
AU - Bannai, Hideo
AU - Shinohara, Ayumi
AU - Takeda, Masayuki
AU - Arikawa, Setsuo
PY - 2002
Y1 - 2002
N2 - A variable-length-don’t-care pattern (VLDC pattern) is an element of set Π = (Σ∪{∗})∗, where Σ is an alphabet and ∗ is a wildcard matching any string in Σ∗. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one, and the least common to the other. We present a practical algorithm to find such best VLDC patterns exactly, powerfully sped up by pruning heuristics. We introduce two versions of our algorithm: one employs a pattern matching machine (PMM) whereas the other does an index structure called the Wildcard Directed Acyclic Word Graph (WDAWG). In addition, we consider a more generalized problem of finding the best pair ‹q, k›, where k is the window size that specifies the length of an occurrence of the VLDC pattern q matching a string w. We present three algorithms solving this problem with pruning heuristics, using the dynamic programming (DP), PMMs and WDAWGs, respectively. Although the two problems are NP-hard, we experimentally show that our algorithms run remarkably fast.
AB - A variable-length-don’t-care pattern (VLDC pattern) is an element of set Π = (Σ∪{∗})∗, where Σ is an alphabet and ∗ is a wildcard matching any string in Σ∗. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one, and the least common to the other. We present a practical algorithm to find such best VLDC patterns exactly, powerfully sped up by pruning heuristics. We introduce two versions of our algorithm: one employs a pattern matching machine (PMM) whereas the other does an index structure called the Wildcard Directed Acyclic Word Graph (WDAWG). In addition, we consider a more generalized problem of finding the best pair ‹q, k›, where k is the window size that specifies the length of an occurrence of the VLDC pattern q matching a string w. We present three algorithms solving this problem with pruning heuristics, using the dynamic programming (DP), PMMs and WDAWGs, respectively. Although the two problems are NP-hard, we experimentally show that our algorithms run remarkably fast.
UR - http://www.scopus.com/inward/record.url?scp=84949815829&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84949815829&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84949815829
SN - 3540001883
SN - 9783540001881
VL - 2534
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 86
EP - 97
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB - Springer Verlag
T2 - 5th International Conference on Discovery Science, DS 2002
Y2 - 24 November 2002 through 26 November 2002
ER -