TY - GEN
T1 - Efficient computation of substring equivalence classes with suffix arrays
AU - Narisawa, Kazuyuki
AU - Inenaga, Shunsuke
AU - Bannai, Hideo
AU - Takeda, Masayuki
PY - 2007
Y1 - 2007
N2 - This paper considers enumeration of substring equivalence classes introduced by Blumer et al. [1]. They used the equivalence classes to define an index structure called compact directed acyclic word graphs (CDAWGs). In text analysis, considering these equivalence classes is useful since they group together redundant substrings with essentially identical occurrences. In this paper, we present how to enumerate those equivalence classes using suffix arrays. Our algorithm uses rank and lcp arrays for traversing the corresponding suffix trees, but does not need any other additional data structure. The algorithm runs in linear time in the length of the input string. We show experimental results comparing the running times and space consumptions of our algorithm, suffix tree and CDAWG based approaches.
AB - This paper considers enumeration of substring equivalence classes introduced by Blumer et al. [1]. They used the equivalence classes to define an index structure called compact directed acyclic word graphs (CDAWGs). In text analysis, considering these equivalence classes is useful since they group together redundant substrings with essentially identical occurrences. In this paper, we present how to enumerate those equivalence classes using suffix arrays. Our algorithm uses rank and lcp arrays for traversing the corresponding suffix trees, but does not need any other additional data structure. The algorithm runs in linear time in the length of the input string. We show experimental results comparing the running times and space consumptions of our algorithm, suffix tree and CDAWG based approaches.
UR - http://www.scopus.com/inward/record.url?scp=37849015556&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=37849015556&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-73437-6_34
DO - 10.1007/978-3-540-73437-6_34
M3 - Conference contribution
AN - SCOPUS:37849015556
SN - 9783540734369
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 340
EP - 351
BT - Combinatorial Pattern Matching - 18th Annual Symposium, CPM 2007, Proceedings
PB - Springer Verlag
T2 - 18th Annual Symposium on Combinatorial Pattern Matching, CPM 2007
Y2 - 9 July 2007 through 11 July 2007
ER -