Efficient computation of substring equivalence classes with suffix arrays

Kazuyuki Narisawa, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceedingConference contribution

15 Citations (Scopus)

Abstract

This paper considers enumeration of substring equivalence classes introduced by Blumer et al. [1]. They used the equivalence classes to define an index structure called compact directed acyclic word graphs (CDAWGs). In text analysis, considering these equivalence classes is useful since they group together redundant substrings with essentially identical occurrences. In this paper, we present how to enumerate those equivalence classes using suffix arrays. Our algorithm uses rank and lcp arrays for traversing the corresponding suffix trees, but does not need any other additional data structure. The algorithm runs in linear time in the length of the input string. We show experimental results comparing the running times and space consumptions of our algorithm, suffix tree and CDAWG based approaches.

Original languageEnglish
Title of host publicationCombinatorial Pattern Matching - 18th Annual Symposium, CPM 2007, Proceedings
PublisherSpringer Verlag
Pages340-351
Number of pages12
ISBN (Print)9783540734369
DOIs
Publication statusPublished - 2007
Event18th Annual Symposium on Combinatorial Pattern Matching, CPM 2007 - London, ON, Canada
Duration: 2007 Jul 92007 Jul 11

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume4580 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other18th Annual Symposium on Combinatorial Pattern Matching, CPM 2007
Country/TerritoryCanada
CityLondon, ON
Period07/7/907/7/11

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint

Dive into the research topics of 'Efficient computation of substring equivalence classes with suffix arrays'. Together they form a unique fingerprint.

Cite this