TY - GEN

T1 - Truncated DAWGs and their application to minimal absent word problem

AU - Fujishige, Yuta

AU - Takagi, Takuya

AU - Hendrian, Diptarama

N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2018.

PY - 2018

Y1 - 2018

N2 - The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has O(n) nodes and edges. Na et al. [11] proposed k-truncated suffix tree which is a compressed trie that represents substrings of a string whose length up to k. In this paper, we present a new data structure called k-truncated DAWGs, which can be obtained by pruning the DAWGs. We show that the size complexity of the k-truncated DAWG of a string y of length n is O(min{{n,kz}) which is equal to the truncated suffix tree’s one, where z is the size of LZ77 factorization of y. We also present an O(n log σ) time and O(min{{n,kz}) space algorithm for constructing the k-truncated DAWG of y, where σ is the alphabet size. As an application of the truncated DAWGs, we show that the set MAWk(y) of all minimal absent words of y whose length is smaller than or equal to k can be computed by using k-truncated DAWG of y in O(min{{n,kz}) + |MAWk(y)|) time and O(min{{n,kz}) working space.

AB - The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has O(n) nodes and edges. Na et al. [11] proposed k-truncated suffix tree which is a compressed trie that represents substrings of a string whose length up to k. In this paper, we present a new data structure called k-truncated DAWGs, which can be obtained by pruning the DAWGs. We show that the size complexity of the k-truncated DAWG of a string y of length n is O(min{{n,kz}) which is equal to the truncated suffix tree’s one, where z is the size of LZ77 factorization of y. We also present an O(n log σ) time and O(min{{n,kz}) space algorithm for constructing the k-truncated DAWG of y, where σ is the alphabet size. As an application of the truncated DAWGs, we show that the set MAWk(y) of all minimal absent words of y whose length is smaller than or equal to k can be computed by using k-truncated DAWG of y in O(min{{n,kz}) + |MAWk(y)|) time and O(min{{n,kz}) working space.

UR - http://www.scopus.com/inward/record.url?scp=85054848426&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054848426&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-00479-8_12

DO - 10.1007/978-3-030-00479-8_12

M3 - Conference contribution

AN - SCOPUS:85054848426

SN - 9783030004781

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 139

EP - 152

BT - String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Proceedings

A2 - Gagie, Travis

A2 - Moffat, Alistair

A2 - Navarro, Gonzalo

A2 - Cuadros-Vargas, Ernesto

PB - Springer Verlag

T2 - 25th International Symposium on String Processing and Information Retrieval, SPIRE 2018

Y2 - 9 October 2018 through 11 October 2018

ER -