Matataki: An ultrafast mRNA quantification method for large-scale reanalysis of RNA-Seq data

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)


Background: Data generated by RNA sequencing (RNA-Seq) are now accumulating in vast amounts in public repositories, especially for human and mouse genomes. Reanalyzing these data has emerged as a promising approach to identify gene modules or pathways. Although meta-analyses of gene expression data are frequently performed using microarray data, meta-analyses using RNA-Seq data are still rare. This lag is partly due to the extensive computational resources required to reanalyze RNA-Seq data. Moreover, it is nearly impossible to calculate the gene expression levels of all samples in a public repository using currently available methods. Here, we propose a novel method, Matataki, for rapidly estimating gene expression levels from RNA-Seq data.

Results: The proposed method uses k-mers that are unique to each gene for the mapping of fragments to genes. Since aligning fragments to reference sequences incurs a high computational cost, our method reduces this cost by focusing on k-mers that are unique to each gene and by skipping uninformative regions. Indeed, Matataki outperformed conventional methods with regard to speed while demonstrating sufficient accuracy.

Conclusions: The development of Matataki can overcome current limitations in reanalyzing RNA-Seq data, improving the potential for discovering genes and pathways associated with disease at reduced computational cost. Thus, the main bottleneck of RNA-Seq analyses has shifted to the decompression of sequenced data. The implementation of Matataki is available at
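The idea described in the Results, assigning fragments to genes through k-mers that occur in only one gene, can be illustrated with a minimal Python sketch. This is not the Matataki implementation: the function names, the toy sequences, and the rule that a read is counted only when all of its indexed k-mers point to the same gene are assumptions made for illustration, and the skipping of uninformative regions is not modelled.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_unique_kmer_index(genes, k):
    """Map each k-mer that occurs in exactly one gene to that gene's name."""
    owners = defaultdict(set)
    for name, seq in genes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    # Keep only k-mers owned by a single gene; shared k-mers are uninformative.
    return {kmer: next(iter(names))
            for kmer, names in owners.items() if len(names) == 1}


def assign_read(read, index, k):
    """Return the gene a read supports, or None if no gene or several genes match.

    Assumed rule (hypothetical): a read is assigned only when every indexed
    k-mer it contains belongs to the same gene.
    """
    hits = set()
    for kmer in kmers(read, k):
        gene = index.get(kmer)
        if gene is not None:
            hits.add(gene)
    return hits.pop() if len(hits) == 1 else None


def count_reads(reads, index, k):
    """Count reads per gene using hash lookups only, with no alignment step."""
    counts = Counter()
    for read in reads:
        gene = assign_read(read, index, k)
        if gene is not None:
            counts[gene] += 1
    return counts


# Toy reference: the two sequences share the 4-mer "TTGA".
toy_genes = {"geneA": "ACGTACGTTTGA", "geneB": "TTGACCCGGGAT"}
toy_index = build_unique_kmer_index(toy_genes, k=4)
toy_counts = count_reads(["ACGTACG", "CCCGGGAT", "TTGA", "GTTTGACC"], toy_index, k=4)
```

In this toy run, the read consisting only of the shared k-mer and the read spanning both genes are left unassigned, so each gene receives one count. Because every lookup is a dictionary access, the per-read cost grows with read length rather than with the size of the reference, which is the intuition behind avoiding alignment.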

Original language: English
Article number: 266
Journal: BMC Bioinformatics
Issue number: 1
Publication status: Published - 2018 Jul 16


  • Gene expression
  • Mapping
  • RNA-Seq

