TY - JOUR
T1 - A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes
AU - Nariai, Naoki
AU - Kojima, Kaname
AU - Mimori, Takahiro
AU - Kawai, Yosuke
AU - Nagasaki, Masao
N1 - Funding Information:
The publication costs for this article were partly funded by MEXT Tohoku Medical Megabank Project. This article has been published as part of BMC Genomics Volume 17 Supplement 1, 2016: Selected articles from the Fourteenth Asia Pacific Bioinformatics Conference (APBC 2016): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/17/S1.
Publisher Copyright:
© 2015 Nariai et al.
PY - 2016/1/11
Y1 - 2016/1/11
N2 - Background: RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurate estimation of allele-specific expression (ASE) based on alignments of reads to the reference genome is challenging, because the reference contains only one allele on a mosaic haploid genome. Even with diploid genome sequence information, precise alignment of reads to the correct allele is difficult because of the high similarity between the corresponding allele sequences. Results: We propose a Bayesian approach to estimate ASE from RNA-Seq data with diploid genome sequences. In the statistical framework, the haploid choice is modeled as a hidden variable and estimated simultaneously with isoform expression levels by variational Bayesian inference. Through simulation data analysis, we demonstrate the effectiveness of the proposed approach in identifying ASE compared to the existing approach. We also show that our approach enables better quantification of isoform expression levels than the existing methods TIGAR2, RSEM and Cufflinks. In the real data analysis of the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation in GM12878 was identified. Conclusions: The proposed method, called ASE-TIGAR, enables accurate estimation of gene expression from RNA-Seq data in an allele-specific manner. Our results show the effectiveness of utilizing personal genomic information for accurate estimation of ASE. An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar.
AB - Background: RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurate estimation of allele-specific expression (ASE) based on alignments of reads to the reference genome is challenging, because the reference contains only one allele on a mosaic haploid genome. Even with diploid genome sequence information, precise alignment of reads to the correct allele is difficult because of the high similarity between the corresponding allele sequences. Results: We propose a Bayesian approach to estimate ASE from RNA-Seq data with diploid genome sequences. In the statistical framework, the haploid choice is modeled as a hidden variable and estimated simultaneously with isoform expression levels by variational Bayesian inference. Through simulation data analysis, we demonstrate the effectiveness of the proposed approach in identifying ASE compared to the existing approach. We also show that our approach enables better quantification of isoform expression levels than the existing methods TIGAR2, RSEM and Cufflinks. In the real data analysis of the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation in GM12878 was identified. Conclusions: The proposed method, called ASE-TIGAR, enables accurate estimation of gene expression from RNA-Seq data in an allele-specific manner. Our results show the effectiveness of utilizing personal genomic information for accurate estimation of ASE. An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar.
KW - Allele-specific expression
KW - Bayesian inference
KW - RNA-Seq data
UR - http://www.scopus.com/inward/record.url?scp=84953870433&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84953870433&partnerID=8YFLogxK
U2 - 10.1186/s12864-015-2295-5
DO - 10.1186/s12864-015-2295-5
M3 - Article
C2 - 26818838
AN - SCOPUS:84953870433
SN - 1471-2164
VL - 17
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - 2
ER -