TY - JOUR
T1 - Co-expressed Pathways DataBase for Tomato
T2 - A database to predict pathways relevant to a query gene
AU - Narise, Takafumi
AU - Sakurai, Nozomu
AU - Obayashi, Takeshi
AU - Ohta, Hiroyuki
AU - Shibata, Daisuke
N1 - Funding Information:
This work was conducted as part of the Low Carbon Technology Research and Development program funded by the Ministry of the Environment, Japan.
Publisher Copyright:
© 2017 The Author(s).
PY - 2017/6/5
Y1 - 2017/6/5
AB - Background: Gene co-expression, the similarity of gene expression profiles under various experimental conditions, has been used as an indicator of functional relationships between genes, and many co-expression databases have been developed for predicting gene functions. These databases usually provide users with a co-expression network and a list of strongly co-expressed genes for a query gene. Several of these databases also provide functional information on a set of strongly co-expressed genes (i.e., the biological processes and pathways that are enriched in these strongly co-expressed genes), which is generally obtained via over-representation analysis (ORA). A limitation of this approach may be that users can predict gene functions based only on the strongly co-expressed genes. Results: In this study, we developed a new co-expression database that enables users to predict the function of tomato genes from the results of functional enrichment analyses of co-expressed genes while also considering genes that are not strongly co-expressed. To achieve this, we used the ORA approach with several thresholds for selecting co-expressed genes, and performed gene set enrichment analysis (GSEA) on a list of genes ranked by degree of co-expression. We found that internal correlation in pathways affected the significance levels of the enrichment analyses. Therefore, we introduced a new measure for evaluating the relationship between a gene and a pathway, termed the percentile (p)-score, which enables users to predict functionally relevant pathways without being affected by the internal correlation in pathways. In addition, we evaluated our approaches using receiver operating characteristic curves, which indicated that the p-score could improve the performance of ORA. Conclusions: We developed a new database, named Co-expressed Pathways DataBase for Tomato, which is available at http://cox-path-db.kazusa.or.jp/tomato. The database allows users to predict pathways that are relevant to a query gene, which would help to infer gene functions.
KW - Co-expression database
KW - Gene set enrichment analysis
KW - Over-representation analysis
KW - Pathway
KW - Percentile-score
UR - http://www.scopus.com/inward/record.url?scp=85020218604&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020218604&partnerID=8YFLogxK
U2 - 10.1186/s12864-017-3786-3
DO - 10.1186/s12864-017-3786-3
M3 - Article
C2 - 28583129
AN - SCOPUS:85020218604
SN - 1471-2164
VL - 18
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - 437
ER -