TY - JOUR
T1 - Multi-dimensional correlations for gene coexpression and application to the large-scale data of Arabidopsis
AU - Kinoshita, Kengo
AU - Obayashi, Takeshi
N1 - Funding Information:
Funding: Grant-in-Aid for Scientific Research on Priority Areas Transportsome from the Ministry of Education, Culture, Sports, Science and Technology of Japan (to K.K.); the Global COE Program (Center of Education and Research for Advanced Genome-Based Medicine), MEXT, Japan (to T.O.).
PY - 2009
Y1 - 2009
N2 - Background: Recent improvements in DNA microarray techniques have made a large variety of gene expression data available in public databases. These data can be used to evaluate the strength of gene coexpression by calculating the correlation of expression patterns between genes across many experiments. However, gene expression levels differ significantly across tissues in higher organisms, and across cellular locations and cell states in eukaryotes. Thus the usual correlation measure can only evaluate differences between tissues or cellular localizations, and cannot adequately elucidate functional relationships from the coexpression of genes. Method: We propose a new measure of coexpression by expanding the generally used correlation into a multidimensional one. We used principal component analysis to identify the major factors of gene expression correlation, and then recalculated the correlation after subtracting the major components in order to remove biases caused by a few experiments. The repeated subtraction of the major components yielded a set of correlation values for each pair of genes. We observed the correlation changes when the first ten principal components were subtracted step by step in large-scale Arabidopsis expression data. Results: We found two extreme patterns of correlation changes, corresponding to stable and fragile coexpression. As shown by a few examples, our new indexes provided a good means of determining the functional relationships of genes, and they gave higher performance in Gene Ontology term prediction when the multidimensional correlation was used with a support vector machine.
AB - Background: Recent improvements in DNA microarray techniques have made a large variety of gene expression data available in public databases. These data can be used to evaluate the strength of gene coexpression by calculating the correlation of expression patterns between genes across many experiments. However, gene expression levels differ significantly across tissues in higher organisms, and across cellular locations and cell states in eukaryotes. Thus the usual correlation measure can only evaluate differences between tissues or cellular localizations, and cannot adequately elucidate functional relationships from the coexpression of genes. Method: We propose a new measure of coexpression by expanding the generally used correlation into a multidimensional one. We used principal component analysis to identify the major factors of gene expression correlation, and then recalculated the correlation after subtracting the major components in order to remove biases caused by a few experiments. The repeated subtraction of the major components yielded a set of correlation values for each pair of genes. We observed the correlation changes when the first ten principal components were subtracted step by step in large-scale Arabidopsis expression data. Results: We found two extreme patterns of correlation changes, corresponding to stable and fragile coexpression. As shown by a few examples, our new indexes provided a good means of determining the functional relationships of genes, and they gave higher performance in Gene Ontology term prediction when the multidimensional correlation was used with a support vector machine.
UR - http://www.scopus.com/inward/record.url?scp=70349999416&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349999416&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btp442
DO - 10.1093/bioinformatics/btp442
M3 - Article
C2 - 19620096
AN - SCOPUS:70349999416
SN - 1367-4803
VL - 25
SP - 2677
EP - 2684
JO - Bioinformatics
JF - Bioinformatics
IS - 20
ER -