Information on gene coexpression is useful for predicting gene function. Several gene coexpression databases have been constructed for model organisms from the large amount of publicly available gene expression data measured on GeneChip platforms. In these databases, Pearson's correlation coefficients (PCCs) between gene expression patterns are widely used as the measure of gene coexpression. Although both the coexpression measure and the GeneChip summarization method affect the performance of a gene coexpression database, previous studies tested these calculation procedures with only a small number of samples and a single species. Evaluating the effectiveness of coexpression measures therefore requires assessments with large-scale microarray data. We first examined the characteristics of PCC and found that the optimal PCC threshold for retrieving functionally related genes depended on the method used to construct the gene expression database and on the target gene function. We further found that this problem could be overcome by using correlation ranks instead of correlation values. We evaluated this observation using large-scale gene expression data for four species: Arabidopsis, human, mouse and rat.
- Gene coexpression
- GeneChip summarization
- Pearson's correlation coefficient
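
The contrast drawn in the abstract between correlation values and correlation ranks can be illustrated with a minimal sketch. It assumes a toy expression table and a simple one-directional rank (rank 1 is the gene with the highest PCC to the query). The gene names, the expression values, and the helper names `pearson` and `correlation_ranks` are hypothetical, and the exact rank definition used in the study is not stated here, so this should be read as one possible interpretation rather than the authors' procedure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_ranks(expr, query):
    """Rank every other gene by its PCC to `query` (rank 1 = highest PCC).

    This one-directional rank is an illustrative assumption, not
    necessarily the rank definition used in the study.
    """
    scores = {
        gene: pearson(expr[query], profile)
        for gene, profile in expr.items()
        if gene != query
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}

# Hypothetical expression profiles (four samples per gene).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

ranks = correlation_ranks(expr, "geneA")
for gene, rank in sorted(ranks.items(), key=lambda item: item[1]):
    pcc = pearson(expr["geneA"], expr[gene])
    print(f"{gene}: rank {rank}, PCC {pcc:.3f}")
```

A cutoff on ranks (for example, the top k genes for each query) returns the same number of candidates for every query gene, whereas a fixed PCC cutoff returns a number that depends on how the PCC values are distributed in a given database.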