TY - JOUR
T1 - Large-scale collection and characterization of promoters of human and mouse genes
AU - Suzuki, Yutaka
AU - Yamashita, Riu
AU - Shirota, Matsuyuki
AU - Sakakibara, Yuta
AU - Chiba, Joe
AU - Mizushima-Sugano, Junko
AU - Kel, Alexander E.
AU - Arakawa, Takahiro
AU - Carninci, Piero
AU - Kawai, Jun
AU - Hayashizaki, Yoshihide
AU - Takagi, Toshihisa
AU - Nakai, Kenta
AU - Sugano, Sumio
PY - 2004
Y1 - 2004
N2 - We report the generation and initial characterization of a large-scale collection of sequences of putative promoter regions (PPRs) of human and mouse genes. Based on our unique collection of 400,225 and 580,209 human and mouse full-length cDNAs, we determined exact transcriptional start sites (TSSs). Using positional information of the TSSs, we could retrieve adjacent sequences as PPRs for 8,793 and 6,875 human and mouse genes, respectively. The positions of the PPRs were 4 kb upstream to previously reported 5'-ends of cDNAs on average, demonstrating that full-length cDNA information is indispensable for this purpose. Among those PPRs supported by experimentally validated TSSs, 3,324 could be paired as mutually homologous genes between human and mouse and were used for the comprehensive comparative studies. The sequence identities in the proximal regions of the TSSs were 45% on average, and 22,794 putative transcription factor binding sites that are conserved between human and mouse were identified. The data resource created in the present work and the results of the sequences' initial characterization should lay the firm foundation for deciphering the transcriptional modulations of human genes. All the data were deposited and made available through a database for comparative studies, DBTSS.
AB - We report the generation and initial characterization of a large-scale collection of sequences of putative promoter regions (PPRs) of human and mouse genes. Based on our unique collection of 400,225 and 580,209 human and mouse full-length cDNAs, we determined exact transcriptional start sites (TSSs). Using positional information of the TSSs, we could retrieve adjacent sequences as PPRs for 8,793 and 6,875 human and mouse genes, respectively. The positions of the PPRs were 4 kb upstream to previously reported 5'-ends of cDNAs on average, demonstrating that full-length cDNA information is indispensable for this purpose. Among those PPRs supported by experimentally validated TSSs, 3,324 could be paired as mutually homologous genes between human and mouse and were used for the comprehensive comparative studies. The sequence identities in the proximal regions of the TSSs were 45% on average, and 22,794 putative transcription factor binding sites that are conserved between human and mouse were identified. The data resource created in the present work and the results of the sequences' initial characterization should lay the firm foundation for deciphering the transcriptional modulations of human genes. All the data were deposited and made available through a database for comparative studies, DBTSS.
KW - Comparative genomics
KW - Full-length cDNA
KW - Promoter
KW - Transcriptional start sites
UR - http://www.scopus.com/inward/record.url?scp=20144386500&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=20144386500&partnerID=8YFLogxK
M3 - Article
C2 - 15506993
AN - SCOPUS:20144386500
SN - 1386-6338
VL - 4
SP - 429
EP - 444
JO - In Silico Biology
JF - In Silico Biology
IS - 4
ER -