TY - JOUR
T1 - A systematic investigation identifies a significant number of probable pseudogenes in the Escherichia coli genome
AU - Homma, Keiichi
AU - Fukuchi, Satoshi
AU - Kawabata, Takeshi
AU - Ota, Motonori
AU - Nishikawa, Ken
N1 - Funding Information: We express our gratitude to A. Nishimura and Y. Yamazaki for vigorous discussions, T. Horiuchi for communication of unpublished data, J. Kato for critical comments, and H. Toh for erudite advice. This work was supported in part by postdoctoral fellowships to K.H. and S.F. in the ACT-JST program of Japan Science Technology, Corp. and a grant-in-aid from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
PY - 2002
Y1 - 2002/7/10
N2 - Pseudogenes are open reading frames (ORFs) encoding dysfunctional proteins with high homology to known protein-coding genes. Although pseudogenes have been reported to exist in the genomes of many eukaryotes and bacteria, no systematic search for pseudogenes in the Escherichia coli genome has been carried out. Genome comparisons of E. coli strains K-12 and O157 revealed that many protein-coding sequences have prematurely terminated orthologs encoding unstable proteins. To systematically screen for pseudogenes, we selected ORFs generated by premature termination of the orthologous protein-coding genes and subsequently excluded those possibly arising from sequence errors. Lastly, we eliminated those with close homologs in this and other species, as these shortened ORFs may actually have functions. The process produced 95 and 101 pseudogene candidates in K-12 and O157, respectively. The assigned three-dimensional structures suggest that most of the encoded proteins cannot fold properly and thus are dysfunctional, indicating that they are probably pseudogenes. Therefore, the existence of a significant number of probable pseudogenes in E. coli is predicted, awaiting experimental verification. Most of them were found to be genes with paralogs or horizontally transferred genes or both. We suggest that pseudogenes constitute a small fraction of the genomes of free-living bacteria in general, reflecting the fact that pseudogenes are eliminated faster than they are produced.
AB - Pseudogenes are open reading frames (ORFs) encoding dysfunctional proteins with high homology to known protein-coding genes. Although pseudogenes have been reported to exist in the genomes of many eukaryotes and bacteria, no systematic search for pseudogenes in the Escherichia coli genome has been carried out. Genome comparisons of E. coli strains K-12 and O157 revealed that many protein-coding sequences have prematurely terminated orthologs encoding unstable proteins. To systematically screen for pseudogenes, we selected ORFs generated by premature termination of the orthologous protein-coding genes and subsequently excluded those possibly arising from sequence errors. Lastly, we eliminated those with close homologs in this and other species, as these shortened ORFs may actually have functions. The process produced 95 and 101 pseudogene candidates in K-12 and O157, respectively. The assigned three-dimensional structures suggest that most of the encoded proteins cannot fold properly and thus are dysfunctional, indicating that they are probably pseudogenes. Therefore, the existence of a significant number of probable pseudogenes in E. coli is predicted, awaiting experimental verification. Most of them were found to be genes with paralogs or horizontally transferred genes or both. We suggest that pseudogenes constitute a small fraction of the genomes of free-living bacteria in general, reflecting the fact that pseudogenes are eliminated faster than they are produced.
KW - Gram-negative bacteria
KW - Horizontal transfer
KW - Position-specific iterated basic local alignment search tool
KW - Structure prediction
KW - Three-dimensional structure
UR - http://www.scopus.com/inward/record.url?scp=0037055273&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0037055273&partnerID=8YFLogxK
U2 - 10.1016/S0378-1119(02)00794-1
DO - 10.1016/S0378-1119(02)00794-1
M3 - Article
C2 - 12234664
AN - SCOPUS:0037055273
SN - 0378-1119
VL - 294
SP - 25
EP - 33
JO - Gene
JF - Gene
IS - 1-2
ER -