TY - JOUR
T1 - SAHG, a comprehensive database of predicted structures of all human proteins
AU - Motono, Chie
AU - Nakata, Junichi
AU - Koike, Ryotaro
AU - Shimizu, Kana
AU - Shirota, Matsuyuki
AU - Amemiya, Takayuki
AU - Tomii, Kentaro
AU - Nagano, Nozomi
AU - Sakaya, Naofumi
AU - Misoo, Kiyotaka
AU - Sato, Miwa
AU - Kidera, Akinori
AU - Hiroaki, Hidekazu
AU - Shirai, Tsuyoshi
AU - Kinoshita, Kengo
AU - Noguchi, Tamotsu
AU - Ota, Motonori
N1 - Funding Information:
Japan Science and Technology Agency (JST) – Institute for Bioinformatics Research and Development (BIRD). Funding for open access charge: National Institute of Advanced Industrial Science and Technology (AIST).
PY - 2011/1
Y1 - 2011/1
N2 - Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, those from human for instance, we developed a special protein-structure-prediction pipeline and accumulated the products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. With the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith-Waterman profile-profile alignment), global-local alignment methods (FORTE) and prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand-binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand-binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42 581 protein-domain models in approximately 24 900 unique human protein sequences from the RefSeq database. Annotation of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding the protein structure-function relationships.
AB - Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, those from human for instance, we developed a special protein-structure-prediction pipeline and accumulated the products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. With the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith-Waterman profile-profile alignment), global-local alignment methods (FORTE) and prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand-binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand-binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42 581 protein-domain models in approximately 24 900 unique human protein sequences from the RefSeq database. Annotation of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding the protein structure-function relationships.
UR - http://www.scopus.com/inward/record.url?scp=78651314041&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78651314041&partnerID=8YFLogxK
U2 - 10.1093/nar/gkq1057
DO - 10.1093/nar/gkq1057
M3 - Article
C2 - 21051360
AN - SCOPUS:78651314041
SN - 0305-1048
VL - 39
SP - D487-D493
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - SUPPL. 1
ER -