TY - JOUR
T1 - Revisiting amino acid substitution matrices for identifying distantly related proteins
AU - Yamada, Kazunori
AU - Tomii, Kentaro
N1 - Funding Information: Platform for Drug Discovery, Informatics, and Structural Life Science from the Ministry of Education, Culture, Sports, Science and Technology, Japan.
PY - 2014/2/1
Y1 - 2014/2/1
N2 - Motivation: Although many amino acid substitution matrices have been developed, it has not been well understood which is the best for similarity searches, especially for remote homology detection. Therefore, we collected information related to existing matrices, condensed it and derived a novel matrix that can detect more remote homology than ever. Results: Using principal component analysis with existing matrices and benchmarks, we developed a novel matrix, which we designate as MIQS. The detection performance of MIQS is validated and compared with that of existing general purpose matrices using SSEARCH with optimized gap penalties for each matrix. Results show that MIQS is able to detect more remote homology than the existing matrices on an independent dataset. In addition, the performance of our developed matrix was superior to that of CS-BLAST, which was a novel similarity search method with no amino acid matrix. We also evaluated the alignment quality of matrices and methods, which revealed that MIQS shows higher alignment sensitivity than that with the existing matrix series and CS-BLAST. Fundamentally, these results are expected to constitute good proof of the availability and/or importance of amino acid matrices in sequence analysis. Moreover, with our developed matrix, sophisticated similarity search methods such as sequence-profile and profile-profile comparison methods can be improved further.
AB - Motivation: Although many amino acid substitution matrices have been developed, it has not been well understood which is the best for similarity searches, especially for remote homology detection. Therefore, we collected information related to existing matrices, condensed it and derived a novel matrix that can detect more remote homology than ever. Results: Using principal component analysis with existing matrices and benchmarks, we developed a novel matrix, which we designate as MIQS. The detection performance of MIQS is validated and compared with that of existing general purpose matrices using SSEARCH with optimized gap penalties for each matrix. Results show that MIQS is able to detect more remote homology than the existing matrices on an independent dataset. In addition, the performance of our developed matrix was superior to that of CS-BLAST, which was a novel similarity search method with no amino acid matrix. We also evaluated the alignment quality of matrices and methods, which revealed that MIQS shows higher alignment sensitivity than that with the existing matrix series and CS-BLAST. Fundamentally, these results are expected to constitute good proof of the availability and/or importance of amino acid matrices in sequence analysis. Moreover, with our developed matrix, sophisticated similarity search methods such as sequence-profile and profile-profile comparison methods can be improved further.
UR - http://www.scopus.com/inward/record.url?scp=84893325672&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893325672&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btt694
DO - 10.1093/bioinformatics/btt694
M3 - Article
C2 - 24281694
AN - SCOPUS:84893325672
SN - 1367-4803
VL - 30
SP - 317
EP - 325
JO - Bioinformatics
JF - Bioinformatics
IS - 3
ER -