TY - JOUR
T1 - Protein sequence-similarity search acceleration using a heuristic algorithm with a sensitive matrix
AU - Lim, Kyungtaek
AU - Yamada, Kazunori D.
AU - Frith, Martin C.
AU - Tomii, Kentaro
N1 - Funding Information:
This work was partially supported by the Platform Project for Supporting in Drug Discovery and Life Science Research (Platform for Drug Discovery, Informatics, and Structural Life Science) from the Japan Agency for Medical Research and Development (AMED).
Publisher Copyright:
© 2017, The Author(s).
PY - 2016/12/1
Y1 - 2016/12/1
AB - Protein database search for public databases is a fundamental step in the target selection of proteins in structural and functional genomics and also for inferring protein structure, function, and evolution. Most database search methods employ amino acid substitution matrices to score amino acid pairs. The choice of substitution matrix strongly affects homology detection performance. We earlier proposed a substitution matrix named MIQS that was optimized for distant protein homology search. Herein we further evaluate MIQS in combination with LAST, a heuristic and fast database search tool with a tunable sensitivity parameter m, where larger m denotes higher sensitivity. Results show that MIQS substantially improves the homology detection and alignment quality performance of LAST across diverse m parameters. Against a protein database consisting of approximately 15 million sequences, LAST with m = 10^5 achieves better homology detection performance than BLASTP, and completes the search 20 times faster. Compared to the most sensitive existing methods being used today, CS-BLAST and SSEARCH, LAST with MIQS and m = 10^6 shows comparable homology detection performance at 2.0 and 3.9 times greater speed, respectively. Results demonstrate that MIQS-powered LAST is a time-efficient method for sensitive and accurate homology search.
KW - Alignment quality
KW - Amino acid substitution matrix
KW - Homology detection
UR - http://www.scopus.com/inward/record.url?scp=85009253305&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009253305&partnerID=8YFLogxK
U2 - 10.1007/s10969-016-9210-4
DO - 10.1007/s10969-016-9210-4
M3 - Article
C2 - 28083762
AN - SCOPUS:85009253305
SN - 1345-711X
VL - 17
SP - 147
EP - 154
JO - Journal of Structural and Functional Genomics
JF - Journal of Structural and Functional Genomics
IS - 4
ER -