TY - JOUR
T1 - The gene normalization task in BioCreative III
AU - Lu, Zhiyong
AU - Kao, Hung-Yu
AU - Wei, Chih-Hsuan
AU - Huang, Minlie
AU - Liu, Jingchen
AU - Kuo, Cheng-Ju
AU - Hsu, Chun-Nan
AU - Tsai, Richard T.
AU - Dai, Hong-Jie
AU - Okazaki, Naoaki
AU - Cho, Han-Cheol
AU - Gerner, Martin
AU - Solt, Illés
AU - Agarwal, Shashank
AU - Liu, Feifan
AU - Vishnyakova, Dina
AU - Ruch, Patrick
AU - Romacker, Martin
AU - Rinaldi, Fabio
AU - Bhattacharya, Sanmitra
AU - Srinivasan, Padmini
AU - Liu, Hongfang
AU - Torii, Manabu
AU - Matos, Sérgio
AU - Campos, David
AU - Verspoor, Karin
AU - Livingston, Kevin M.
AU - Wilbur, W. J.
N1 - Funding Information:
The organizers would like to thank Lynette Hirschman for her helpful discussion and feedback on the earlier version of this paper. Zhiyong Lu and W. John Wilbur were supported by the Intramural Research Program of the NIH, National Library of Medicine. For team 93, this was a collaborative work with Rune Sætre, Sampo Pyysalo, Tomoko Ohta, and Jun’ichi Tsujii, supported by Grants-in-Aid for Scientific Research on Priority Areas (MEXT) and for Solution-Oriented Research for Science and Technology (JST), Japan. The work of team 68 was performed in collaboration with Jörg Hakenberg, and was funded by the University of Manchester (for MG) and the Alexander-von-Humboldt Stiftung (for IS). Team 89 would like to thank Zuofeng Li for developing the genetic sequence based gene normalizer and acknowledge the support from the National Library of Medicine, grant numbers 5R01LM009836 to Hong Yu and 5R01LM010125 to Isaac Kohane. The Bibliomics and Text Mining (BiTeM, http://eagl.unige.ch/bitem/) group (Team 80) was supported by the European Union’s FP7 (Grant DebugIT # 217139). Additional contributors to the work of Team 80: Julien Gobeill, Emilie Pasche, Douglas Teodoro, Anne-Lise Veuthey and Arnaud Gaudinat. The OntoGene group (Team 65) was partially supported by the Swiss National Science Foundation (grants 100014-118396/1 and 105315-130558/1) and by NITAS/TMS, Text Mining Services, Novartis Pharma AG, Basel, Switzerland. Additional contributors to the work of Team 65: Gerold Schneider, Simon Clematide, and Therese Vachon. Team 97 was supported by NIH 1-R01-LM009959-01A1 and NSF CAREER 0845523. Team 78 would like to thank Aditya K. Sehgal for his valuable guidance with this work. Team 70 was partially supported by the Portuguese Foundation for Science and Technology (research project PTDC/EIA-CCO/100541/2008). Team 65 would like to thank William A. Baumgartner Jr., Kevin Bretonnel Cohen, Helen L. Johnson, Christophe Roeder, Lawrence E. Hunter, and all the members of the Center for Computational Pharmacology at the University of Colorado Denver, supported by NIH grants 3T15 LM009451-03S1 to K.L., 5R01 LM010120-02 to K.V., and 5R01 LM008111-05 and 5R01 GM083649-03 to L.H. All the authors would like to thank all the annotators who produced the gold-standard annotations. This article has been published as part of BMC Bioinformatics Volume 12 Supplement 8, 2011: The Third BioCreative – Critical Assessment of Information Extraction in Biology Challenge. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S8.
PY - 2011/10/3
Y1 - 2011/10/3
AB - Background: We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. Moreover, the same algorithm was subsequently used for inferring ground truth based solely on team submissions. We report team performance on both gold standard and inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k). Results: We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20), respectively. Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When combining team results using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively. Conclusions: By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past and presents additional challenges for the text mining community, as revealed in the overall team results. By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth we show measures of comparative performance between teams. Finally, by comparing team rankings on gold standard vs. inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance.
UR - http://www.scopus.com/inward/record.url?scp=80052774027&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052774027&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-12-S8-S2
DO - 10.1186/1471-2105-12-S8-S2
M3 - Article
C2 - 22151901
AN - SCOPUS:80052774027
SN - 1471-2105
VL - 12
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - SUPPL. 8
M1 - S2
ER -