TY - GEN
T1 - Inducing context gazetteers from encyclopedic databases for named entity recognition
AU - Cho, Han Cheol
AU - Okazaki, Naoaki
AU - Inui, Kentaro
PY - 2013
Y1 - 2013
N2 - Named entity recognition (NER) is a fundamental task for mining valuable information from unstructured and semi-structured texts. State-of-the-art NER models mostly employ a supervised machine learning approach that heavily depends on local contexts. However, results of recent research have demonstrated that non-local contexts at the sentence or document level can help advance the improvement of recognition performance. As described in this paper, we propose the use of a context gazetteer, the list of contexts with which entity names can cooccur, as new non-local context information.We build a context gazetteer from an encyclopedic database because manually annotated data are often too few to extract rich and sophisticated context patterns. In addition, dependency path is used as sentence level non-local context to capture more syntactically related contexts to entity mentions than linear context in traditional NER. In the discussion of experimentation used for this study, we build a context gazetteer of gene names and apply it for a biomedical NER task. High confidence context patterns appear in various forms. Some are similar to a predicate-argument structure whereas some are in unexpected forms. The experiment results show that the proposed model using both entity and context gazetteers improves both precision and recall over a strong baseline model, and therefore the usefulness of the context gazetteer.
AB - Named entity recognition (NER) is a fundamental task for mining valuable information from unstructured and semi-structured texts. State-of-the-art NER models mostly employ a supervised machine learning approach that heavily depends on local contexts. However, results of recent research have demonstrated that non-local contexts at the sentence or document level can help advance the improvement of recognition performance. As described in this paper, we propose the use of a context gazetteer, the list of contexts with which entity names can cooccur, as new non-local context information.We build a context gazetteer from an encyclopedic database because manually annotated data are often too few to extract rich and sophisticated context patterns. In addition, dependency path is used as sentence level non-local context to capture more syntactically related contexts to entity mentions than linear context in traditional NER. In the discussion of experimentation used for this study, we build a context gazetteer of gene names and apply it for a biomedical NER task. High confidence context patterns appear in various forms. Some are similar to a predicate-argument structure whereas some are in unexpected forms. The experiment results show that the proposed model using both entity and context gazetteers improves both precision and recall over a strong baseline model, and therefore the usefulness of the context gazetteer.
UR - http://www.scopus.com/inward/record.url?scp=84893626643&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893626643&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-37453-1_31
DO - 10.1007/978-3-642-37453-1_31
M3 - Conference contribution
AN - SCOPUS:84893626643
SN - 9783642374524
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 378
EP - 389
BT - Advances in Knowledge Discovery and Data Mining - 17th Pacific-Asia Conference, PAKDD 2013, Proceedings
T2 - 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
Y2 - 14 April 2013 through 17 April 2013
ER -