Inducing context gazetteers from encyclopedic databases for named entity recognition

Han Cheol Cho, Naoaki Okazaki, Kentaro Inui

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Named entity recognition (NER) is a fundamental task for mining valuable information from unstructured and semi-structured texts. State-of-the-art NER models mostly employ a supervised machine learning approach that depends heavily on local contexts. However, recent research has demonstrated that non-local contexts at the sentence or document level can improve recognition performance. In this paper, we propose the use of a context gazetteer, a list of contexts with which entity names can co-occur, as new non-local context information. We build the context gazetteer from an encyclopedic database because manually annotated data are often too scarce to yield rich and sophisticated context patterns. In addition, we use dependency paths as sentence-level non-local context, which captures contexts that are more syntactically related to entity mentions than the linear context used in traditional NER. In our experiments, we build a context gazetteer of gene names and apply it to a biomedical NER task. High-confidence context patterns appear in various forms: some resemble predicate-argument structures, whereas others take unexpected forms. The experimental results show that the proposed model, using both entity and context gazetteers, improves both precision and recall over a strong baseline model, demonstrating the usefulness of the context gazetteer.
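
The idea of a context gazetteer can be illustrated with a toy sketch. This is not the authors' implementation: it replaces dependency paths with plain sentence-level word co-occurrence, and the function names (`induce_context_gazetteer`, `context_feature`), the thresholds (`min_count`, `min_conf`), the confidence definition (the share of sentences containing a word that also contain a known entity name), and the sample sentences are all assumptions made for illustration.

```python
from collections import Counter


def induce_context_gazetteer(sentences, entity_names, min_count=2, min_conf=0.6):
    """Return {context_word: confidence} for words that tend to co-occur
    with known entity names. Hypothetical helper, not from the paper."""
    entity_names = {name.lower() for name in entity_names}
    with_entity = Counter()  # sentences containing the word AND an entity name
    total = Counter()        # sentences containing the word at all
    for sentence in sentences:
        tokens = sentence.lower().split()
        has_entity = any(tok in entity_names for tok in tokens)
        # Count each candidate context word once per sentence,
        # excluding the entity names themselves.
        for word in set(tokens) - entity_names:
            total[word] += 1
            if has_entity:
                with_entity[word] += 1
    return {
        word: with_entity[word] / total[word]
        for word in total
        if total[word] >= min_count
        and with_entity[word] / total[word] >= min_conf
    }


def context_feature(sentence, gazetteer):
    """Flag each token that matches a context gazetteer entry."""
    return [tok in gazetteer for tok in sentence.lower().split()]


# Made-up sentences standing in for text drawn from an encyclopedic database.
sentences = [
    "brca1 regulates dna repair",
    "tp53 regulates apoptosis",
    "the cell regulates growth",
    "the patient was treated",
    "the tp53 gene encodes a protein",
    "the brca1 gene encodes a protein",
]
gazetteer = induce_context_gazetteer(sentences, {"BRCA1", "TP53"})
features = context_feature("myc regulates transcription", gazetteer)
```

In this toy data, "regulates" and "gene" pass the confidence threshold while the frequent but uninformative "the" does not, so an unseen name such as "myc" still receives a context signal from "regulates". A real system would feed such flags to a supervised sequence labeler alongside entity gazetteer features.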

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 17th Pacific-Asia Conference, PAKDD 2013, Proceedings
Pages: 378-389
Number of pages: 12
Edition: PART 1
Publication status: Published - 2013
Event: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013 - Gold Coast, QLD, Australia
Duration: 2013 Apr 14 → 2013 Apr 17

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 7818 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
Country/Territory: Australia
City: Gold Coast, QLD
Period: 13/4/14 → 13/4/17

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
