A graph-based approach to named entity categorization in Wikipedia using conditional random fields

Yotaro Watanabe, Masayuki Asahara, Yuji Matsumoto

Research output: Contribution to conference › Paper › peer-review

42 Citations (Scopus)

Abstract

This paper presents a method for categorizing named entities in Wikipedia. In Wikipedia, an anchor text is glossed in the HTML text it links to. We formalize named entity categorization as the task of categorizing anchor texts whose linked HTML texts gloss named entities. Using this representation, we introduce a graph structure in which anchor texts are regarded as nodes. To incorporate HTML structure into the graph, three types of cliques are defined based on the HTML tree structure. We propose a method that uses Conditional Random Fields (CRFs) to categorize the nodes on the graph. Since the defined graph may include cycles, exact inference in the CRFs is computationally expensive. We introduce an approximate inference method using Tree-based Reparameterization (TRP) to reduce computational cost. In experiments, our proposed model obtained significant improvements compared to baseline models that use Support Vector Machines.
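To make the inference problem concrete, here is a minimal sketch of approximate marginal inference on a small pairwise CRF whose graph contains a cycle. It is not the authors' TRP implementation. It swaps in plain loopy (sum-product) belief propagation, a simpler message-passing algorithm closely related to TRP. The label set, node scores, graph, and the single "same-label" edge potential are all hypothetical, and the paper's three HTML-based clique types are not modeled. A brute-force exact marginalizer is included for comparison; it enumerates every joint labeling, which is why exact inference grows exponentially with the number of nodes.

```python
import itertools
import math

def loopy_bp(num_labels, node_pot, edges, edge_pot, iters=50, tol=1e-9):
    """Approximate node marginals of a pairwise CRF with loopy sum-product BP.

    node_pot[i][a]: positive unary potential for node i taking label a.
    edge_pot[(i, j)][a][b]: positive potential for x_i = a, x_j = b.
    """
    n = len(node_pot)
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def pair(i, j, a, b):
        # Look up the edge potential whichever direction the edge is traversed.
        return edge_pot[(i, j)][a][b] if (i, j) in edge_pot else edge_pot[(j, i)][b][a]

    # msg[(i, j)][b]: normalized message from node i to node j about x_j = b.
    msg = {(i, j): [1.0] * num_labels for i in range(n) for j in neighbors[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            out = []
            for b in range(num_labels):
                total = 0.0
                for a in range(num_labels):
                    # Product of messages into i from every neighbor except j.
                    incoming = math.prod(msg[(k, i)][a] for k in neighbors[i] if k != j)
                    total += node_pot[i][a] * pair(i, j, a, b) * incoming
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        delta = max(abs(new[e][b] - msg[e][b]) for e in msg for b in range(num_labels))
        msg = new
        if delta < tol:  # messages stopped changing
            break
    beliefs = []
    for i in range(n):
        bel = [node_pot[i][a] * math.prod(msg[(k, i)][a] for k in neighbors[i])
               for a in range(num_labels)]
        z = sum(bel)
        beliefs.append([v / z for v in bel])
    return beliefs

def exact_marginals(num_labels, node_pot, edges, edge_pot):
    """Exact marginals by enumerating all num_labels ** n joint labelings."""
    n = len(node_pot)
    marg = [[0.0] * num_labels for _ in range(n)]
    for labels in itertools.product(range(num_labels), repeat=n):
        w = math.prod(node_pot[i][labels[i]] for i in range(n))
        w *= math.prod(edge_pot[(i, j)][labels[i]][labels[j]] for i, j in edges)
        for i, a in enumerate(labels):
            marg[i][a] += w
    return [[v / sum(row) for v in row] for row in marg]

LABELS = ["PERSON", "LOCATION", "ORGANIZATION"]  # hypothetical label set
# Hypothetical unary scores for four anchor texts, turned into exp(score) potentials.
node_pot = [[math.exp(s) for s in scores] for scores in
            [[2.0, 0.5, 0.1], [0.3, 0.2, 0.4], [0.1, 1.5, 0.2], [0.2, 0.2, 0.2]]]
# Hypothetical edge potential that rewards linked anchors sharing a label.
same = [[math.exp(1.0) if a == b else 1.0 for b in range(3)] for a in range(3)]
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # nodes 0-1-2 form a cycle
edge_pot = {e: same for e in edges}

approx = loopy_bp(3, node_pot, edges, edge_pot)
exact = exact_marginals(3, node_pot, edges, edge_pot)
for i, (p, q) in enumerate(zip(approx, exact)):
    print(i, LABELS[p.index(max(p))], [round(v, 3) for v in p], [round(v, 3) for v in q])
```

On a cycle-free graph the same message passing gives exact marginals; on the triangle above its beliefs are only approximations, which is the gap that methods such as TRP are designed to handle more carefully.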

Original language: English
Pages: 649-657
Number of pages: 9
Publication status: Published - 2007
Externally published: Yes
Event: 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007 - Prague, Czech Republic
Duration: 2007 Jun 28 to 2007 Jun 28


ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
