Geographical entity annotated corpus of Japanese microblogs

Koji Matsuda, Akira Sasaki, Naoaki Okazaki, Kentaro Inui

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

This paper addresses issues in the task of annotating geographical entities on microblogs and reports preliminary results of our efforts to annotate Japanese microblog texts. Unlike prior work, we aim to annotate not only geographical location entities but also facility entities, such as stations, restaurants, and schools. We discuss (i) how to build a gazetteer of geographical entities with sufficiently broad coverage, (ii) what types of ambiguity need to be considered, (iii) why annotators tend to disagree, and (iv) what technical problems should be addressed to automate the task of annotating geographical entities. All the annotation data and the annotation guidelines are publicly available for research purposes from our web site.

Original language: English
Pages (from-to): 121-130
Number of pages: 10
Journal: Journal of Information Processing
Volume: 25
DOIs
Publication status: Published - 2017

Keywords

  • Corpus annotation
  • Location reference expressions
  • Microblogs
  • Natural language processing
