TY - JOUR
T1 - Geographical entity annotated corpus of Japanese microblogs
AU - Matsuda, Koji
AU - Sasaki, Akira
AU - Okazaki, Naoaki
AU - Inui, Kentaro
N1 - Funding Information:
This research was supported by the program Research and Development on Real World Big Data Integration and Analysis of the Ministry of Education, Culture, Sports, Science and Technology, Japan and by the Precursory Research for Embryonic Science and Technology (PRESTO), Japan Science and Technology Agency (JST).
Publisher Copyright:
© 2017 Information Processing Society of Japan.
PY - 2017
Y1 - 2017
N2 - This paper addresses the issues in the task of annotating geographical entities on microblogs and reports the preliminary results of our efforts to annotate Japanese microblog texts. Unlike prior work, we aim at annotating not only geographical location entities but also facility entities, such as stations, restaurants and schools. We discuss (i) how to build a gazetteer of geographical entities with sufficiently broad coverage, (ii) what types of ambiguities need to be considered, (iii) why annotators tend to disagree, and (iv) what technical problems should be addressed to automate the task of annotating geographical entities. All the annotation data and the annotation guidelines are publicly available for research purposes from our web site.
AB - This paper addresses the issues in the task of annotating geographical entities on microblogs and reports the preliminary results of our efforts to annotate Japanese microblog texts. Unlike prior work, we aim at annotating not only geographical location entities but also facility entities, such as stations, restaurants and schools. We discuss (i) how to build a gazetteer of geographical entities with sufficiently broad coverage, (ii) what types of ambiguities need to be considered, (iii) why annotators tend to disagree, and (iv) what technical problems should be addressed to automate the task of annotating geographical entities. All the annotation data and the annotation guidelines are publicly available for research purposes from our web site.
KW - Corpus annotation
KW - Location reference expressions
KW - Microblogs
KW - Natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85009888011&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009888011&partnerID=8YFLogxK
U2 - 10.2197/ipsjjip.25.121
DO - 10.2197/ipsjjip.25.121
M3 - Article
AN - SCOPUS:85009888011
SN - 1882-6652
VL - 25
SP - 121
EP - 130
JO - Journal of Information Processing
JF - Journal of Information Processing
ER -