TY - GEN
T1 - Experience mining
T2 - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
AU - Inui, Kentaro
AU - Abe, Shuya
AU - Hara, Kazuo
AU - Morita, Hiraku
AU - Sao, Chitose
AU - Eguchi, Megumi
AU - Sumida, Asuka
AU - Murakami, Koji
AU - Matsuyoshi, Suguru
PY - 2008
Y1 - 2008
AB - This paper proposes a new UGC-oriented language technology application, which we call experience mining. Experience mining aims to automatically collect instances of personal experiences as well as opinions from the explosively growing volume of user-generated content (UGC), such as weblog and forum posts, and to store them in an experience database with semantically rich indices. After discussing the technical issues of this new task, we focus on its central problem, factuality analysis, and propose both a task definition and a machine learning-based solution. Our empirical evaluation indicates that the factuality analysis task is sufficiently well defined to achieve high inter-annotator agreement, and that our Factorial CRF-based model considerably outperforms the baseline. We also present an application system that currently stores over 50M experience instances, with semantic indices, extracted from 150M Japanese blog posts, and that is scheduled to start serving as an experience search engine for unrestricted users in October.
UR - http://www.scopus.com/inward/record.url?scp=62949112211&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=62949112211&partnerID=8YFLogxK
U2 - 10.1109/WIIAT.2008.373
DO - 10.1109/WIIAT.2008.373
M3 - Conference contribution
AN - SCOPUS:62949112211
SN - 9780769534961
T3 - Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
SP - 314
EP - 321
BT - Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008
Y2 - 9 December 2008 through 12 December 2008
ER -