Preventing critical scoring errors in short answer scoring with confidence estimation

Hiroaki Funayama, Shota Sasaki, Yuichiroh Matsubayashi, Tomoya Mizumoto, Jun Suzuki, Masato Mita, Kentaro Inui

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Many recent Short Answer Scoring (SAS) systems have employed Quadratic Weighted Kappa (QWK) as their evaluation measure. However, we hypothesize that QWK is unsatisfactory for evaluating SAS systems when we consider measuring their effectiveness in actual usage. We introduce a new task formulation of SAS that matches the actual usage. In our formulation, SAS systems should extract as many scoring predictions as possible that are not critical scoring errors (CSEs). We conduct experiments in our new task formulation and demonstrate that a typical SAS system can predict scores with zero CSEs for up to approximately 50% of the test data by filtering out low-reliability predictions on the basis of a certain confidence estimation. This result directly indicates the possibility of halving the scoring cost of human raters, which is a more suitable criterion for evaluating SAS systems.
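The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `tolerance` parameter, and the definition of a CSE as an absolute score error larger than `tolerance` are assumptions made for illustration, since the abstract does not define a CSE or the confidence estimator. Given predicted scores, gold scores, and one confidence value per prediction, the sketch finds the lowest confidence threshold at which the retained predictions still contain zero CSEs, and reports the fraction of data retained.

```python
from typing import List, Tuple


def is_cse(pred: int, gold: int, tolerance: int) -> bool:
    """Hypothetical CSE test: the error exceeds an allowed tolerance (assumption)."""
    return abs(pred - gold) > tolerance


def max_zero_cse_coverage(
    preds: List[int],
    golds: List[int],
    confs: List[float],
    tolerance: int = 1,
) -> Tuple[float, float]:
    """Return (threshold, coverage) for the largest zero-CSE subset.

    Predictions with confidence >= threshold are kept; coverage is the
    fraction of all predictions that are kept. If even the most confident
    prediction is a CSE, the result is (inf, 0.0).
    """
    n = len(preds)
    # Visit predictions from most to least confident.
    order = sorted(range(n), key=lambda i: confs[i], reverse=True)
    kept = 0
    threshold = float("inf")
    i = 0
    while i < n:
        # Group predictions that share the same confidence, because a
        # threshold cannot separate them from one another.
        conf = confs[order[i]]
        j = i
        while j < n and confs[order[j]] == conf:
            j += 1
        group = order[i:j]
        if any(is_cse(preds[k], golds[k], tolerance) for k in group):
            break  # lowering the threshold further would admit a CSE
        kept = j
        threshold = conf
        i = j
    coverage = kept / n if n else 0.0
    return threshold, coverage


# Toy data (made up for illustration, not taken from the paper).
preds = [3, 2, 0, 1, 3, 0]
golds = [3, 2, 0, 3, 3, 2]
confs = [0.95, 0.9, 0.8, 0.6, 0.5, 0.3]
threshold, coverage = max_zero_cse_coverage(preds, golds, confs, tolerance=1)
print(threshold, coverage)
```

Because gold scores are not available at deployment time, a threshold like this would have to be selected on held-out data and then applied to unseen answers; predictions below the threshold would be passed to human raters.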

Original language: English
Title of host publication: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 237-243
Number of pages: 7
ISBN (Electronic): 9781952148033
Publication status: Published - 2020
Event: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Student Research Workshop, SRW 2020 - Virtual, Online, United States
Duration: 2020 Jul 5 to 2020 Jul 10

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Student Research Workshop, SRW 2020
Country/Territory: United States
City: Virtual, Online
Period: 20/7/5 to 20/7/10

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
