Corruption Is Not All Bad: Incorporating Discourse Structure into Pre-Training via Corruption for Essay Scoring

Farjana Sultana Mim, Naoya Inoue, Paul Reisert, Hiroki Ouchi, Kentaro Inui

Research output: Article, peer-reviewed

1 citation (Scopus)

Abstract

Existing approaches for automated essay scoring and document representation learning typically rely on discourse parsers to incorporate discourse structure into text representation. However, the performance of parsers is not always adequate, especially when they are used on noisy texts, such as student essays. In this paper, we propose an unsupervised pre-training approach to capture discourse structure of essays in terms of coherence and cohesion that does not require any discourse parser or annotation. We introduce several types of token, sentence and paragraph-level corruption techniques for our proposed pre-training approach and augment masked language modeling pre-training with our pre-training method to leverage both contextualized and discourse information. Our proposed unsupervised approach achieves a new state-of-the-art result on the task of essay Organization scoring.
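The abstract names corruption at the token, sentence and paragraph levels but does not spell out the operations. The following is a minimal sketch of one plausible reading, assuming that corruption means shuffling units at the chosen level and that the pre-training signal is an original-versus-corrupted label. The function names, the shuffling choice and the 0/1 labels are illustrative assumptions, not the authors' implementation.

```python
import random


def shuffle_units(units, rng):
    """Return a reordered copy of `units` that differs from the input when possible."""
    original = list(units)
    # With fewer than two distinct units, no reordering can change the sequence.
    if len(set(map(str, original))) < 2:
        return original
    corrupted = original[:]
    while corrupted == original:
        rng.shuffle(corrupted)
    return corrupted


def corrupt_essay(paragraphs, level, rng):
    """Corrupt an essay given as a list of paragraphs, each a list of sentence strings.

    Assumption: corruption is modelled as shuffling at the requested level.
    """
    if level == "paragraph":
        return shuffle_units(paragraphs, rng)
    if level == "sentence":
        return [shuffle_units(p, rng) for p in paragraphs]
    if level == "token":
        return [[" ".join(shuffle_units(s.split(), rng)) for s in p]
                for p in paragraphs]
    raise ValueError(f"unknown corruption level: {level}")


def make_pretraining_pairs(paragraphs, rng):
    """Build (essay, label) examples: 0 = original, 1 = corrupted (hypothetical labels)."""
    pairs = [(paragraphs, 0)]
    for level in ("token", "sentence", "paragraph"):
        pairs.append((corrupt_essay(paragraphs, level, rng), 1))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    essay = [["First claim here.", "Then some support."],
             ["A counter point.", "Finally a conclusion."]]
    for text, label in make_pretraining_pairs(essay, rng):
        print(label, text)
```

A model pre-trained to separate the two labels would need no discourse parser or annotation, which matches the unsupervised setting the abstract describes. How such an objective is combined with masked language modeling is left out of this sketch.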

Original language: English
Article number: 9451631
Pages (from-to): 2202-2215
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 29
DOI
Publication status: Published - 2021

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
