The possibility of improving automated calculation of measures of lexical richness for EFL writing: A comparison of the LCA, NLTK and SpaCy tools

Ryan Spring, Matthew Johnson

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Automatically calculating measures of lexical richness is important for L2 learning because these measures can be used to assess productive abilities and general linguistic ability. One popular tool for doing so is the Lexical Complexity Analyzer (LCA), but more advanced parsing tools have become available since its creation. This paper compares a modified version of the LCA code, run with NLTK and with SpaCy (two popular natural language processing toolkits), against the online version of the LCA in calculating 26 measures of lexical richness. We show how similarly the three tools calculate the measures and how well each tool's calculations correlate with EFL writers' human-rated scores and TOEFL® ITP scores. We found that six of the measures that Lu (2012) suggested are associated with higher oral proficiency were also highly correlated with higher human-rated scores and TOEFL® ITP scores in our data set. However, the modifications to our code, which use a different list to determine word sophistication and allow be and have verbs to be treated as lexical verbs, caused four measures that Lu (2012) found to be unassociated with proficiency to correlate with both human-rated scores and TOEFL® ITP scores, particularly when run with SpaCy.
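As a rough illustration of the kind of measure discussed in the abstract, the following minimal Python sketch computes three commonly used lexical richness measures (lexical density, type-token ratio and root type-token ratio) from text that has already been lemmatized and part-of-speech tagged, for example by NLTK or SpaCy. It is not the authors' modified LCA code. The function names, the `count_be_have` flag, the use of Penn Treebank tag prefixes to identify lexical words, and the sample sentence are all assumptions made for illustration, and the formulas are the generic textbook ones, which may differ in detail from those used in the paper.

```python
import math

# Assumption: Penn Treebank tag prefixes for nouns, adjectives, adverbs, verbs.
LEXICAL_PREFIXES = ("NN", "JJ", "RB", "VB")
# Assumption: lemmas that may or may not be counted as lexical verbs.
BE_HAVE = {"be", "have"}


def is_lexical(lemma, tag, count_be_have):
    """Return True if a (lemma, tag) pair counts as a lexical word."""
    if not tag.startswith(LEXICAL_PREFIXES):
        return False
    if tag.startswith("VB") and lemma in BE_HAVE:
        # Hypothetical switch for treating be/have as lexical verbs.
        return count_be_have
    return True


def lexical_richness(tagged, count_be_have=False):
    """Compute three measures from a list of (lemma, tag) pairs."""
    if not tagged:
        raise ValueError("tagged must contain at least one token")
    lemmas = [lemma.lower() for lemma, _ in tagged]
    n_tokens = len(lemmas)
    n_types = len(set(lemmas))
    n_lexical = sum(
        is_lexical(lemma.lower(), tag, count_be_have) for lemma, tag in tagged
    )
    return {
        "lexical_density": n_lexical / n_tokens,     # lexical words / all words
        "ttr": n_types / n_tokens,                   # types / tokens
        "root_ttr": n_types / math.sqrt(n_tokens),   # types / sqrt(tokens)
    }


# Hand-tagged sample: "The cat is on the mat and the dog has a bone"
sample = [
    ("the", "DT"), ("cat", "NN"), ("be", "VBZ"), ("on", "IN"),
    ("the", "DT"), ("mat", "NN"), ("and", "CC"), ("the", "DT"),
    ("dog", "NN"), ("have", "VBZ"), ("a", "DT"), ("bone", "NN"),
]

without_be_have = lexical_richness(sample, count_be_have=False)
with_be_have = lexical_richness(sample, count_be_have=True)
print(without_be_have["lexical_density"], with_be_have["lexical_density"])
```

In this toy sentence, counting be and have as lexical verbs changes lexical density while leaving the type-token measures untouched, which gives a sense of how a tagging or classification choice can shift some measures and not others.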

Original language: English
Article number: 102770
Journal: System
Volume: 106
Publication status: Published - 2022 Jun

Keywords

  • Automated assessment
  • Computer assisted evaluation
  • EFL writing
  • Lexical richness

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Linguistics and Language
