TY - JOUR
T1 - The possibility of improving automated calculation of measures of lexical richness for EFL writing
T2 - A comparison of the LCA, NLTK and SpaCy tools
AU - Spring, Ryan
AU - Johnson, Matthew
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/6
Y1 - 2022/6
AB - Automatically calculated measures of lexical richness are important for L2 learning because they can be used to assess productive abilities and general linguistic ability. One popular tool for calculating them is the Lexical Complexity Analyzer (LCA), but more advanced parsing tools have become available since its creation. This paper compares a modified version of the LCA code, run with NLTK and SpaCy (two popular natural language processing toolkits), against the online version of the LCA in calculating 26 measures of lexical richness. We show how closely the three tools agree in calculating the measures and how well each tool's calculations correlate with EFL writers' human-rated scores and TOEFL® ITP scores. We found that six of the measures that Lu (2012) suggested are associated with higher oral proficiency were also highly correlated with higher human-rated scores and TOEFL® ITP scores in our data set. However, our code's modifications, which use a different list to determine word sophistication and allow the verbs "be" and "have" to be treated as lexical verbs, caused four measures that Lu (2012) found not to be associated with proficiency to correlate with both human-rated scores and TOEFL® ITP scores, particularly when run with SpaCy.
KW - Automated assessment
KW - Computer assisted evaluation
KW - EFL writing
KW - Lexical richness
UR - http://www.scopus.com/inward/record.url?scp=85126841386&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126841386&partnerID=8YFLogxK
U2 - 10.1016/j.system.2022.102770
DO - 10.1016/j.system.2022.102770
M3 - Article
AN - SCOPUS:85126841386
SN - 0346-251X
VL - 106
JO - System
JF - System
M1 - 102770
ER -