What automatic measurement of text similarity tells us: Development of second language learner writing

This submission has open access
Abstract Summary

We measured how similar L2 writing is to L1 writing in lexical/semantic quality, using topic modelling (Word2Vec). Results indicate that the quality of L2 writing approaches that of L1 writing as proficiency increases. Writing prompts were found to have little effect on this progression.

Submission ID: AILA396
Abstract:

This study explores how second language (L2) learner writing develops as proficiency increases, measured by its degree of text similarity to native speaker writing. We adopt topic modelling, a Natural Language Processing approach that detects hidden topics in a large volume of texts in an unsupervised way. For this purpose, 36 Chinese-speaking L2 learners of Korean, along with 10 native speakers of Korean as a control group, were asked to write argumentative essays on two topics. Proficiency levels among the L2 participants were roughly evenly distributed: low (n=11), intermediate (n=13), and high (n=12). All the essays were converted to electronic form with spelling errors left uncorrected. Lexical/semantic similarity between native speaker writing and L2 writing was calculated using Word2Vec through Gensim (Rehurek & Sojka, 2010). Overall, the similarity scores rose as proficiency increased, which indicates that the lexical/semantic quality of L2 writing comes closer to that of L1 writing as learner proficiency grows. An additional analysis with each prompt excluded showed that the prompts had little influence on the scores in general and that, of the three proficiency groups, the low-proficiency group was affected the most. This implies that low-proficiency learners may have relied more on the given prompts in their writing than the other learner groups did. Taken together, our findings suggest the need to provide knowledge about particular topics, together with lexical items representative of these topics, as input to support the acquisition of target language knowledge.
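The abstract does not say how word-level Word2Vec vectors were combined into a text-level score. As a minimal sketch of one common approach, the example below averages the word vectors of each text and takes the cosine similarity of the two averages. The averaging step, the function names, and the tiny three-dimensional vectors (with English stand-in tokens) are all assumptions made for illustration; in practice the vectors would come from a trained Word2Vec model, for example one trained with Gensim.

```python
import math

def mean_vector(tokens, vectors):
    """Average the embeddings of the tokens that have one (others are skipped)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token in the text has an embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def text_similarity(tokens_a, tokens_b, vectors):
    """Similarity of two tokenized texts via their averaged word vectors."""
    return cosine(mean_vector(tokens_a, vectors), mean_vector(tokens_b, vectors))

# Hypothetical toy embeddings, NOT taken from the study.
toy_vectors = {
    "nature":  [0.9, 0.1, 0.0],
    "protect": [0.8, 0.2, 0.1],
    "forest":  [0.7, 0.3, 0.0],
    "compete": [0.0, 0.2, 0.9],
}

native_essay = ["nature", "protect", "forest"]
learner_essay = ["nature", "protect"]
off_topic_essay = ["compete"]

on_topic_score = text_similarity(native_essay, learner_essay, toy_vectors)
off_topic_score = text_similarity(native_essay, off_topic_essay, toy_vectors)
```

With these toy vectors, the learner essay that shares vocabulary with the native essay scores much higher than the off-topic one, which is the kind of contrast a similarity score of this type is meant to capture.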


Topic 1: Which is important, protection or exploitation of nature?

                    Overall    Excluding the prompt
NSK~high            0.763      0.751
NSK~intermediate    0.743      0.742
NSK~low             0.708      0.671

Topic 2: Which is helpful for success, cooperation or competition?

                    Overall    Excluding the prompt
NSK~high            0.764      0.760
NSK~intermediate    0.737      0.728
NSK~low             0.696      0.669

Note. NSK = native speakers of Korean
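The claim that the low-proficiency group was affected most by excluding the prompt can be checked directly from the reported scores. The short sketch below subtracts the "Excluding the prompt" score from the "Overall" score for each group; the values are copied from the tables, while the variable and function names are illustrative only.

```python
# Similarity scores copied from the tables: (overall, excluding the prompt).
scores = {
    "Topic 1": {"high": (0.763, 0.751), "intermediate": (0.743, 0.742), "low": (0.708, 0.671)},
    "Topic 2": {"high": (0.764, 0.760), "intermediate": (0.737, 0.728), "low": (0.696, 0.669)},
}

def prompt_effect(table):
    """Drop in similarity for each group when the prompt is excluded."""
    return {group: round(overall - excluded, 3)
            for group, (overall, excluded) in table.items()}

for topic, table in scores.items():
    effects = prompt_effect(table)
    most_affected = max(effects, key=effects.get)
    print(topic, effects, "largest drop:", most_affected)
```

For Topic 1 the drops are 0.012 (high), 0.001 (intermediate), and 0.037 (low); for Topic 2 they are 0.004, 0.009, and 0.027. In both topics the low-proficiency group shows the largest drop.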

Assistant Professor, Palacky University Olomouc
University of Pittsburgh
