We measured how similar L2 writing is to L1 writing in lexical/semantic quality using topic modelling (Word2Vec). Results indicate that L2 writing approaches L1 writing in this respect as proficiency increases. The writing prompts had little effect on this trend.
This study explores how second language (L2) learner writing develops as proficiency increases, measured by its degree of text similarity to native speaker writing. We adopt topic modelling, a Natural Language Processing approach that detects hidden topics in a large volume of text in an unsupervised way. For this purpose, 36 Chinese-speaking L2 learners of Korean, along with 10 native speakers of Korean as a control group, were asked to write argumentative essays on two topics. Proficiency levels amongst the L2 participants were roughly evenly distributed: low (n=11), intermediate (n=13), and high (n=12). All essays were converted to electronic text, with spelling errors left uncorrected. Lexical/semantic similarity between native speaker writing and L2 writing was calculated using Word2Vec through Gensim (Rehurek & Sojka, 2010). Overall, similarity scores rose as proficiency increased, which indicates that the lexical/semantic quality of L2 writing becomes more similar to that of L1 writing as learner proficiency grows. An additional analysis with each prompt excluded showed that the prompts had little influence on the scores in general and that, of the three proficiency groups, the low-proficiency group was affected the most. This implies that low-proficiency learners may have relied more heavily on the given prompts in their writing than the other learner groups did. Taken together, our findings suggest the need to provide knowledge about particular topics, together with lexical items representative of those topics, as input in order to enhance the acquisition of target language knowledge.
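The abstract does not spell out how a single similarity score between two groups of essays is derived from the word vectors. The sketch below illustrates one common possibility, the cosine similarity between the mean word vectors of two token lists. It is an assumption rather than the authors' documented procedure. The names `mean_vector`, `cosine`, and `group_similarity`, the romanized tokens, and the three-dimensional toy vectors are all hypothetical; in the study the vectors would come from a Word2Vec model trained through Gensim.

```python
import math

def mean_vector(tokens, vectors):
    """Average the vectors of the tokens that have an embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has an embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def group_similarity(tokens_a, tokens_b, vectors):
    """Similarity of two token lists via their mean word vectors (assumed method)."""
    return cosine(mean_vector(tokens_a, vectors), mean_vector(tokens_b, vectors))

# Hypothetical toy embeddings; real ones would be learned by Word2Vec.
toy_vectors = {
    "jayeon": [0.9, 0.1, 0.0],       # "nature"
    "boho": [0.8, 0.2, 0.1],         # "protection"
    "gaebal": [0.6, 0.5, 0.1],       # "development"
    "gyeongjaeng": [0.1, 0.9, 0.2],  # "competition"
}

native_tokens = ["jayeon", "boho", "gaebal"]          # stand-in for native speaker essays
learner_tokens = ["jayeon", "gaebal", "gyeongjaeng"]  # stand-in for learner essays

score = group_similarity(native_tokens, learner_tokens, toy_vectors)
print(f"similarity: {score:.3f}")
```

Under this assumed scheme, a score closer to 1 means the learner group's vocabulary sits closer to the native speakers' in the embedding space, which is how the scores reported for each proficiency group can be read.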
Comparison          Overall   Excluding the prompt

Topic 1: Which is important, protection or exploitation of nature?
NSK~high            0.763     0.751
NSK~intermediate    0.743     0.742
NSK~low             0.708     0.671

Topic 2: Which is helpful for success, cooperation or competition?
NSK~high            0.764     0.760
NSK~intermediate    0.737     0.728
NSK~low             0.696     0.669
Note. NSK = native speakers of Korean
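
The claim that the low-proficiency group was affected most by excluding the prompt can be checked directly from the scores in the table. The short sketch below takes the difference between the two columns for each group; the variable names are illustrative, and the values are copied from the table above.

```python
# (overall, excluding the prompt) similarity scores, copied from the table.
scores = {
    "Topic 1": {
        "high": (0.763, 0.751),
        "intermediate": (0.743, 0.742),
        "low": (0.708, 0.671),
    },
    "Topic 2": {
        "high": (0.764, 0.760),
        "intermediate": (0.737, 0.728),
        "low": (0.696, 0.669),
    },
}

# Drop in similarity when the prompt is excluded, per topic and group.
drops = {
    topic: {group: round(overall - excluded, 3)
            for group, (overall, excluded) in groups.items()}
    for topic, groups in scores.items()
}

# Group with the largest drop for each topic.
most_affected = {topic: max(d, key=d.get) for topic, d in drops.items()}

for topic, d in drops.items():
    print(topic, d, "largest drop:", most_affected[topic])
```

For both topics the largest drop belongs to the low-proficiency group (0.037 and 0.027), while the drops for the other groups stay at or below 0.012.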