Automatizing L2 fluency measurement: validity and developmental sensitivity of temporal fluency metric variations

Abstract Summary

We automatized the testing and computation of multiple variations (pruning, normalization, etc.) of L2 utterance fluency metrics, collected pre/post oral interview responses from N=215 learners of French, and compared each metric variant with external proficiency estimates to determine which operationalizations best predict proficiency and detect very-short-term developmental changes in L2 fluency.

Submission ID: AILA974
Submission Type: Abstract

Utterance fluency in speaking, as a dimension of L2 performance, is assumed to correlate with L2 proficiency, and the ability to measure it objectively and precisely is key for testing and research. Many utterance fluency metrics have been proposed, compared, and validated in terms of how well they discriminate or predict proficiency levels, measure short-term L2 development, or correlate with perceived fluency (e.g., Segalowitz et al., 2017; Tavakoli et al., 2019). However, the precise operationalization of these fluency measurements is rarely discussed in detail and often diverges among studies (Dumont, 2018). While some issues, such as the silent pause threshold, have been studied in more detail (de Jong & Bosker, 2013), others, such as pruning, have rarely been discussed in depth.
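
As a minimal sketch of why such operationalization choices matter, the hypothetical Python snippet below shows how the silent-pause threshold alone changes a pause-based metric. The function name and toy durations are illustrative rather than the study's actual pipeline; the 0.25 s default follows the threshold discussed by de Jong & Bosker (2013).

```python
# Hypothetical sketch (not the study's pipeline): the same pause durations
# yield different pause counts depending on the silent-pause cutoff.

def silent_pause_rate(pause_durations, speaking_time, threshold=0.25):
    """Silent pauses per second of speaking time, counting only silences
    lasting at least `threshold` seconds (0.25 s per de Jong & Bosker, 2013)."""
    n_pauses = sum(1 for d in pause_durations if d >= threshold)
    return n_pauses / speaking_time

pauses = [0.12, 0.31, 0.27, 1.05, 0.18]  # toy inter-word silences (seconds)
for threshold in (0.10, 0.25, 0.40):
    rate = silent_pause_rate(pauses, speaking_time=20.0, threshold=threshold)
    print(f"threshold={threshold:.2f}s -> {rate:.3f} pauses/s")
```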

The present study attempts to (semi-)automatize the testing and computation of multiple variations of L2 fluency metrics, to compare how well they predict external proficiency estimates, including within a limited proficiency range, and how sensitive they are to very-short-term developmental changes.

We used a computer-delivered oral interview to record 215 young low-intermediate learners of French in a pretest and a posttest separated by 1-3 weeks and, for the experimental group, by a short pedagogical intervention based on interactions in a dialogue-based computer-assisted language learning game. The resulting 12,000 audio files were transcribed by automatic speech recognition, manually corrected, and annotated for a series of "disfluencies". We computed both signal-based (e.g., via de Jong & Wempe, 2009) and transcription-based fluency metrics, in as many variations as possible in terms of pruning (e.g., do L1 words count? proper nouns? self-talk?) and normalization (words, syllables, silent pauses, etc.).
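
The sketch below, again hypothetical, illustrates how such a pruning-by-normalization grid of metric variants could be generated; the tag names, syllable counts, and pruning sets are assumptions for illustration, not the study's actual annotation scheme.

```python
# Hypothetical pruning-by-normalization grid of speech-rate variants.
# Tokens are assumed to carry disfluency tags from the manual annotation.
from itertools import product

tokens = [
    {"text": "je", "syllables": 1, "tags": set()},
    {"text": "euh", "syllables": 1, "tags": {"filled_pause"}},    # filler
    {"text": "Bruxelles", "syllables": 2, "tags": {"proper_noun"}},
    {"text": "what", "syllables": 1, "tags": {"l1_word"}},        # L1 word
    {"text": "mange", "syllables": 1, "tags": set()},
]
total_time = 4.0  # response duration in seconds

PRUNINGS = {
    "none": set(),
    "fillers": {"filled_pause"},
    "strict": {"filled_pause", "l1_word", "proper_noun", "self_talk"},
}
NORMALIZATIONS = {
    "words": lambda kept: len(kept),
    "syllables": lambda kept: sum(t["syllables"] for t in kept),
}

# One speech-rate variant per (pruning, normalization) combination.
for (p_name, drop_tags), (n_name, count) in product(
        PRUNINGS.items(), NORMALIZATIONS.items()):
    kept = [t for t in tokens if not (t["tags"] & drop_tags)]
    print(f"speech_rate[{p_name}/{n_name}] = {count(kept) / total_time:.2f}")
```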

We evaluate how well each metric's variants correlate with external proficiency estimates, including a vocabulary size test, how well they detect changes over such a short timeframe, and how reliable the fully automated metrics are.
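
A final hedged sketch of this evaluation step: ranking metric variants by their correlation with an external proficiency estimate. The column names and values below are placeholders, not data from the study.

```python
# Hypothetical evaluation sketch: correlate each metric variant with an
# external proficiency estimate (here a vocabulary-size score).
import pandas as pd

df = pd.DataFrame({
    "vocab_size": [31, 45, 28, 52, 39],
    "speech_rate_pruned_syll": [2.1, 2.8, 1.9, 3.0, 2.4],
    "speech_rate_raw_words": [2.6, 3.1, 2.5, 3.3, 2.7],
})
metric_cols = [c for c in df.columns if c != "vocab_size"]

# Spearman correlation of every metric variant with the proficiency estimate.
ranking = df[metric_cols].corrwith(df["vocab_size"], method="spearman")
print(ranking.sort_values(ascending=False))
```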

Affiliations: Universidad Central Ecuador; KU Leuven; UCLouvain
