This paper presents validity arguments for the automated assessment of English academic proficiency by discussing the case of a fully computerized and automatically scored test of academic English. Evidence for the validity of the construct definition will be presented, drawing on research in second language acquisition, assessment, and related fields.
Admission to universities in English-speaking countries generally requires students to submit certified proof of English proficiency, such as an IELTS, TOEFL, or PTE Academic score. As a result of advances in natural language processing technologies, automated language assessment has developed rapidly and transformed testing practice (Xi, 2010). Automated assessment has brought several advantages, for example in consistency of measurement and in fairness, but it has also opened a debate on whether technology changes the nature of the construct being measured. Establishing construct validity for any test should be an ongoing process in which multiple sources of evidence are collected.

This paper therefore presents validity arguments for the automated assessment of English academic proficiency by discussing the case of a fully computerized and automatically scored test of academic English. In the first section of the paper, we will introduce the test and describe the language performance elicited by the different item types, which assess communicative skills either independently or in an integrated way. The second section will look in more detail at how test-takers are evaluated in terms of linguistic, sociolinguistic, discourse, and functional competencies and will discuss the arguments provided for score interpretation. Both sections focus on providing evidence for the validity of inferences about the different components of the construct definition, drawing on research in second language acquisition, assessment, and related fields. The paper thus contributes to a greater understanding of the challenges introduced by the use of technology for assessing English academic proficiency and discusses the evidence that needs to be collected to support the validity claims of automated scoring systems.