This paper presents a replication of Hu and Nation’s (2000) influential study on the 98% vocabulary coverage threshold for reading comprehension. Using a non-academic sample population, it follows the original study design and extends it to test the validity of that threshold more robustly.
Hu and Nation (2000) was the first study to propose the 98% coverage figure as necessary for adequate text comprehension. That figure has since become the foundation for numerous coverage studies and an almost uncontested “law” in vocabulary studies (e.g. Nation, 2006; Schmitt, 2010; Nation & Webb, 2011). The 98% threshold has had considerable impact on coverage research, word list development, FL classroom pedagogy and websites such as Lextutor’s VocabProfiler. Such a coverage threshold is also crucial for the interpretation and usefulness of prominent vocabulary tests, such as the Vocabulary Levels Test (VLT; Schmitt, Schmitt & Clapham, 2001). However, the 98% figure is based on a sample of only 66 New Zealand university students and is therefore in dire need of replication, particularly with a less WEIRD (Western, educated, industrialized, rich and democratic) sample. The study presented in this paper sought to address this by having 100 Sri Lankan non-academic adult ESL learners take the original Vocabulary Levels Test (Nation, 1983) and read four versions of a reading text, with 100%, 95%, 90% and 80% coverage respectively. The versions thus differed in their density of unknown words, operationalized as differing amounts of nonsense words in the text, as per the design of the original study. Participants then completed a multiple-choice test and a cued written meaning recall test to measure comprehension. Scoring and data analysis (regression analyses) followed the procedures of the original study. In addition, the replication incorporated a more recent version of the VLT administered alongside the original, a fifth text condition at exactly 98% coverage, and re-analyses using updated frequency information and scoring principles. This exact replication and its expansion provide further and more robust insights into the validity of the 98% coverage threshold.