The main aim of this study is to assess the validity of measures of phraseological complexity by exploring whether theoretically similar measures of phraseological diversity and sophistication are related to each other (convergent validity) but distinct from traditional measures of lexical and syntactic complexity (discriminant validity) across a variety of learner corpora.
The construct of "phraseological complexity" originated from corpus-based research that has sought to bridge the gap between phraseological studies and L2 complexity research (cf. Paquot, 2019). It has already proved useful to describe L2 performance, assess L2 proficiency, and trace L2 development (e.g. Paquot, 2018; Paquot, 2019; Paquot et al., 2021; Rubin et al., 2019; Vandeweerd et al., in press). Recent studies have focused on dependency-based collocations (adjective + noun, adverb + verb, verb + direct object pairs) and used modifications of the type-token ratio and association measures (typically pointwise mutual information; PMI) to explore the two main dimensions of phraseological complexity, i.e. phraseological diversity and phraseological sophistication, respectively. Few studies, however, have addressed the reliability and validity of the methods and measurements used to operationalize the construct of phraseological complexity.
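To make the two kinds of measure concrete, the following is a minimal sketch of a root type-token ratio over collocation pairs (diversity) and a mean PMI score (sophistication). It is an illustration under stated assumptions, not the procedure used in the studies cited: the function names, the toy learner pairs, and the reference-corpus frequencies are all hypothetical, and PMI is computed with the standard formula log2(f(x,y) · N / (f(x) · f(y))).

```python
import math

def root_ttr(pairs):
    """Root type-token ratio: distinct pairs divided by sqrt(total pairs)."""
    if not pairs:
        return 0.0
    return len(set(pairs)) / math.sqrt(len(pairs))

def pmi(pair, pair_freq, word_freq, n):
    """Pointwise mutual information of a word pair from reference frequencies.

    pair_freq, word_freq and n (corpus size) are assumed to come from a
    reference corpus; here they are invented toy values.
    """
    w1, w2 = pair
    return math.log2(pair_freq[pair] * n / (word_freq[w1] * word_freq[w2]))

def mean_pmi(pairs, pair_freq, word_freq, n):
    """Average PMI over the pairs attested in the reference corpus."""
    scores = [pmi(p, pair_freq, word_freq, n) for p in pairs if p in pair_freq]
    return sum(scores) / len(scores) if scores else float("nan")

# Hypothetical adjective + noun (amod) pairs extracted from one learner text.
learner_amod = [("heavy", "rain"), ("strong", "tea"),
                ("heavy", "rain"), ("big", "problem")]

# Hypothetical reference-corpus counts (not real data).
ref_n = 1000
ref_pair_freq = {("heavy", "rain"): 8}
ref_word_freq = {"heavy": 20, "rain": 50}

diversity = root_ttr(learner_amod)
sophistication = mean_pmi(learner_amod, ref_pair_freq, ref_word_freq, ref_n)
```

In this sketch, pairs that do not occur in the reference corpus are simply skipped when averaging PMI; how such pairs are treated is a design choice that the text above does not specify.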
In this presentation, I aim to fill this gap by exploring how consistently measures such as root type-token ratios and PMI scores for amod, advmod and dobj collocations capture phraseological diversity and sophistication in a variety of learner corpora. To establish the construct validity of measures of phraseological complexity, I will use correlation matrices to verify (1) whether the proposed measures of phraseological complexity relate to each other and produce convergent results (convergent validity), and (2) to what extent they differ from measures of syntactic and lexical complexity (discriminant validity) (Crossley et al., 2013).
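The logic of the correlation-matrix check can be sketched as follows. This is only an illustration: the abstract does not say which correlation coefficient is used, so Pearson's r is an assumption here, and the measure names and per-text scores are invented toy values rather than results.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(scores):
    """Pairwise correlations between all measures, as a nested dict."""
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}

# Hypothetical scores for four learner texts (toy values, not real data).
scores = {
    "rttr_amod": [1.0, 2.0, 3.0, 4.0],    # phraseological diversity, amod
    "rttr_dobj": [2.0, 4.0, 6.0, 8.0],    # phraseological diversity, dobj
    "lexical_ttr": [3.0, 1.0, 1.0, 3.0],  # a traditional lexical measure
}

matrix = correlation_matrix(scores)
```

Under this reading, convergent validity would show up as high correlations between measures of the same dimension (here the two toy diversity measures), and discriminant validity as low correlations between phraseological measures and the lexical or syntactic ones.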