Based on a specialized research corpus of articles mentioning language published between 1980 and 2018 (ca. 29k articles, 40 million words), this pilot study tests the potential of topic modeling for identifying large-scale patterns in (language-related) discourses and (language) ideologies with respect to multilingualism and immigration, as well as any identifiable relevant subaltern discourses. The large-scale patterns thus identified are further examined for evidence of nationalist conceptions of language as a principal criterion for determining national identity and thus national belongingness.
Recent lexical approaches to the identification of (language-related) discourses, and (language) ideologies in particular, focus on the application of quantitative corpus-linguistic techniques to large data sets as a way to ensure more objective sampling methods and replicability of analytical procedures, as well as to minimize researcher inference (e.g., Subtirelu, 2015; Vessey, 2015). In addition to applications of exploratory factor analysis to corpus data (e.g., Ajšić, 2015, 2021a, b; Fitzsimmons-Doolan, 2014), discourse researchers increasingly rely on a similar multivariate statistical technique called topic modeling (e.g., Brookes & McEnery, 2019; DiMaggio, Nag & Blei, 2013; Murakami, Thompson, Hunston & Vajn, 2017; Törnberg & Törnberg, 2016). Based on a specialized research corpus of articles mentioning language published between 1980 and 2018 (ca. 29k articles, 40 million words), this pilot study tests the potential of topic modeling for identifying large-scale patterns in (language-related) discourses and (language) ideologies with respect to multilingualism and immigration. The large-scale patterns thus identified are further examined for evidence of nationalist conceptions of language as a principal criterion for determining national identity and thus national belongingness, again particularly with respect to multilingualism and immigration, as well as for any identifiable subaltern discourses and ideologies challenging these conceptions.