With hundreds or even thousands of publications, it is difficult to know a scientific field in depth. Taking the case of data-driven learning, this paper presents an overview of different types of synthesis, then moves on to a new overview of historical developments, with implications for best practice.
Research fields in applied linguistics evolve over time as argumentation and personal experience are augmented by research evidence, and the body of publications can reach hundreds or thousands of papers. This is the case for data-driven learning (DDL), i.e. "using the tools and techniques of corpus linguistics for pedagogical purposes", where the output of empirical studies alone now numbers dozens of publications each year. The questions facing researchers in this field are how to keep track of developments, and how to make sense of it all. This paper presents an overview of the different types of DDL synthesis to date. Traditionally, any paper will feature a state-of-the-art literature review, typically with fewer than 30 references, selected to support the paper's argument and/or limited by ignorance of less well-known publications. By contrast, the starting point for any synthesis is a rigorous and near-exhaustive collection of publications, which can then be coded and used as a representative database for synthesis. Narrative syntheses allow in-depth interpretation and contextualisation of all publication types, but remain open to allegations of subjectivity. A principled coding scheme can mitigate this, complemented by various tools for the analysis of the texts themselves. Meta-analyses pool quantitative results to date, but cannot account for qualitative studies, and may not be as objective as one might expect. While these represent the main types of synthesis, others are also possible; we conclude with one such, a historical overview of DDL publications in English that charts developments over time, revealing strengths and weaknesses and pointing to future areas of research.