This contribution presents the lexical phenomena that occur across different types of hate speech. In a second step, it extends these insights to automatic hate speech detection and discusses the role that lexical approaches can play in the development of detection systems.
When people discuss hate speech online, they often refer to very explicit verbal material, such as profanity or slurs, i.e., to lexical aspects. The lexis of hate speech is also of particular interest from a researcher's point of view. Firstly, it indeed plays a major role in the rhetoric of hate speech. Secondly, automatic detection systems clearly rely heavily on lexical aspects, although it is not entirely clear to what extent, particularly in the case of less transparent multilayer neural networks. This contribution therefore combines both perspectives to shed more light on the rhetoric of hate speech and on the benefits of a lexical approach to detecting offensive communicative behaviour online.
In the first part, we give an overview of lexical phenomena occurring in hate speech, drawn from different languages (English, German, Dutch) and contexts (e.g., extremism and sexism). These include aspects typically associated with hate speech, notably dehumanising metaphors, profanity and ideologically loaded lexis. We also discuss lexical aspects that have been examined less frequently, namely lexical creativity and coded language.
As recent shared tasks on the detection of offensive language, such as GermEval 2018, have shown, current deep learning systems such as convolutional neural networks (CNNs) usually outperform less complex systems. At the same time, simple lexical approaches are an important factor in boosting the results of CNNs and similar architectures (see, for instance, Wiegand et al. 2018), and they contribute to making these systems more transparent (explainable AI, XAI). We will demonstrate our current attempts to integrate a lexical approach into the automatic detection of hate speech, based on manually annotated words from some of the above-mentioned categories.
Wiegand, Michael, Siegel, Melanie, and Ruppenhofer, Josef. 2018. Overview of the GermEval 2018 Shared Task on the Identification of Offensive Language. In Proceedings of GermEval 2018, 1–10.
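To make the idea of a lexical approach based on manually annotated words more concrete, the following is a minimal sketch and not the system presented in this contribution. It assumes a small hand-annotated lexicon that maps words to categories such as those named in the abstract. The lexicon entries, the category labels and the names `LEXICON`, `tokenize` and `lexical_features` are all hypothetical and chosen for illustration only.

```python
import re
from collections import Counter

# Hypothetical, manually annotated lexicon: word -> category.
# The entries are illustrative placeholders, not real annotation data.
LEXICON = {
    "vermin": "dehumanising_metaphor",
    "parasites": "dehumanising_metaphor",
    "damn": "profanity",
    "invaders": "ideologically_loaded",
}

# Fixed category order so that every feature vector has the same layout.
CATEGORIES = sorted(set(LEXICON.values()))


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def lexical_features(text):
    """Return per-category hit counts and the list of matched words."""
    counts = Counter()
    matches = []
    for token in tokenize(text):
        category = LEXICON.get(token)
        if category is not None:
            counts[category] += 1
            matches.append((token, category))
    vector = [counts[c] for c in CATEGORIES]
    return vector, matches


if __name__ == "__main__":
    vector, matches = lexical_features("They are vermin and invaders, damn it")
    print(CATEGORIES)
    print(vector)
    print(matches)
```

In a setup of this kind, the count vector could be passed to a classifier alongside its other input features, while the list of matched words offers a simple, human-readable trace of which lexical items contributed. Exact string matching is a simplifying assumption here: it would miss the lexical creativity and coded language discussed above.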