
The Intellectualisation of Online Hate Speech: Monitoring the Alt-Right Audience on Youtube

Marc Tuters (M.D.Tuters@uva.nl), University of Amsterdam, The Netherlands; Emillie de Keulenaar (edekeulenaar@gmail.com), University of Amsterdam, The Netherlands; Ivan Kisjes (i.kisjes@uva.nl), University of Amsterdam, The Netherlands; Daniel Bach (geden.t2@gmail.com), University of Amsterdam, The Netherlands; Kaspar Beelen (k.beelen@uva.nl), University of Amsterdam, The Netherlands

This paper looks at YouTube as a platform for the circulation of far-right pseudo-intellectual ideas and focuses on its alleged tendency to act as a machine for radicalization. Whilst social media platforms such as Google have promoted the notion of 'connectivity' as an unalloyed social good (Schmidt, 2013), it has recently been observed that such connectivity, mediated, for example, by recommendation systems, is equally capable of strengthening communities bound by narrow preferences for the same information, including political ideas. The resulting so-called ‘filter bubbles’ are said to create communities bound by a common aversion to users with different informational diets -- particularly political and ideological others (Zuckerberg, 2018, 3; Pariser 2011; Sunstein 2002). While a significant amount of attention has of late been focused on the general problem of 'fake news' within digital humanities research (Bounegru et al., 2017; Venturini, 2018), this abstract takes an empirically focused approach to the broader, seemingly philosophical, problem of epistemological relativism in the landscape of contemporary social media (McIntyre, 2018; Kakutani, 2018). Specifically, it considers how social media platforms, in this case YouTube, have become sites for the dissemination and development of ideas originating from far-right subcultures, particularly via the medium of pseudo-intellectual debate.

Recently referred to as the 'Intellectual Dark Web' (Weiss, 2018), YouTube is today replete with what had in the past been referred to as 'balkanized speech markets' (Sunstein, 2001, 69). These extreme speech markets, as we will refer to them here, traffic in a variety of ideas considered to be 'too controversial' to be discussed by the academic 'establishment'. While they may be understood to have deep roots in the subcultural margins of the web (notably in web fora such as 4chan’s /pol/ "Politically Incorrect" board or Wikipedia alternatives such as Metapedia 1), they can be seen to extend across mainstream social media platforms, and perhaps nowhere more so than on YouTube (Tokmetzis 2019). But while YouTube can justifiably censor imminent threats of physical violence, extreme speech -- hitherto marginal -- rarely fits the platform’s actionable descriptions of ‘hateful’ content (YouTube, 2018a, 2018b). This leaves watchdogs and authorities uncertain where, exactly, to draw the line between expressions of ideological extremism -- which have been further normalised into popular content, and which many platforms have sought to tolerate in deference to American free speech principles -- and simply illegal expressions of ethnic, religious and sexual hatred. While the latter discourse might perhaps be ignored if it remained on fringe websites, extremism experts and media scholars have argued that the design of YouTube's recommender system may provide users with the affordances to develop these ideas into inconspicuous intellectual expressions, making YouTube an accidental machine for normalising political extremes within closely knit political communities (Holt, 2017; Tufekci, 2018).

The problem in question, then, lies at the intersection of computational linguistics and intellectual history. As hate speech has largely been defined as a product or by-product of far-right political thought, and as content associated with this political culture has been largely normalised as popular content, to what extent would hate speech detection amount to ideological persecution or censorship? 2 Would it not inadvertently aggravate contention between polarised communities? Instead, how can hate speech detection be made sensitive to the intellectual history and transformation of the ideas substantiating ‘traditional’ hate speech? Discussion of ideas traditional to far-right political thought is widespread enough on YouTube to, as this paper will assess, call for computational techniques sensitive to the ambiguity of such “online vitriol” (Pohjonen and Udupa, 2017, 1173).

In this respect, this paper adopts an approach that might be considered a variation on the reception studies tradition in studying extreme speech on YouTube. Our approach is a response to Lewis’s analysis of far-right political culture on YouTube: it seeks to follow the audience as opposed to profiling the uploaders of content, whom Lewis referred to as the “Alternative Influence Network” (Lewis, 2018). While in dialogue with Lewis’s research, we take issue with her use of the concept of “influence”, informed here by a critique from reception studies of the media effects tradition. With this point of distinction in mind, our paper looks at how we can detect hate speech terms uttered by channels and commenters on YouTube, as defined by a number of international hate speech watchdog datasets, by associating them with the intellectual corpora that have substantiated their discriminatory viewpoints. While racially discriminatory slang -- to name the deeply inflammatory term, ‘nigger’ -- may not be detectable in YouTube comments and video transcripts, terms originating from nineteenth-century biological racism -- ‘negroid’ -- are instead abundant. We consider how and whether such terms appear among channels and commenters known for presenting their arguments in such a pseudo-academic format. We then attempt to understand what happens when commenters who have previously been using terms associated with ‘hate speech’ come into contact with intellectualised extreme speech markets.

It is worth mentioning that YouTube has often been characterized as an important hub and source of intellectual resources for new far-right users and talking heads (Holt, 2017; Tufekci, 2018; Weiss, 2018; Lewis, 2018). Self-described 'sceptics' and 'freethinkers' engage in discussions of topics such as ‘race and IQ’ or the role of Jewish elites in the ‘new world order’ as matters of genuine intellectual concern. While the substance of these discussions appears hateful or conspiratorial to both experts and lay users, their participants tend to see their points of view as alternative, both ideologically and intellectually.

Within the digital humanities, there is a growing interest in the automatic identification of political ideology and ideas, often relying on techniques created within the field of natural language processing (Azarbonyad et al. 2017; Kenter et al. 2015; Iyyer et al., 2014). The latter discipline has tended to focus on the detection of hate speech as though it were a category distinguishable from “normal” or “acceptable” speech. Following Pohjonen and Udupa’s critique of this “binary” conception of hate speech, our approach is concerned with a broader range of equally problematic forms of speech that do not necessarily constitute hate speech. With this problem in mind, the abstract explores text mining techniques to monitor the attitudes of users over time. This yields a computational lens on the behavioural effects of the platform, i.e., how the beliefs and behaviours of users shift as a result of the content they engage with. Our approach of following the audience thus differs from most YouTube research, which has centred on content creators and largely relies on standard metrics made available through the API (such as channel content, related videos, etc.; see Rieder, 2018). We thus map the debating culture of the new right on YouTube as a process resulting (in part) from the interaction of commenters and channels. By measuring the word usage of the audience of what has elsewhere been called the “Alternative Influence Network” or the “Intellectual Dark Web”, our approach may be understood as mapping this debating culture from below in order to determine how audiences become associated with the cause.

Methodologically, we have developed quantitative techniques for gathering all the comments made by individual users (whose identities have been anonymized) in order to map their movement through and across YouTube channels. In collaboration with journalist and extremism expert Dimitri Tokmetzis (Tokmetzis, 2018), we have processed 1027 right-wing YouTube channels initially harvested using the YouTube Data Tools (Rieder, 2018).
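As an illustration of how such movement across channels might be traced, the following minimal Python sketch pseudonymises commenter identifiers and derives, for each commenter, the chronological sequence of channels they commented on. It is not the pipeline used in this study: the field names (`author_id`, `channel_id`, `published_at`), the salted-hash pseudonymisation and the `SALT` value are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict

# Hypothetical secret value; in practice it would be kept out of the dataset.
SALT = "replace-with-a-secret-value"


def pseudonymize(author_id, salt=SALT):
    """Replace a commenter identifier with a truncated salted SHA-256 digest."""
    digest = hashlib.sha256((salt + author_id).encode("utf-8")).hexdigest()
    return digest[:16]


def channel_trajectories(comments):
    """Map each pseudonymised commenter to the ordered list of channels visited.

    `comments` is an iterable of dicts with the assumed keys `author_id`,
    `channel_id` and `published_at`. Timestamps are assumed to be ISO 8601
    strings in one consistent format, so that they sort chronologically.
    """
    by_user = defaultdict(list)
    for comment in comments:
        user = pseudonymize(comment["author_id"])
        by_user[user].append((comment["published_at"], comment["channel_id"]))

    trajectories = {}
    for user, rows in by_user.items():
        rows.sort()  # chronological order
        path = []
        for _, channel in rows:
            # Record a channel only when the commenter moves to a new one.
            if not path or path[-1] != channel:
                path.append(channel)
        trajectories[user] = path
    return trajectories
```

Consecutive entries in each path can then be read as directed edges between channels, which is one way of arriving at a network of commenter movement.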

The data was collected based on a list of actors that was extracted from a database created by the Dutch right-wing watchdog organization KAFKA. The resulting set has a broad, international (European) focus, though the majority of actors are Anglo-American.

The data retrieved via the YouTube API consists of video titles, channel references, tags and comments. To analyse this content -- and establish the extent to which users agreed with extreme opinions -- we defined two vocabularies. The first is a set of intellectual ‘extreme’ notions originating from two main sources. One source is the tags attached to right-wing videos, together with terms used by commenters, e.g. 'eugenics', ‘race and IQ’ and ‘white identity’. The other source is a collection of frequently mentioned terms from Metapedia’s article on race and intelligence. The second vocabulary is made up of 267 common English-language comments that indicate a commenter’s agreement with a video (including comments such as 'well said', 'agreed' and 'brilliant'). Using such comments, we identified those commenters agreeing with videos tagged as ‘extreme speech’. We then identified the five preceding comments by those users, which we visualised in a directed network that indicates where and when users have commented on videos from our selected channels. To find movements between hate and extreme speech, we have developed a similar list of keywords to identify ‘hate speech’ comments. 3 Applying these to comments allows us to compute ‘extreme’ and ‘hate speech’ scores for users over time. Comparing movements over time in terms of both ‘hate speech’ and ‘extreme speech’ will provide us with the data to test the hypothesis that commenters move from ‘hate speech’ to ‘extreme speech’.
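The steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the vocabularies contain only the example terms quoted above (the full lists are not reproduced), `HATE_TERMS` holds a neutral placeholder, agreement is assumed to mean that the whole comment matches a listed phrase, and the score is assumed to be the mean number of vocabulary hits per comment per user and year. The field names `user`, `published_at` and `text` are likewise hypothetical.

```python
import re
from collections import defaultdict

# Example entries only; the study's full vocabularies are not reproduced here.
AGREEMENT_PHRASES = {"well said", "agreed", "brilliant"}
EXTREME_TERMS = {"eugenics", "race and iq", "white identity"}
HATE_TERMS = {"placeholder_slur"}  # hypothetical stand-in for the keyword list


def count_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of any term in `terms`."""
    lowered = " ".join(text.lower().split())
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in terms
    )


def is_agreement(text):
    """Assumption: a comment signals agreement if, once punctuation is
    stripped, it consists of nothing but a listed phrase."""
    cleaned = re.sub(r"[^a-z\s]", "", text.lower())
    return " ".join(cleaned.split()) in AGREEMENT_PHRASES


def preceding_comments(comments, user, before, n=5):
    """Return the `n` most recent comments by `user` posted before `before`.

    Timestamps are assumed to be ISO 8601 strings in a consistent format.
    """
    earlier = [
        c for c in comments
        if c["user"] == user and c["published_at"] < before
    ]
    earlier.sort(key=lambda c: c["published_at"])
    return earlier[-n:]


def yearly_scores(comments):
    """Mean 'extreme' and 'hate' term hits per comment, per (user, year)."""
    totals = defaultdict(lambda: {"extreme": 0, "hate": 0, "comments": 0})
    for c in comments:
        key = (c["user"], c["published_at"][:4])  # year prefix of timestamp
        totals[key]["extreme"] += count_terms(c["text"], EXTREME_TERMS)
        totals[key]["hate"] += count_terms(c["text"], HATE_TERMS)
        totals[key]["comments"] += 1
    return {
        key: {
            "extreme": v["extreme"] / v["comments"],
            "hate": v["hate"] / v["comments"],
        }
        for key, v in totals.items()
    }
```

Under these assumptions, a commenter whose ‘hate’ score falls while their ‘extreme’ score rises from one year to the next would be a candidate for the movement the hypothesis describes; normalising by the number of comments keeps prolific commenters from dominating the comparison.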

Figure 1: Scientific racism and hate speech in right-wing comments and transcripts, 2006-2018

At the crux of the problem, then, is the attempt to define what does and does not count as hate speech, which is itself a concept that has come under considerable criticism from the subjects of this case study. Our objective is thus to bring critical perspectives to those digital spaces where users consume and reproduce intellectual content. More specifically, our objective is to monitor pseudo-intellectual debates in the comment section and explain them in terms of user behaviour. It is thus our hope that this approach will help researchers concerned with hate speech to contextualise concepts having to do with various intellectual processes online, including radicalization, indoctrination, and (partisan) education.

Appendix A

Bibliography
  1. Azarbonyad, Hosein, Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut, Maarten Marx, and Jaap Kamps. "Words are malleable: Computing semantic shifts in political and media discourse." In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1509-1518. ACM, 2017.
  2. Bounegru, Liliana, Jonathan Gray, Tommaso Venturini, and Michele Mauri. 2017. “Field Guide to Fake News: a Collection of Recipes for Those Who Love to Cook with Digital Methods.” Public Data Lab. https://fakenews.publicdatalab.org
  3. Holt, Jared. October 2017. ‘White Supremacy Figured Out How to Become YouTube Famous’, Right Wing Watch http://www.rightwingwatch.org/report/white-supremacy-figured-out-how-to-become-youtube-famous/
  4. Iyyer, Mohit, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. "Political ideology detection using recursive neural networks." In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1113-1122. 2014.
  5. Kenter, Tom, Melvin Wevers, Pim Huijnen, and Maarten de Rijke. "Ad hoc monitoring of vocabulary shifts over time." In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1191-1200. ACM, 2015.
  6. Lewis, Rebecca. 2018. Alternative Influence: Broadcasting the Reactionary Right on YouTube. New York: Data & Society.
  7. McIntyre, Lee. 2018. Post-Truth. Cambridge: MIT Press
  8. Kakutani, Michiko. 2018. The Death of Truth. New York: Harper Collins.
  9. Pohjonen, Matti, and Sahana Udupa. 2017. 'Extreme Speech Online: an Anthropological Critique of Hate Speech Debates.' International Journal of Communication 11 (March): 1173–91.
  10. Rieder, Bernhard. 2018. YouTube Data Tools. Computer software. Vers. 1.09. N.p., 10 October 2018. Web
  11. Schmidt, Eric, and Jared Cohen. 2013. The New Digital Age: Transforming Nations, Businesses, and Our Lives. New York: Vintage.
  12. Sunstein, Cass. 2001. Republic.com. Princeton: Princeton University Press.
  13. Tokmetzis, Dimitri, October 31 2018. “Zo onderzoeken we de extremistische informatiebubbels op YouTube”. De Correspondent https://decorrespondent.nl/8795/zo-onderzoeken-we-de-extremistische-informatiebubbels-op-youtube/428290115-40578627
  14. Tufekci, Zeynep. March 10 2018. YouTube, the Great Radicalizer https://www.nytimes.com/2018/03/10/opinion/sunday/youtube-politics-radical.html
  15. Weiss, Bari. May 8 2018. 'Opinion | Meet the Renegades of the Intellectual Dark Web.' New York Times https://www.nytimes.com/2018/05/08/opinion/intellectual-dark-web.html
  16. YouTube (2018a) Policies - YouTube, YouTube. Available at: https://www.youtube.com/yt/about/policies/#community-guidelines (Accessed: 31 October 2018).
  17. YouTube (2018b) Terms of Service - YouTube, YouTube. Available at: https://www.youtube.com/static?gl=CA&template=terms (Accessed: 31 October 2018).
  18. Zuckerberg, Donna. 2018. Not All Dead White Men: Classics and Misogyny in the Digital Age. Cambridge: Harvard University Press
Notes
2.

The problem of censorship poses theoretical and methodological challenges. Social media platforms, and their comment spaces in particular, are notorious for their opaque censorship policies. Based on our preliminary results, however, we observed that videos are more rigorously monitored than comments. In a related study on the vaccine debate, we noted that extreme content promoting the anti-vaxxer point of view was more likely to be removed than content from pro-vaccine channels.

3.

We define hate speech based on a corpus of terms collected by hatebase.org, which bills itself as the world's largest structured repository of regionalized, multilingual hate speech, and which has been compiled by a recognized Canadian NGO. We manually selected items from that database to make up the list.