Reporting examples of abusive content
The nature of researching online abusive content means that researchers often view and analyse hateful, harmful and otherwise unpleasant content. In some cases, it benefits the research for such content to appear in their outputs (e.g. papers and presentations). However, this raises ethical challenges that all researchers need to consider. This policy addresses the uncensored use of explicit language in published outputs of the workshop. We introduce this policy for two reasons: (1) readers who do not want to encounter abusive language should not be excluded from engaging with abusive language research simply because such language is needlessly reproduced in research outputs, and (2) in most cases, the quality of research is not diminished by obfuscating or removing abusive examples.
Publishing real vs. synthetic content
In general, we recommend that authors do not publish any social media or online content verbatim but, instead, provide synthetic posts which retain the key aspects of the original (such as the syntax, lexicon, degree of aggression and semantic meaning). Using synthetic posts means that the original users cannot be identified - which is important for maintaining their privacy given the sensitive nature of this research area. If consent to publish posts is given by the users who generated the content, this requirement could be waived - but even in such circumstances we encourage researchers to think about the potential for harm and whether the original post needs to be shared. In most cases, a well-designed synthetic post will convey the same meaning. However, one risk is that researchers could, intentionally or unintentionally, construct posts which better support their arguments and findings. We advise that researchers seek advice from colleagues in such situations to sense-check their synthetic posts. These issues are discussed further in Williams et al. (2017).
Publishing slurs and offensive content
Publishing slurs and offensive content risks harming people who read your publication and could even cause vicarious trauma (see Vidgen et al. 2019, online appendix). To balance faithful academic communication with not reproducing harm, we recommend:
1. Always present an offensive content warning, which readers can see before reading any offensive and/or distressing content. For example:

   OFFENSIVE CONTENT WARNING: This report contains some examples of hateful content. This is strictly for the purposes of enabling this research, and we have sought to minimize the number of examples where possible. Please be aware that this content could be offensive and cause you distress.

2. All slurs and swear words should be obfuscated by using * to represent middle letters, such as: ‘N****r’, ‘B***h’, ‘C**t’, ‘F**k’ and ‘C*ck’. Authors should use their discretion when deciding which terms to obfuscate - the original term still needs to be discernible.

3. Researchers should not include an excessive number of examples, particularly when presenting new dictionaries or keyword lists. Whilst some examples enhance the publication and improve academic communication, an excessive number can be gratuitous.
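The obfuscation rule above is mechanical enough to automate, for example when redacting a long dictionary or keyword list before publication. A minimal sketch follows; the `obfuscate` helper is illustrative only and not part of the policy:

```python
def obfuscate(word: str) -> str:
    """Mask the middle letters of a term with '*', keeping the first
    and last letters so the original term remains discernible
    (e.g. a four-letter term becomes X**X)."""
    if len(word) <= 2:
        # Too short to mask meaningfully; left unchanged.
        return word
    return word[0] + "*" * (len(word) - 2) + word[-1]

# Neutral stand-in word, not an actual slur:
print(obfuscate("word"))  # -> w**d
```

Note that whether more than one middle letter is retained (as in ‘C*ck’ above) is a per-term judgment call, so authors should still review any automated redaction by hand.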
References

Williams et al. (2017), ‘Towards an ethical framework for publishing Twitter data in social research’, Sociology, available at: https://journals.sagepub.com/doi/10.1177/0038038517708140

Vidgen et al. (2019), ‘Challenges and frontiers in abusive content detection’, online appendix, available at: https://github.com/bvidgen/Challenges-and-frontiers-in-abusive-content-detection