Shared Task on Hateful Memes at WOAH 2021

Detecting hateful content with AI is difficult -- and it is even more difficult when the content is multimodal, such as a meme. Humans understand memes not by interpreting the words and the image independently but by combining the two. In contrast, most AI systems analyze text and image separately and do not learn a joint representation. This is both inefficient and flawed: such systems are likely to fail when a non-hateful image is combined with non-hateful text to produce content that is nonetheless hateful. For AI to detect this sort of hate, it must learn to understand content the way that people do: holistically.


To accelerate research on the multimodal understanding and detection of hate speech, Facebook AI created the Hateful Memes Challenge in 2020 and released a dataset containing 10,000+ annotated memes. For the WOAH 5 shared task, we present this dataset with newly created fine-grained labels for the protected category that is attacked (e.g., women, black people, immigrants) as well as the type of attack (e.g., inciting violence, dehumanizing, mocking the group).

Tasks

Task A (multi-label): For each meme, detect the protected category. Protected categories are: race, disability, religion, nationality, sex. If the meme is not_hateful, the protected category is: pc_empty.


Task B (multi-label): For each meme, detect the attack type. Attack types are: contempt, mocking, inferiority, slurs, exclusion, dehumanizing, inciting_violence. If the meme is not_hateful, the attack type is: attack_empty.


Tasks A and B are multi-label because memes can contain attacks against multiple protected categories and can involve multiple attack types. For example, a single meme might attack a group on the basis of both race and sex, and do so by both mocking and dehumanizing it.

Important dates

  • March 19th: Shared task data is available. Go to the competition page on DrivenData for the memes dataset (see detailed instructions below) and our GitHub page for the fine-grained annotations.

  • March 25th: MMF setup for getting started, with initial baselines and pre-trained models released

  • May 28th, 23:59 (AOE): Predictions due

  • May 31st, 23:59 (AOE): Shared task paper submissions due

  • June 8th: Notifications

  • June 21st, 23:59 (AOE): Camera-ready papers due

  • August 5th - 6th: Workshop day!


Input format

Information about each meme is presented in JSON. Each record contains the following fields (an illustrative example follows the list):

  • img: Relative path of the raw image (.png file)

  • text: Extracted text from the meme

  • set_name: Data partition, indicating training and development splits

  • pc: Protected category annotations from up to 3 annotators

  • gold_pc: Gold standard labels for protected categories used in Task A, based on majority voting

  • attacks: Attack type annotations from up to 3 annotators

  • gold_attack: Gold standard labels for attack types used in Task B, based on majority voting

  • gold_hate: Gold standard labels for whether the meme is hateful, used in the hate classification task

  • id: Unique identifier for each entry
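
For illustration, a single input record might look like the following. This is a synthetic example: the exact field values, label spellings, and the shape of the per-annotator fields are our assumptions based on the field descriptions above, not a sample from the released data.

  {
    "id": 42953,
    "img": "img/42953.png",
    "text": "example text extracted from the meme",
    "set_name": "train",
    "pc": [["sex"], ["sex", "race"], ["sex"]],
    "gold_pc": ["sex"],
    "attacks": [["mocking"], ["mocking", "dehumanizing"], ["mocking"]],
    "gold_attack": ["mocking"],
    "gold_hate": ["hateful"]
  }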

Submission format

The output file should have the same record structure as the input, with the following prediction fields added to each record (an illustrative example follows the list):

  • pred_hate: Dictionary of {label:score} for the hate classification task

  • pred_attack: Dictionary of {label:score} for the attack category task

  • pred_pc: Dictionary of {label:score} for the protected category task

  • set_name: The partition for which the predictions are being computed

  • id: Unique identifier of the record
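
Continuing the synthetic example above, a prediction record might look like this (again, the exact shape and score values are our assumptions based on the field descriptions, not official sample output):

  {
    "id": 42953,
    "set_name": "dev",
    "pred_hate": {"hateful": 0.87, "not_hateful": 0.13},
    "pred_pc": {"race": 0.22, "disability": 0.01, "religion": 0.03,
                "nationality": 0.02, "sex": 0.91, "pc_empty": 0.05},
    "pred_attack": {"contempt": 0.10, "mocking": 0.78, "inferiority": 0.08,
                    "slurs": 0.02, "exclusion": 0.01, "dehumanizing": 0.40,
                    "inciting_violence": 0.01, "attack_empty": 0.06}
  }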

Evaluation

Entries for all three prediction tasks (hate, protected category, and attack type) are evaluated using AUROC, implemented with the standard roc_auc_score function from the sklearn library. The evaluation scripts, example predictions, and instructions for using the scoring script will be made available shortly.
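
Until the official scripts are released, the following Python sketch shows roughly how a macro-averaged AUROC can be computed for one of the multi-label tasks with sklearn. The function and variable names are ours, and the exact averaging used by the official scorer is an assumption.

  import numpy as np
  from sklearn.metrics import roc_auc_score

  # Sketch only -- not the official scoring script.
  # labels: fixed label inventory for one task (e.g., the protected categories).
  # gold:   {meme_id: set of gold labels}.
  # preds:  {meme_id: {label: score}}, as in the submission format.
  def task_auroc(labels, gold, preds):
      ids = sorted(gold)
      # Binary indicator matrix: one row per meme, one column per label.
      y_true = np.array([[1 if lab in gold[i] else 0 for lab in labels] for i in ids])
      # Predicted scores aligned with the gold matrix.
      y_score = np.array([[preds[i].get(lab, 0.0) for lab in labels] for i in ids])
      # AUROC per label, then averaged across labels.
      return roc_auc_score(y_true, y_score, average="macro")

  # Toy example with a reduced label set:
  labels = ["sex", "race", "pc_empty"]
  gold = {1: {"sex", "race"}, 2: {"pc_empty"}}
  preds = {1: {"sex": 0.9, "race": 0.7, "pc_empty": 0.1},
           2: {"sex": 0.2, "race": 0.1, "pc_empty": 0.8}}
  print(task_auroc(labels, gold, preds))  # 1.0 for this toy example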

How to participate

Join woah2021task@googlegroups.com. Your request should include the first name, last name, and affiliation of every team member.

Get the original Hateful Memes dataset from DrivenData. To access the data:

  1. Register for an account on DrivenData

  2. Find the competition: https://www.drivendata.org/competitions/64/hateful-memes/

  3. Join the competition and e-sign the ‘Data Access Agreement’

  4. Then download the data from the ‘Data Download’ option


You can access the fine-grained annotations from GitHub.


Rules

  1. You must submit your code with your predictions and make it available under an open-source license.

  2. You cannot hand label any of the entries or manually assign them scores.

  3. You should treat the test set examples as independent.

  4. Your system should predict the protected category and attack type over the entire dataset; for non-hateful memes, the model should predict pc_empty and attack_empty.

  5. If you do not adhere to the spirit of the competition rules, your entry will be rejected.

What is hate speech?

Within this shared task, hate speech is defined as a direct attack against people on the basis of protected characteristics, such as race, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity, socio-economic status and serious disease. It includes violent or dehumanizing speech, harmful stereotypes, statements of inferiority, expressions of contempt, disgust or dismissal, cursing, and calls for exclusion or segregation.

Shared task organizers


Appendix

To help clarify the categories for Task A and Task B, we provide definitions and examples of all classes. The examples are synthetic.

In all cases, we have used a * to replace the first character of hateful terms (e.g., slurs). Readers are likely to find the statements offensive.

Task A - Protected characteristics

Task B - Attack vectors