Can graph machine learning identify hate speech in online social networks?

Online hate speech is a complex subject. Follow this demonstration using state-of-the-art graph neural network models to detect hateful users based on their activities on the Twitter social network.

By Pantelis Elinas, Anna Leontjeva, and Yuriy Tyshetskiy.

Over three decades, the Internet has grown from a small network of computers used by research scientists to communicate and exchange data to a technology that has penetrated almost every aspect of our day-to-day lives. Today, it is hard to imagine a life without online access for business, shopping, and socialising.

A technology that has connected humanity at a scale never before possible has also amplified some of our worst qualities. Online hate speech spreads virally across the globe with short- and long-term consequences for individuals and societies. These consequences are often difficult to measure and predict. Online social media websites and mobile apps have inadvertently become the platform for the spread and proliferation of hate speech.

What is online hate speech?

“Hate speech is a type of speech that takes place online (e.g., the Internet, online social media platforms) with the purpose to attack a person or a group on the basis of attributes such as race, religion, ethnic origin, sexual orientation, disability, or gender.” [source]

A number of international institutions, including the UN Human Rights Council and the Online Hate Prevention Institute, are engaged in understanding the nature, proliferation, and prevention of online hate speech. Recent advances in machine learning have shown promise in aiding these efforts, especially as the basis for scalable, automated systems for early detection and prevention. Academic researchers are constantly improving machine learning systems for hate speech classification, while all major social media networks are deploying and fine-tuning similar tools and systems.

Online hate speech is a complex subject. In this article, we consider using machine learning to detect hateful users based on their activities on the Twitter social network. The problem and dataset were first published in [1]. The data are freely available for download from Kaggle here.

In what follows, we develop and compare two machine learning methods for classifying a small subset of Twitter’s users as hateful or normal (not hateful). First, we employ a traditional machine learning method to train a classifier based on users’ lexicons and social profiles. Next, we apply a state-of-the-art graph neural network (GNN) machine learning algorithm to solve the same problem, but now also considering the relationships between users.

If you wish to follow along, the Python code in a Jupyter Notebook can be found here.


We demonstrate applying machine learning for online hate speech detection using a dataset of Twitter users and their activities on the social media network. The dataset was originally published by researchers from Universidade Federal de Minas Gerais, Brazil [1], and we use it without modification.

The data cover 100,368 Twitter users. For each user, a number of activity-related features are available, such as the frequency of tweeting, the number of followers, the number of favourites, and the number of hashtags. Furthermore, an analysis of each user’s lexicon, derived from their last 200 tweets, yielded a large number of language-content features. Stanford’s Empath tool [2] was used to analyse each user’s lexicon with regard to categories such as love, violence, community, warmth, ridicule, independence, envy, and politics, and to assign numeric values indicating the user’s alignment with each category.

In total, we use 204 features to characterise each user in the dataset. For each user, we collect these features to form a 204-dimensional feature vector to be used as input to a machine learning model for classifying users as hateful or normal.

The dataset also includes relationships between the users. A user is considered connected with another user if the former has retweeted the latter. This relationship structure gives rise to a network that is different from Twitter’s network of follower and followee relationships. Follower and followee relationships may be hidden from us, since users can elect to keep their network private, whereas the retweet network remains public as long as the original tweets are public.

The relative proportions of annotated users in the dataset: red is hateful, green is normal, and blue is other. Only a small fraction of users are known to be hateful or normal.

Finally, users are labelled as belonging to one of three categories: hateful, normal, or other. Out of ~100k users in the dataset (we use the symbol ~ to denote approximate numbers), only ~5k have been manually annotated as hateful or normal; the remaining ~95k users belong to the other category, meaning they have not been annotated. The relative proportions of annotated users in the dataset are shown on the left, where red is hateful, green is normal, and blue is other. The authors in [1] describe the protocol guiding the data annotation process in more detail.

Figure 1 shows a graphical representation of the dataset. We show users annotated as hateful in red circles, whereas we show users annotated as normal in green circles. Users labelled as other (not annotated) are left blank.

Figure 1: The hateful Twitter dataset structure and basic statistics.

Hateful user classification using Machine Learning

Our objective is to train a binary classification model that can be used to classify users as hateful or normal. However, the dataset used to train the model presents two challenges.

Firstly, only a small subset of users is annotated as hateful or normal, with the majority of users’ labels unknown (the other category). Secondly, the labelled data are highly imbalanced: out of the ~5k annotated users, only ~500 (~10%) have been annotated as hateful and the remainder as normal.

Semi-supervised machine learning methods can help us alleviate the small labelled sample issue by making use of both labelled and unlabelled data. We will consider such methods later in this article within the context of GNNs.

To deal with class imbalance in the labelled training set, we calculate and use class weights; these weights are used in the model’s loss function (that is optimised during model training) to penalise the model’s mistakes in classifying users from the “minority” class (the class with fewer examples, or in this case the hateful class of users) proportionally more than mistakes in classifying users from the “majority” class (the normal class of users).
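The weighting scheme described above can be sketched as a class-weighted binary cross-entropy. This is a minimal NumPy sketch, assuming the model outputs the probability that a user is hateful; the function name `weighted_bce` is hypothetical, and this is one common formulation (each user’s loss scaled by the weight of their true class, then averaged), not necessarily the exact loss used in the accompanying notebook.

```python
import numpy as np


def weighted_bce(y_true, p_hateful, w_normal, w_hateful, eps=1e-12):
    """Class-weighted binary cross-entropy, averaged over users (illustrative sketch).

    y_true: 0 for normal, 1 for hateful (the minority, positive class).
    p_hateful: predicted probability that each user is hateful.
    Each user's loss is multiplied by the weight of that user's true class.
    """
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities so log() never sees exactly 0 or 1
    p = np.clip(np.asarray(p_hateful, dtype=float), eps, 1 - eps)
    per_user = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    weights = np.where(y == 1, w_hateful, w_normal)
    return float(np.mean(weights * per_user))


# The same confident mistake costs far more on a hateful user than on a normal one.
print(weighted_bce([1], [0.1], w_normal=0.56, w_hateful=4.60))
print(weighted_bce([0], [0.9], w_normal=0.56, w_hateful=4.60))
```

With equal weights this reduces to the ordinary mean binary cross-entropy; raising the minority-class weight pushes the optimiser to reduce mistakes on hateful users.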

Splitting the data into training and test sets

We split the annotated data into a training set and a test set using stratified sampling, such that 15% of the annotated users are selected for training and the remaining 85% are held out for testing the trained classification model.

The statistics of our train and test sets are as follows:

Training set: 664 normal, 81 hateful

Test set: 3,763 normal, 463 hateful
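A stratified 15%/85% split like this one can be sketched with scikit-learn’s `train_test_split`, assuming the annotated users are held in a feature matrix `X` and a label vector `y` (0 = normal, 1 = hateful). The features here are random placeholders; only the class totals (4,427 normal and 544 hateful, the sums of the reported train and test counts) come from the article. Because of how scikit-learn rounds per-class counts, the resulting split may differ from the reported one by about one user per class.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the ~5k annotated users: label 0 = normal, 1 = hateful.
rng = np.random.default_rng(0)
y = np.array([0] * 4427 + [1] * 544)
X = rng.normal(size=(y.size, 204))  # placeholder 204-dimensional feature vectors

# stratify=y keeps the hateful/normal ratio (roughly) equal in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.15, stratify=y, random_state=42
)
print("train normal:", np.sum(y_train == 0), "hateful:", np.sum(y_train == 1))
print("test  normal:", np.sum(y_test == 0), "hateful:", np.sum(y_test == 1))
```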

The training set exhibits a high class imbalance. We can compensate for it by using class weights, so that more emphasis is given to the under-represented class when the loss function is evaluated during model training. We calculated class weights of 0.56 for normal and 4.60 for hateful; that is, the hateful (positive) class is given ~8 times the weight of the normal class in the loss function.
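The reported weights are consistent with the common “balanced” heuristic, weight = n_samples / (n_classes × n_class), which gives 745 / (2 × 664) ≈ 0.56 and 745 / (2 × 81) ≈ 4.60. A minimal sketch using scikit-learn’s `compute_class_weight`, assuming (this is not stated in the article) that this is how the weights were obtained:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Training-set counts reported in the article: 664 normal (0) and 81 hateful (1).
y_train = np.array([0] * 664 + [1] * 81)

# "balanced" computes n_samples / (n_classes * count_of_each_class)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_train)
class_weights = {0: weights[0], 1: weights[1]}

print({k: round(float(v), 2) for k, v in class_weights.items()})  # {0: 0.56, 1: 4.6}
print(round(float(weights[1] / weights[0]), 1))  # 8.2
```

A dictionary like `class_weights` is the form many training APIs accept for per-class loss weighting.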

Evaluation metrics

In order to evaluate the performance and compare the trained classification models, we are going to consider the following three metrics (for a description of evaluation metrics for binary classifiers see here), evaluated on the held-out test set of annotated users:

  1. Accuracy
  2. Receiver Operating Characteristic (ROC) curve
  3. Area Under the ROC curve (AU-ROC)
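These metrics can be computed with scikit-learn. The sketch below bundles them into a hypothetical helper `evaluate`, assuming each model outputs the probability that a user is hateful and using an illustrative decision threshold of 0.5 for accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve


def evaluate(y_true, y_score, threshold=0.5):
    """Accuracy at a fixed threshold, the ROC curve points, and the AU-ROC.

    y_score holds each user's predicted probability of being hateful.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "roc": (fpr, tpr, thresholds),
        "au_roc": roc_auc_score(y_true, y_score),
    }


# Toy example: four users, two of them hateful.
metrics = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(metrics["accuracy"], metrics["au_roc"])  # 0.75 0.75
```

Accuracy depends on the chosen threshold and can be misleading on imbalanced data: with 463 hateful users among 4,226 test users, a trivial classifier that labels everyone as normal would already score 3,763 / 4,226 ≈ 89% accuracy. The ROC curve and AU-ROC avoid this by summarising performance across all thresholds.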

Logistic regression model

Let’s begin by training a logistic regression (LR) model to predict a normal or hateful label for a user.

When training and evaluating this model, we ignore users that are not annotated as either normal or hateful, as well as the relationships between users, because LR has no direct way to use such information.

The training and test data are structured in tabular format, as shown in Figure 2. The annotated users in the training set are shown using red and green circles. The feature vectors for the users are stacked vertically to form the design matrix that is input to the LR model. After training the LR model, we can make predictions for the users in the held-out test set to measure the generalisation performance of the trained model.
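The LR baseline can be sketched with scikit-learn as follows. The data are synthetic stand-ins generated with `make_classification` (204 features, roughly 11% positives), since the real design matrix comes from the Kaggle dataset; the feature standardisation step and `class_weight="balanced"` are assumptions of this sketch, and its scores will not match those reported for the real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the annotated users: one row per user (the design matrix),
# 204 features, about 11% of rows in the positive (hateful) class.
X, y = make_classification(
    n_samples=4971, n_features=204, n_informative=20, weights=[0.89], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.15, stratify=y, random_state=0
)

# Standardise the features, then fit LR with "balanced" class weights so that
# errors on the minority (hateful) class are penalised more heavily.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

p_test = model.predict_proba(X_test)[:, 1]  # probability of the hateful class
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("AU-ROC:", roc_auc_score(y_test, p_test))
```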

Figure 2: The setup for training and evaluating a logistic regression model for online hate speech classification.

After training the model, we use it to make a prediction for each user in the test set, calculate the accuracy and AU-ROC metrics, and plot the ROC curve. The accuracy on the test set is 85.9%, and the AU-ROC is 0.81. The ROC curve is shown in Figure 3.

Figure 3: The ROC curve for the logistic regression classifier calculated using the test data. The area under the ROC curve is 0.81.

Graph Neural Networks

In the specification and training of the LR model, we ignored the ~95k users that have not been annotated as hateful or normal. Furthermore, we ignored the relationships between users.

It is conceivable that a hateful user would take measures to avoid easy identification by, for example, being cautious not to use obviously hateful vocabulary. However, the same user might be comfortable retweeting other users’ hateful tweets. This information is hidden in the relationships between users. The LR model we employed above did not utilise these relationships.

This leads us to ask: can we exploit the relationships between users, as well as the data about non-annotated users, to improve the predictive performance of a machine learning model? And if so, what kind of machine learning model can we use, and how?

One way to use the relationship data is manual feature engineering: introducing network-related features into the LR model. Examples of such features include various centrality measures that quantify the positional importance of nodes in the graph. In fact, the dataset as published in [1] includes such engineered network-related features, but we have deliberately removed them in order to demonstrate one of the core ideas in modern machine learning. This idea, popularised by deep learning, is that the machine learning algorithm can automatically learn suitable features that maximise model performance, avoiding the laborious process of manual feature engineering. (This automation of feature engineering comes at the price of the resulting model’s interpretability, which is a subject for another discussion.)
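For contrast with the learned approach, a manually engineered network feature might look like the following minimal sketch. It computes normalised in- and out-degree (degree centrality, one of the simplest centrality measures) from a list of retweet pairs and appends them to each user’s feature vector. The function name and toy graph are hypothetical; the engineered features shipped with [1] are not reproduced here.

```python
import numpy as np


def degree_centrality_features(n_users, retweet_edges):
    """Normalised in- and out-degree for each user in a directed retweet graph.

    retweet_edges: iterable of (retweeter, retweeted) index pairs, following the
    convention that a user is connected to another if the former retweeted the latter.
    Degrees are divided by (n_users - 1), the usual degree-centrality normalisation.
    """
    in_deg = np.zeros(n_users)
    out_deg = np.zeros(n_users)
    for src, dst in retweet_edges:
        out_deg[src] += 1  # src retweeted someone
        in_deg[dst] += 1   # dst was retweeted
    scale = max(n_users - 1, 1)
    return np.column_stack([in_deg / scale, out_deg / scale])


# Four users; user 0 retweets users 1 and 2, user 3 retweets user 1.
X_user = np.zeros((4, 204))  # placeholder for the 204 lexicon/profile features
extra = degree_centrality_features(4, [(0, 1), (0, 2), (3, 1)])
X_augmented = np.hstack([X_user, extra])  # now 206 columns per user
print(extra)
```

Every such feature has to be designed, computed, and validated by hand, which is the effort the GNN approach below aims to avoid.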

Guided by the above idea, we forgo feature engineering and tackle online hate speech classification using a state-of-the-art GNN algorithm. The GNN model jointly exploits user features and relationships between all users in the dataset, including those users that are not annotated. We expect that the GNN model using this additional information will outperform the baseline LR model.

The article Knowing Your Neighbours: Machine Learning on Graphs provides an introductory yet comprehensive overview of graph machine learning.

The particular GNN algorithm we employ here was published in [3]. It is called Graph Sample and Aggregate (GraphSAGE) and builds on the insight that a prediction for a node should be based not only on the node’s own feature vector but also on those of its neighbours, perhaps their neighbours as well, and so on. In the example of classifying hateful users, our working assumption is that a hateful user is likely to be connected with other hateful users. The strength of this association will depend on both the graph distance between the two user nodes and their feature vectors.

GraphSAGE introduces a new type of graph convolutional neural network layer that propagates information from a node’s neighbourhood while training a classifier. This new layer is summarised in Figure 4. As described in [3], such a layer “generates embeddings by sampling and aggregating features from a node’s local neighbourhood.” An embedding is a latent representation of a node that can be used as input to a classification model, typically a fully connected neural network, so that all the model parameters can be trained end to end. We can stack several such layers in sequence to construct a deeper network that fuses information from larger network neighbourhoods. The number of GraphSAGE layers is problem specific and should be tuned as a model hyperparameter.
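To make the layer concrete, here is a minimal NumPy sketch of one GraphSAGE layer with a mean aggregator, one of the aggregators proposed in [3]. It follows the paper’s update: concatenate a node’s representation with the mean of its neighbours’ representations, apply a weight matrix and a non-linearity, then L2-normalise. The weights are random rather than trained, neighbour sampling is omitted, and ReLU, the layer sizes, and the toy graph are illustrative choices; this is not StellarGraph’s implementation.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def graphsage_mean_layer(H, neighbours, W, activation=relu):
    """One GraphSAGE layer with a mean aggregator (full neighbourhoods, no sampling).

    H: (n_nodes, d_in) input node representations.
    neighbours: neighbours[v] is a list of node indices adjacent to node v.
    W: (2 * d_in, d_out) weights applied to [h_v || mean of neighbour h_u].
    """
    n, d_in = H.shape
    agg = np.zeros((n, d_in))
    for v, nbrs in enumerate(neighbours):
        if nbrs:  # isolated nodes keep a zero neighbourhood vector
            agg[v] = H[nbrs].mean(axis=0)
    Z = activation(np.concatenate([H, agg], axis=1) @ W)
    # L2-normalise each node's embedding, as in the GraphSAGE paper's algorithm
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.maximum(norms, 1e-12)


rng = np.random.default_rng(0)
H0 = rng.normal(size=(5, 204))  # 5 users, 204 features each
neighbours = [[1, 2], [0], [0, 3], [2], []]
W1 = rng.normal(scale=0.1, size=(2 * 204, 32))
W2 = rng.normal(scale=0.1, size=(2 * 32, 32))
# Stacking two layers lets each node draw on information up to two hops away.
H1 = graphsage_mean_layer(H0, neighbours, W1)
H2 = graphsage_mean_layer(H1, neighbours, W2)
print(H1.shape, H2.shape)  # (5, 32) (5, 32)
```

In a full model, `H2` would feed a classification head, and `W1`, `W2`, and the head would be trained jointly.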

Figure 4: Description of a GraphSAGE neural network layer. It aggregates information from a node’s neighbourhood to learn how to make better predictions. The blue arrows indicate the node’s neighbours considered during the aggregation step.

Generally, graph neural network models can become computationally unwieldy for large graphs with high-degree nodes. To avoid this, GraphSAGE employs a sampling scheme that limits the number of neighbours whose feature information is passed to the central node, as shown in the “AGGREGATE” step in Figure 4. Furthermore, GraphSAGE models learn functions that can generate latent representations for nodes that were not present in the network during training. Consequently, GraphSAGE is suitable for making predictions in an inductive setting, where only part of the graph is available at training time. (While this is not the case for our working example, you can find such a demonstration in the Jupyter Notebook here.)
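The sampling step can be sketched as below: each node gets a fixed-size, uniformly sampled set of neighbours per hop, so the cost of a prediction no longer grows with a node’s degree. Uniform fixed-size sampling follows the description in [3]; sampling with replacement, the fan-outs of 10 and 5, and the function names are illustrative assumptions of this sketch.

```python
import numpy as np


def sample_neighbours(neighbours, node, num_samples, rng):
    """Uniformly sample a fixed-size neighbourhood for `node`, with replacement.

    Sampling with replacement keeps the size fixed even for low-degree nodes;
    nodes without neighbours return an empty array.
    """
    nbrs = neighbours[node]
    if len(nbrs) == 0:
        return np.array([], dtype=int)
    return rng.choice(nbrs, size=num_samples, replace=True)


def sample_two_hop(neighbours, node, fanouts, rng):
    """Sample hop-1 neighbours of `node`, then hop-2 neighbours of each of those."""
    hop1 = sample_neighbours(neighbours, node, fanouts[0], rng)
    hop2 = [sample_neighbours(neighbours, u, fanouts[1], rng) for u in hop1]
    return hop1, hop2


rng = np.random.default_rng(0)
# A hub user 0 connected to 1,000 others: only 10 of them are looked at per hop.
neighbours = [list(range(1, 1001))] + [[0]] * 1000
hop1, hop2 = sample_two_hop(neighbours, 0, fanouts=(10, 5), rng=rng)
print(len(hop1), [len(h) for h in hop2][:3])  # 10 [5, 5, 5]
```

With fan-outs (10, 5), a two-layer model touches at most 10 + 10 × 5 = 60 sampled neighbours per target node, regardless of how many connections the node actually has.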

The open-source StellarGraph Python library provides an easy-to-use implementation of the GraphSAGE algorithm. In this article, we use StellarGraph to build and train a GraphSAGE model for predicting hateful Twitter users; see the Jupyter notebook here for how to do this.

We can visualise the node latent representations for annotated users. We take the output activations of the first GraphSAGE layer as the node representations. These are shown in 2-D in Figure 5 where hateful users are shown in red and normal users in blue.
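The article does not say how the embeddings were projected to 2-D; one common choice is t-SNE. A minimal sketch with scikit-learn’s `TSNE`, using random placeholder vectors in place of the first-layer GraphSAGE activations (the perplexity and sizes are illustrative assumptions):

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder for first-layer GraphSAGE activations of annotated users
# (here: 60 random 32-dimensional vectors) and their labels (1 = hateful).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(60, 32))
labels = rng.integers(0, 2, size=60)

# Project to 2-D; perplexity must be smaller than the number of points
coords = TSNE(n_components=2, perplexity=10, init="pca", random_state=0).fit_transform(embeddings)
colours = np.where(labels == 1, "red", "blue")  # hateful in red, normal in blue
print(coords.shape)  # (60, 2)
# To draw the figure, e.g. with matplotlib:
# plt.scatter(coords[:, 0], coords[:, 1], c=colours, s=5)
```

t-SNE preserves local neighbourhoods rather than global distances, so clusters in such a plot are suggestive rather than a quantitative measure of separability.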

Figure 5: Visualisation of the node embeddings for annotated users. Hateful users are shown in red and normal users are shown in blue.

The node latent representations shown in Figure 5 indicate that the majority of hateful users tend to cluster together. However, some normal users also fall in the same neighbourhood, and these will be difficult to distinguish from hateful ones. Similarly, a small number of hateful users are dispersed among normal users, and these will also be difficult to classify correctly.

The GraphSAGE user classification model achieves an accuracy of 88.9% on the test data and an AU-ROC score of 0.88.

Comparison between models

Let’s now compare the GraphSAGE and logistic regression models, to see whether using the additional information about unlabelled users and relationships between users actually helped to make a better user classifier.

The ROC curves for both models are drawn together in Figure 6. The AU-ROC is 0.81 and 0.88 for the LR and GraphSAGE models respectively (larger numbers denote better performance). By this measure, we see that utilising relationship information in the machine learning model improves overall predictive performance.

Figure 6: Plot of the ROC curves for logistic regression (orange) and GraphSAGE (blue) models. Also shown are the areas under the curves. The curves are drawn using data in the test set.

When classifying users as hateful (positive class) or normal (negative class), it is important to minimise the number of false positives, that is, the number of normal users incorrectly classified as hateful. At the same time, we want to correctly classify as many hateful users as possible. We can balance these two goals by choosing a decision threshold guided by the ROC curve.
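Reading an operating point off the ROC curve can be sketched with scikit-learn’s `roc_curve`. The helper name `tpr_at_fpr` and the toy scores are hypothetical; the function picks the highest true positive rate whose false positive rate stays within a budget (for example 2%) and returns the corresponding decision threshold.

```python
import numpy as np
from sklearn.metrics import roc_curve


def tpr_at_fpr(y_true, y_score, max_fpr=0.02):
    """Highest true positive rate reachable with false positive rate <= max_fpr.

    Returns (tpr, threshold): classify a user as hateful when score >= threshold.
    roc_curve returns FPR and TPR in non-decreasing order, so the last point
    within the FPR budget is the one with the largest TPR.
    """
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    idx = np.flatnonzero(fpr <= max_fpr)[-1]  # fpr[0] is always 0, so this exists
    return tpr[idx], thresholds[idx]


tpr, thr = tpr_at_fpr([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], max_fpr=0.02)
print(tpr, thr)  # 0.5 0.8
```

In practice the threshold would be chosen on held-out validation data rather than on the test set used to report results.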

Assuming we are willing to tolerate a false positive rate of approximately 2%, the GraphSAGE and LR models achieve true positive rates of 0.378 and 0.253, respectively. For this fixed false positive rate, the GraphSAGE model’s true positive rate is about 12.5 percentage points higher than that of the LR model, which means it detects roughly 1.5 times as many hateful users. That is, we can correctly identify more hateful users for the same low number of misclassified normal users. We conclude that, on this sparsely labelled dataset with an underlying network structure, using the relationship information and the unlabelled user information substantially improves the performance of a machine learning model.


In this article, we considered the rise of online hate speech fuelled by the Internet’s growth and asked the question: “Can graph machine learning identify hate speech in online social networks?”

Our technical analysis answers this question with a resounding: “Yes, but there is still plenty of room for improvement.” We demonstrated that a modern GNN can identify hateful users with higher accuracy and AU-ROC than a traditional logistic regression baseline.

The Council on Foreign Relations recently published this article stating: “Violence attributed to online hate speech has increased worldwide.” We have shown that graph machine learning is a suitably powerful weapon in the fight against online hate speech. Our results encourage additional research into online hate speech classification with larger network datasets and more complex GNN methods.

If you want to learn more about how to use state-of-the-art GNN models for predictive modelling, have a look at the StellarGraph graph machine learning library and the numerous accompanying demos.

This work is supported by CSIRO’s Data61, Australia’s leading digital research network.


  1. “Like Sheep Among Wolves”: Characterizing Hateful Users on Twitter. M. H. Ribeiro, P. H. Calais, Y. A. Santos, V. A. F. Almeida, and W. Meira Jr., 2018.
  2. Empath: Understanding Topic Signals in Large-Scale Text. E. Fast, B. Chen, and M. S. Bernstein, Proceedings of the CHI Conference on Human Factors in Computing Systems, 2016.
  3. Inductive Representation Learning on Large Graphs. W. L. Hamilton, R. Ying, and J. Leskovec, NeurIPS, 2017.


Original. Reposted with permission.


Bios: Pantelis Elinas is a Senior Research Engineer working at CSIRO’s Data61, Australia’s leading digital research network. He enjoys working on interesting problems, sharing knowledge, and developing useful software tools.

Anna Leontjeva is a Senior Data Scientist with more than 10 years of experience, currently working at CSIRO's Data61 on StellarGraph, the machine learning library for graphs.

Yuriy Tyshetskiy is a Senior Research Engineer leading the Graph Machine Learning Systems team at CSIRO's Data61, developing the StellarGraph library. His experience ranges across disciplines such as theoretical plasma physics, computer vision, and graph machine learning.