Machine Learning for mental illness detection from Facebook posts

Jan 11, 2021

Today I read a paper about using machine learning to detect mental illness from Facebook posts. I found it thanks to the deeplearning.ai “The Batch” newsletter. The full reference is at the bottom of this article.

Mental illnesses are surprisingly widespread in our modern societies. Despite the lack of direct conflicts with other nations, despite the increase in life expectancy over the last decades, despite all the great discoveries and developments in science, people seem to be getting lonelier and more likely to develop a mental illness.

Most indicators suggest that mental health is declining in OECD countries. Suicide prevalence, poor mental health, depression: all are increasing. At the same time, we are more connected than ever, and social platforms such as Twitter and Facebook take up a bigger and bigger share of our daily time. Whether or not these media help us maintain good mental health, the way we use them is surely colored by how we feel in the moment. Do we tend to share more or less content when we are sad? Do we tend to use more negative words when we feel depressed? Or, on the contrary, do we share more joyful things when we are feeling bad? The authors of the paper I read today use Machine Learning to better understand the links between what we upload online and our current mental health.

To do so, they gathered 223 participants split into 3 groups according to their medical records: subjects with schizophrenia spectrum disorders (SSD), subjects with mood disorders (MD), and healthy volunteers. They then asked them to post content on Facebook. 18 months later, they had collected more than 3.4 million messages and 142,000 images, and they tried to train a Machine Learning algorithm to predict each subject’s group based on the characteristics of the text and images that subject had posted.

The first step the authors had to take was to turn all the Facebook content into a format that Machine Learning algorithms can understand: numbers and/or sequences of numbers. For text, they used LIWC (Linguistic Inquiry and Word Count), a piece of software that scores a text, including how positive or negative it is, by counting the words that belong to predefined categories. You can try it out here: https://liwc.wpengine.com/ but according to my tests, the algorithm is pretty basic and gave me comparable results for “This is very good.” and “This is not very good.”, which is what you would expect from a method that counts words without looking at how they are combined. For images, the authors extracted features related to the colors of the pixels. Thanks to these two methods, they were able to turn both texts and images into sequences of numbers.
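To make this step concrete, here is a minimal sketch of my own of what “turning posts into numbers” can look like. It is not the authors’ code (which is not public) and it is not LIWC: the two tiny word lists are made up for illustration, and averaging the red, green and blue values is only my assumption of what a simple color feature could be.

```python
import re

# Hypothetical toy word lists; the real LIWC dictionaries are proprietary and far larger.
POSITIVE_WORDS = {"good", "great", "happy", "love"}
NEGATIVE_WORDS = {"bad", "sad", "hate", "lonely"}


def text_features(text):
    """Return [share of positive words, share of negative words] for a text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid dividing by zero on an empty text
    positive = sum(w in POSITIVE_WORDS for w in words) / total
    negative = sum(w in NEGATIVE_WORDS for w in words) / total
    return [positive, negative]


def color_features(pixels):
    """Return the mean red, green and blue values of a list of (r, g, b) pixels."""
    n = len(pixels) or 1
    return [sum(p[channel] for p in pixels) / n for channel in range(3)]


# Both sentences get a positive score and no negative score:
# counting words ignores the "not".
print(text_features("This is very good."))
print(text_features("This is not very good."))

# A two-pixel "image": one pure red pixel and one pure blue pixel.
print(color_features([(255, 0, 0), (0, 0, 255)]))
```

The two sentences end up with very similar feature vectors, which matches what I observed when testing LIWC with a negated sentence.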

I personally think that more advanced methods could have been used, such as pre-trained BERT models to turn texts into vectors and CNN-based models for image analysis, which extract features directly from the actual content of the image instead of looking at color distributions. Unfortunately, because the data is sensitive for the participants, neither the code nor the dataset is publicly available.

After transforming the raw data into features, the authors trained a Random Forest to predict each subject’s group based on the content they published. Random Forest is a simple and widely used algorithm, known for combining good performance with a fairly easy training procedure.
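For readers who have never seen one, here is a small sketch of the idea behind a Random Forest, written in plain Python. It is my own simplified illustration, not the authors’ implementation: the data is invented, the trees are kept very shallow, and the function names (`build_tree`, `fit_forest`, `predict`) are hypothetical. In practice you would rather use a library such as scikit-learn and its `RandomForestClassifier`.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, rng, n_features, depth=3):
    """Grow a small decision tree; each split only looks at a random subset of features."""
    if depth == 0 or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    best_score, best = float("inf"), None
    for f in rng.sample(range(len(X[0])), n_features):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):  # midpoints between distinct values
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = sum(len(s) * gini([y[i] for i in s]) for s in (left, right)) / len(y)
            if score < best_score:
                best_score, best = score, (f, t, left, right)
    if best is None:  # the sampled features cannot split this node
        return Counter(y).most_common(1)[0][0]
    f, t, left, right = best
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], rng, n_features, depth - 1),
        build_tree([X[i] for i in right], [y[i] for i in right], rng, n_features, depth - 1),
    )


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_features))
    return forest


def predict(forest, row):
    """Let every tree vote and return the most common answer."""
    votes = []
    for node in forest:
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes.append(node)
    return Counter(votes).most_common(1)[0][0]


# Invented, well-separated toy features for three groups (not real data).
X = [[0.1, 0.9], [0.5, 0.2], [0.9, 0.6], [0.3, 0.4],
     [5.2, 5.8], [5.6, 5.1], [5.9, 5.5], [5.0, 5.3],
     [10.4, 10.7], [10.8, 10.2], [10.1, 10.9], [10.6, 10.0]]
y = ["SSD"] * 4 + ["MD"] * 4 + ["HV"] * 4

forest = fit_forest(X, y)
print(predict(forest, [5.4, 5.4]))
```

The two sources of randomness (a different bootstrap sample per tree and a random subset of features per split) are what make the trees different from one another, so that their combined vote is more robust than any single tree.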

In terms of accuracy, the share of subjects assigned to the right group ranged from 52% to 57%, against a baseline of around 33% (which corresponds to the proportion of each group). This shows that the algorithm does better than a random guess.
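Here is a short sketch of how such a comparison can be computed. The labels and predictions below are invented for the example and are not the paper’s results; the only assumption carried over is that the three groups are roughly the same size.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent group."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Invented example: 12 subjects, 4 per group.
y_true = ["SSD", "MD", "HV"] * 4
y_pred = ["SSD", "MD", "HV", "SSD", "SSD", "HV",
          "MD", "MD", "SSD", "HV", "MD", "MD"]

print(f"model accuracy: {accuracy(y_true, y_pred):.0%}")
print(f"baseline:       {majority_baseline(y_true):.0%}")
```

With balanced groups, always answering the same group is right one time out of three, which is why any score clearly above 33% means the model has picked up some signal.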

This work shows that the content we upload on social media may indeed be colored by our mood at the time of posting, and that this coloring may be distinct enough for an algorithm to predict whether we suffer from a psychiatric illness. This kind of work may open the way to automatic content screening, using Machine Learning algorithms to detect people with poor mental health, which may help prevent depression-related suffering.

Source: Birnbaum, M.L., Norel, R., Van Meter, A., Ali, A.F., Arenare, E., Eyigoz, E., Agurto, C., Germano, N., Kane, J.M. and Cecchi, G.A., 2020. Identifying signals associated with psychiatric illness utilizing language and images posted to Facebook. npj Schizophrenia, 6(1), pp.1–10.

Written by Jean-Charles Nigretto