Background: The COVID-19 pandemic has affected people’s daily lives and has caused economic loss worldwide. Anecdotal evidence suggests that the pandemic has increased depression levels among the population. However, systematic studies of depression detection and monitoring during the pandemic are lacking.
Objective: This study aims to develop a method to create a large-scale depression user data set in an automatic fashion so that the method is scalable and can be adapted to future events; verify the effectiveness of transformer-based deep learning language models in identifying depression users from their everyday language; examine psychological text features’ importance when used in depression classification; and, finally, use the model for monitoring the fluctuation of depression levels of different groups as the disease propagates.
Methods: To study this subject, we designed an effective regular expression-based search method and created the largest English Twitter depression data set containing 2575 distinct identified users with depression and their past tweets. To examine the effect of depression on people’s Twitter language, we trained three transformer-based depression classification models on the data set, evaluated their performance with progressively increased training sizes, and compared the model’s tweet chunk-level and user-level performances. Furthermore, inspired by psychological studies, we created a fusion classifier that combines deep learning model scores with psychological text features and users’ demographic information, and investigated these features’ relations to depression signals. Finally, we demonstrated our model’s capability of monitoring both group-level and population-level depression trends by presenting two of its applications during the COVID-19 pandemic.