Classifying Concerning Tweets
This study aimed to determine whether suicide-related Twitter posts could be classified as 'strongly concerning' based solely on the content of the post, as judged by human coders.
We also wished to determine whether such content could legitimately be considered an indicator of suicide risk. From 18 February to 23 April 2014, Twitter was monitored for a series of suicide-related phrases and terms using a public API. Matching tweets were stored in a data annotation tool developed by the Commonwealth Scientific and Industrial Research Organisation (CSIRO).
During this time, 14,701 suicide-related tweets were collected; 14% were randomly selected and divided into two equal sets for coding by human researchers. Machine learning methods were then applied to assess whether a 'concerning' tweet could be identified automatically and in real time.
The machine-learned classifier correctly identified 80% of 'strongly concerning' tweets, and its accuracy was still improving when training ended; because performance had not yet plateaued, further improvement remains both necessary and possible.
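The study does not specify the classifier's internals here, but text classification of this kind is commonly built on bag-of-words features. As a purely illustrative sketch (not the authors' method), the following pure-Python multinomial Naive Bayes shows how labelled tweets could train a model that scores new posts; the example tweets and the 'safe' label are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real system would use a proper tokenizer.
    return text.lower().split()

class NaiveBayes:
    """Minimal multinomial Naive Bayes over bag-of-words features (illustrative only)."""

    def fit(self, texts, labels):
        self.labels = set(labels)
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        words = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.labels:
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(self.label_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy training data; the real study used thousands of human-coded tweets.
texts = [
    "i want to end my life tonight",
    "so done with everything no way out",
    "this homework is killing me lol",
    "i could just die of embarrassment haha",
]
labels = ["strongly concerning", "strongly concerning", "safe", "safe"]

clf = NaiveBayes().fit(texts, labels)
print(clf.predict("no way out i want to end it"))  # → strongly concerning
```

Because each tweet is scored independently from its word counts, a trained model of this kind can be applied to the incoming stream in real time, which is what the study's automatic pipeline required.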
The study demonstrated that it is possible to distinguish the level of concern among suicide-related tweets, using both human coders and an automatic machine classifier. Importantly, the machine classifier replicated the accuracy of the human coders.
These findings confirm that individuals use Twitter to express suicidality, and that the proposed method advances our ability to detect suicidality among Twitter users automatically and reliably.