Enhanced word embedding variations for the detection of substance abuse and mental health issues on social media writings

Ramírez Cifuentes, Diana; Largeron, Christine; Tissier, Julien; Baeza Yates, Ricardo; Freire, Ana

dc.contributor.author

Ramírez Cifuentes, Diana

dc.contributor.author

Largeron, Christine

dc.contributor.author

Tissier, Julien

dc.contributor.author

Baeza Yates, Ricardo

dc.contributor.author

Freire, Ana

dc.date.issued

2021-11-16T08:41:51Z

dc.date.issued

2021

dc.identifier

Ramírez-Cifuentes D, Largeron C, Tissier J, Baeza-Yates R, Freire A. Enhanced word embedding variations for the detection of substance abuse and mental health issues on social media writings. IEEE Access. 2021;9:130449-71. DOI: 10.1109/ACCESS.2021.3112102

dc.identifier

2169-3536

dc.identifier

http://hdl.handle.net/10230/48984

dc.identifier

http://dx.doi.org/10.1109/ACCESS.2021.3112102

dc.description.abstract

Substance abuse and mental health issues are severe conditions that affect millions. Signs of certain conditions have been traced on social media through the analysis of posts. In this paper we analyze textual cues that characterize and differentiate Reddit posts related to depression, eating disorders, suicidal ideation, and alcoholism, along with control posts. We also generate enhanced word embeddings for binary and multi-class classification tasks dedicated to the detection of these types of posts. Our enhancement method to generate word embeddings focuses on identifying terms that are predictive for a class and aims to move their vector representations close to each other while moving them away from the vectors of terms that are predictive for other classes. Variations of the embeddings are defined and evaluated through predictive tasks, a cosine similarity-based method, and a visual approach. We generate predictive models using variations of our enhanced representations with statistical and deep learning approaches. We also propose a method that leverages the properties of the enhanced embeddings in order to build features for predictive models. Results show that variations of our enhanced representations outperform in Recall, Accuracy, and F1-Score the embeddings learned with Word2vec, DistilBERT, GloVe’s fine-tuned pre-learned embeddings and other methods based on domain adapted embeddings. The approach presented has the potential to be used on similar binary or multi-class classification tasks that deal with small domain-specific textual corpora.
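
The abstract above describes embeddings in which terms predictive for one class sit close together and far from terms predictive for other classes, checked in part with a cosine similarity-based method. The following is only a minimal sketch of that general idea, not the authors' procedure: the function names (`cosine`, `class_separation`), the two-dimensional toy vectors, and the example terms are all hypothetical and chosen for illustration. It compares the mean cosine similarity between terms of the same class with the mean similarity between terms of different classes.

```python
import math
from itertools import combinations, product


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(pairs):
    """Average cosine similarity over an iterable of vector pairs."""
    sims = [cosine(u, v) for u, v in pairs]
    return sum(sims) / len(sims)


def class_separation(embeddings, class_terms):
    """Return (within, between) mean cosine similarity.

    embeddings:  dict mapping term -> vector (list of floats)
    class_terms: dict mapping class label -> list of terms assumed to be
                 predictive for that class (hypothetical input)
    """
    classes = list(class_terms)

    # Pairs of terms that belong to the same class.
    within_pairs = []
    for label in classes:
        vectors = [embeddings[t] for t in class_terms[label]]
        within_pairs.extend(combinations(vectors, 2))

    # Pairs of terms drawn from two different classes.
    between_pairs = []
    for first, second in combinations(classes, 2):
        between_pairs.extend(
            product(
                [embeddings[t] for t in class_terms[first]],
                [embeddings[t] for t in class_terms[second]],
            )
        )

    return mean_similarity(within_pairs), mean_similarity(between_pairs)


# Hypothetical toy data: invented 2-D vectors, not values from the paper.
toy_embeddings = {
    "hopeless": [0.9, 0.1],
    "worthless": [0.8, 0.2],
    "drunk": [0.1, 0.9],
    "hangover": [0.2, 0.8],
}
toy_class_terms = {
    "depression": ["hopeless", "worthless"],
    "alcoholism": ["drunk", "hangover"],
}

within, between = class_separation(toy_embeddings, toy_class_terms)
print(f"within-class: {within:.3f}  between-class: {between:.3f}")
```

Under these assumptions, a larger gap between the within-class and between-class values would suggest that class-predictive terms are better separated; the measure actually used in the article may differ.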

dc.description.abstract

This work was supported by the University of Lyon IDEXLYON, the Auvergne-Rhône-Alpes Region, and the Spanish Ministry of Economy and Competitiveness through the Maria de Maeztu Units of Excellence Program under Grant MDM-2015-0502.

dc.format

application/pdf

dc.language

eng

dc.publisher

Institute of Electrical and Electronics Engineers (IEEE)

dc.relation

IEEE Access. 2021;9.

dc.rights

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

dc.rights

https://creativecommons.org/licenses/by/4.0/

dc.rights

info:eu-repo/semantics/openAccess

dc.subject

Classification algorithms

dc.subject

Data mining

dc.subject

Mental disorders

dc.subject

Natural language processing

dc.subject

Supervised learning

dc.title

Enhanced word embedding variations for the detection of substance abuse and mental health issues on social media writings

dc.type

info:eu-repo/semantics/article

dc.type

info:eu-repo/semantics/publishedVersion

This item appears in the following collection(s)

Recerca: articles, congressos, llibres [21065]