Clustering and topic modeling for biomedical text mining

Author

Rognon, Paul Joris Denis

Other authors

Universitat de Barcelona. Departament de Genètica, Microbiologia i Estadística

Reverter Comes, Ferran

Vegas Lozano, Esteban

Publication date

2021-06

Abstract

In this work, we study the problem of characterizing an unlabelled corpus of biomedical documents in an unsupervised manner. After reviewing the literature on the subject, we propose an integrative approach to the problem. The integration is twofold. On the one hand, we use multiview learning to integrate different text representations derived from a traditional bag-of-words model, Latent Dirichlet Allocation, and a recurrent neural autoencoder. On the other hand, we integrate topic modeling outputs, clustering outputs, and biomedical word embeddings to generate an intuitive and comprehensive characterization of the corpus. We also propose a semantic graph that provides a synthetic visualization of the relationships between topics, clusters, and any other biomedical concept, based on semantic similarity. An application to the CORD-19 dataset, a collection of articles on COVID-19, shows that our methodology produces a coherent, meaningful, and informative characterization of the corpus.

Document Type

Master thesis

Language

English

Subjects and keywords

UPC subject areas::Mathematics and statistics::Mathematical statistics; Statistical Mathematics -- Applications; Text mining; Document clustering; Topic modeling; Word embeddings; Biomedical text mining; AMS Classification::62 Statistics::62P Applications

Publisher

Universitat Politècnica de Catalunya

Universitat de Barcelona

Rights

Restricted access - author's decision

This item appears in the following Collection(s)

Treballs acadèmics [82539]
