Abstract:
|
Mediktor is a clinically validated symptom-checker that provides an accurate pre-diagnosis based on a personalized interactive questionnaire that users answer. Learning to predict the most fitting next question to ask a user, given the previously answered ones, is key to adding value to Mediktor's evaluator system. Current solutions for similar tasks, i.e. predicting a word given a specific context, are based on representing words in vector spaces. These vectors are called word embeddings*, and the methods that learn them have been shown to give outstanding results in Natural Language Processing tasks. For this particular project, the Continuous Bag of Words (CBOW) model from the Word2Vec models by Mikolov et al. (1) was the most suitable approach. It has been demonstrated (2) that these models are able to learn excellent vector representations of words; the challenge is to make them work for datasets with great variability and complexity, as in our case. This work has been valuable for understanding the similarity between groups of questions that are asked together, across every question in the vocabulary. As this is the first attempt to apply machine learning techniques to learn similarities among questions for this particular dataset, the results are satisfactory. The predictions obtained for the testing data and the visualization of the word embeddings in the multidimensional space are reasonable. During graphical validation of the model, we found that, as expected, the learned word embeddings form clusters (groups of similar questions) in the vector space. Nevertheless, further research should be done on this subject to optimize the training and obtain better prediction results.

*Word embeddings are representations of words in a lower-dimensional vector space. They make it possible to learn features about words and how they interact with each other; words with similar meanings should have similar representations.