To access the full-text documents, please follow this link: http://hdl.handle.net/10459.1/60252

CatDetect, a framework for detecting Catalan tweets
Plaza Cagigós, Sergi
Solsona Tehàs, Francesc; Vilaplana Mayoral, Jordi; Universitat de Lleida. Escola Politècnica Superior
This work deals with language detection, a field that includes new proposals ranging from lexical and morphological analysis to an increasing use of machine learning solutions. The study focuses on Catalan, a minority language. The difficulty increases further when detecting Catalan in tweets, the messages written on the Twitter social network. To achieve this, a Twitter-Catalan corpus has been generated using lexical and morphological approaches, and it is then used to build supervised models based on machine learning techniques. The models are evaluated to determine which one obtains the best prediction score and is therefore the most suitable for use. The best model is to be deployed on a website, where users can test the algorithm interactively through a front-end web page and in the background through a web service exposed via a RESTful API.
-Catalan
-Language Detection
-Twitter corpus
-Machine Learning
-Website
-Twitter
-Català -- Ús
cc-by-nc-nd
http://creativecommons.org/licenses/by-nc-nd/4.0/
bachelorThesis
         

Full-text files for this document

Files Size Format View
splazac.pdf 311.8 KB application/pdf View/Open
