To access the full text documents, please follow this link: http://hdl.handle.net/2117/80964

TweetNorm: a benchmark for lexical normalization of Spanish tweets
Alegria, Iñaki; Aranberri, Nora; Comas Umbert, Pere Ramon; Fresno, Víctor; Gamallo, Pablo; Padró, Lluís; San Vicente Roncal, Iñaki; Turmo Borras, Jorge; Zubiaga, Arkaitz
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació; Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
The language used in social media is often characterized by an abundance of informal and non-standard writing. Normalizing this non-standard language can be crucial for facilitating subsequent text processing and, consequently, for improving the performance of natural language processing tools applied to social media text. In this paper we present a benchmark for lexical normalization of social media posts, specifically tweets in Spanish. We describe the tweet normalization challenge we recently organized, analyze the performance achieved by the systems submitted to it, and examine the characteristics of those systems to identify the features that proved useful. Organizing the challenge has produced a benchmark for lexical normalization of social media that comprises an evaluation framework and an annotated corpus of Spanish tweets, TweetNorm_es, which we make publicly available. Building the benchmark and running the evaluation has brought to light the types of words that the submitted systems handled best, and points to the main shortcomings to be addressed in future work.
-Standard language
-Social media
-Twitter
-Lexical normalization
-Corpus
-Evaluation
-Lexicography
-Linguistic normalization
-Mass media
http://creativecommons.org/licenses/by-nc-nd/3.0/es/
Article - Published version