Title:
Terminology-aware segmentation and domain feature for the WMT19 biomedical translation task
Author:
Carrino, Casimiro Pio; Rafieian, Bardia; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián
Other authors:
Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions; Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions; Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
Abstract:
In this work, we describe the TALP-UPC systems submitted to the WMT19 Biomedical Translation Task. Our proposed strategy is independent of the NMT model and relies on a single ingredient: a biomedical terminology list. We first extracted this terminology list by labelling biomedical words in our training dataset using the BabelNet API. Then, we designed a data preparation strategy that inserts term information at the token level. Finally, we trained the Transformer model on this term-informed data. Our best submitted system ranked 2nd for Spanish-English and 3rd for English-Spanish.
Peer Reviewed
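As an illustration only, the token-level insertion of term information described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the terminology set, the tag values "T"/"O", and the helper names `tag_sentence` and `toy_subword_split` are hypothetical; the BabelNet labelling step is not reproduced; input is assumed to be already tokenized on whitespace; and the fixed-size chunker merely stands in for a learned subword model such as BPE.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical terminology list. In the setting described above it would come
# from labelling training-data words with BabelNet, which is NOT done here.
TERMS: Set[str] = {"insulin", "hepatitis", "metformin"}


def toy_subword_split(word: str, size: int = 4) -> List[str]:
    """Hypothetical stand-in for a learned subword model (e.g. BPE): cut the
    word into fixed-size chunks and mark every non-final chunk with '@@'."""
    chunks = [word[i:i + size] for i in range(0, len(word), size)]
    return [c + "@@" for c in chunks[:-1]] + chunks[-1:]


def tag_sentence(
    sentence: str,
    terms: Set[str],
    segment: Callable[[str], List[str]] = toy_subword_split,
) -> List[Tuple[str, str]]:
    """Return (unit, feature) pairs for a whitespace-tokenized sentence.

    Words found in the terminology list are kept whole (terminology-aware
    segmentation) and tagged 'T'; all other words are passed to the subword
    segmenter and each resulting piece is tagged 'O'.
    """
    pairs: List[Tuple[str, str]] = []
    for word in sentence.split():
        if word.lower() in terms:
            pairs.append((word, "T"))  # never split a terminology word
        else:
            pairs.extend((piece, "O") for piece in segment(word))
    return pairs


if __name__ == "__main__":
    print(tag_sentence("the patient received insulin daily", TERMS))
    # [('the', 'O'), ('pati@@', 'O'), ('ent', 'O'), ('rece@@', 'O'),
    #  ('ived', 'O'), ('insulin', 'T'), ('dail@@', 'O'), ('y', 'O')]
```

Keeping terms out of the subword splitter and attaching a per-token feature are independent choices in this sketch; a real pipeline would replace the chunker with the trained subword model and feed the feature column to an NMT toolkit that supports source-side features.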
Subject(s):
-UPC subject areas::Telecommunication engineering::Signal processing::Speech and acoustic signal processing
-UPC subject areas::Biomedical engineering
-Machine translating
-Biomedical engineering
Document type:
Article - Published version; Conference Object
Published by:
Association for Computational Linguistics