dc.contributor
Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions
dc.contributor
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.contributor
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.contributor
Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group
dc.contributor.author
Carrino, Casimiro Pio
dc.contributor.author
Escolano Peinado, Carlos
dc.contributor.author
Gasco Sánchez, Luis
dc.contributor.author
Rodríguez Fonollosa, José Adrián
dc.identifier
Carrino, C. [et al.]. Promoting generalized cross-lingual question answering in few-resource scenarios via self-knowledge distillation. «Procesamiento del lenguaje natural (SEPLN)», September 2025, no. 75, pp. 65-82.
dc.identifier
https://hdl.handle.net/2117/449061
dc.identifier
10.26342/2025-75-5
dc.description.abstract
We address the challenge of Generalized Cross-Lingual Transfer (G-XLT) in extractive Question Answering, where the question and context languages differ, a problem that is particularly difficult for low-resource languages. Working with only a thousand parallel QA samples, we combine cross-lingual sampling with self-knowledge distillation to regularize cross-lingual fine-tuning. We introduce the novel mean Average Precision at k (mAP@k) coefficient, which mitigates the negative impact of incorrect predictions during training and serves as a diagnostic tool, providing early training guidance and reliable indicators of model learning. Evaluations on the MLQA, XQuAD, and TyDiQA-GoldP datasets demonstrate that our approach consistently outperforms standard cross-entropy fine-tuning of the multilingual mBERT model. Our method represents a promising alternative to machine-translation-based approaches, particularly valuable for low-resource languages where translation quality is poor, and offers an efficient solution for cross-lingual transfer in data-scarce settings.
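As a rough illustration of the mAP@k coefficient named in the abstract, the sketch below (Python) computes it for extractive QA under the simplifying assumption of a single gold answer span per question, in which case average precision at k reduces to the reciprocal rank of the first correct top-k prediction. The function names, the value of k, and the final loss-weighting line are illustrative assumptions, not the authors' published formulation.

    # Minimal sketch (assumed formulation, not the authors' code).
    def ap_at_k(predicted_spans, gold_span, k=5):
        """predicted_spans: (start, end) pairs sorted by model confidence."""
        for rank, span in enumerate(predicted_spans[:k], start=1):
            if span == gold_span:
                return 1.0 / rank  # single relevant item: AP@k = 1/rank
        return 0.0  # gold span missing from the top-k predictions

    def map_at_k(batch_predictions, batch_golds, k=5):
        """Mean AP@k over a batch, usable as a scalar training coefficient."""
        scores = [ap_at_k(p, g, k) for p, g in zip(batch_predictions, batch_golds)]
        return sum(scores) / len(scores)

    # Toy usage: gold found at rank 1 and rank 3 -> (1 + 1/3) / 2 = 0.667.
    preds = [[(10, 12), (3, 5)], [(0, 2), (7, 9), (4, 6)]]
    golds = [(10, 12), (4, 6)]
    coeff = map_at_k(preds, golds, k=3)
    print(f"mAP@k coefficient: {coeff:.3f}")

    # Hypothetical use: scale the self-distillation term so that samples with
    # poor predictions contribute less to the loss.
    # loss = ce_loss + coeff * distillation_loss

A low coefficient early in training flags samples whose top-k predicted spans miss the gold answer, which is the diagnostic role the abstract attributes to mAP@k.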
dc.description.sponsorship
This work was supported by the project PID2019-107579RB-I00 (MICINN).
dc.description
Peer Reviewed
dc.description.version
Postprint (published version)
dc.format
application/pdf
dc.relation
http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6739
dc.relation
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107579RB-I00/ES/ARQUITECTURAS AVANZADAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ, AUDIO Y LENGUAJE/
dc.subject
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Llenguatge natural
dc.subject
Extractive question answering
dc.subject
Cross-lingual transfer
dc.subject
Knowledge distillation
dc.title
Promoting generalized cross-lingual question answering in few-resource scenarios via self-knowledge distillation
dc.title.alternative
Fomentando la respuesta a preguntas cross-lingüística generalizada en escenarios con pocos recursos mediante auto-destilación del conocimiento