Evaluation of Interpretability Methods in Extractive NLP

Other authors

Universitat Politècnica de Catalunya. Departament de Ciències de la Computació

Escolano Peinado, Carlos

Ferrando Monsonís, Javier

Publication date

2023-06-30

Abstract

This study addresses interpretability in Natural Language Processing (NLP), aiming to understand the logic behind models' decision-making. It focuses on three NLP tasks: Question Answering (QA), Text Summarization (TS), and Error Detection (ED). BERT and DistilBERT models were fine-tuned for each task. The research introduces tailor-made datasets, with special emphasis on a misspelling-based dataset for the ED task designed to enhance transparency. In addition, novel faithfulness metrics are introduced to assess interpretability methods. A variety of gradient-based methods and an attention-based method, Aggregation of Layer-wise Token-to-token Interactions (ALTI), were evaluated. ALTI outperformed the gradient-based methods on all tasks, despite its higher computational cost. Integrated Gradients (IG) showed the highest variability, performing well on shorter ED sequences but less so on longer QA and TS sequences. The work stresses the need to broaden the task spectrum in interpretability evaluations and to ensure the robustness of emerging metrics.
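For context, Integrated Gradients (one of the gradient-based methods evaluated above) attributes a model's output to its input features by averaging gradients along a straight-line path from a baseline to the input. The following is a minimal NumPy sketch of that idea; the function names and the toy model are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients attributions with a Riemann sum.

    grad_fn: gradient of the model output w.r.t. its input (hypothetical).
    x, baseline: input and reference point (e.g. zero embeddings).
    """
    alphas = np.linspace(0.0, 1.0, steps)
    # Average the gradients along the straight path from baseline to x.
    avg_grad = np.mean(
        [grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    # Attribution: elementwise (x - baseline) times the averaged gradient.
    return (x - baseline) * avg_grad

# Toy model f(x) = sum(x**2), whose gradient is 2x.
x = np.array([1.0, 2.0])
attr = integrated_gradients(lambda z: 2 * z, x, np.zeros_like(x))
# For this f, the attributions converge to x**2, i.e. [1.0, 4.0].
```

In practice the gradient would come from a fine-tuned model's embedding layer rather than a closed-form function, but the path-integration step is the same.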

Document Type

Master thesis

Language

English

Publisher

Universitat Politècnica de Catalunya

Rights

Open Access
