Prompting in Deep Learning Speech Recognition

Other authors

Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions

Hernando Pericás, Francisco Javier

Publication date

2024-10-28

Abstract

Adapting large-scale pre-trained automatic speech recognition (ASR) models, such as OpenAI's Whisper, to specific tasks or languages remains a challenging problem due to the substantial computational resources required for traditional fine-tuning methods. This limitation is particularly significant in real-world scenarios where resources are constrained, and efficient adaptation is essential for handling diverse languages and domains. To address this issue, this thesis explores two parameter-efficient fine-tuning (PEFT) techniques: soft prompting and Low-Rank Adaptation (LoRA). Soft prompting leverages trainable prompt embeddings to adapt the model with minimal parameter updates, while LoRA applies low-rank transformations to the model's weight matrices, reducing the number of trainable parameters. Through experiments on the 3CatParla dataset for Catalan speech recognition, we demonstrate that these techniques achieve competitive performance with significantly lower computational demands. LoRA, in particular, shows strong results in terms of efficiency and accuracy, while soft prompting exhibits performance limitations with larger models. This work opens pathways for further research into hybrid methods and evaluation across more diverse datasets, contributing to the field of efficient ASR adaptation for low-resource environments.
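The low-rank idea behind LoRA that the abstract describes can be illustrated with a minimal sketch (not the thesis code; the dimensions, scaling factor, and initialization below are illustrative assumptions following the common LoRA formulation, where the frozen weight W is augmented by a trainable product B·A of rank r):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 512, 512, 8              # illustrative sizes, not from the thesis

W = rng.standard_normal((d_out, d_in))    # frozen pre-trained weight (stays fixed)
A = rng.standard_normal((r, d_in)) * 0.01 # trainable low-rank factor, small init
B = np.zeros((d_out, r))                  # trainable factor, zero init: no change at start
alpha = 16                                # common LoRA scaling hyperparameter

def adapted_forward(x):
    """Forward pass: frozen W plus the scaled low-rank update B @ A."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted model reproduces the frozen model exactly.
assert np.allclose(adapted_forward(x), W @ x)

full = d_out * d_in            # parameters updated by full fine-tuning of W
lora = r * (d_in + d_out)      # parameters updated by LoRA (A and B only)
print(f"trainable params: {lora} vs full fine-tuning: {full} ({lora / full:.1%})")
```

At rank 8 on a 512x512 matrix, only about 3% of the original parameters are trained, which is the source of the reduced computational demand the abstract refers to.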

Document type

Master's thesis

Language

English

Published by

Universitat Politècnica de Catalunya

Rights

Distribution of the work is authorized under a Creative Commons or similar 'Attribution-NonCommercial-NoDerivatives' licence

Open Access
