dc.contributor
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.contributor
Sallés Rius, Anna
dc.contributor.author
Espasa Rosell, Jordi
dc.date.accessioned
2025-11-08T07:19:10Z
dc.date.available
2025-11-08T07:19:10Z
dc.date.issued
2025-10-20
dc.identifier
https://hdl.handle.net/2117/445762
dc.identifier.uri
https://hdl.handle.net/2117/445762
dc.description.abstract
This Master's Thesis optimizes large language models (LLMs) for multiple-choice question answering (MCQA) to evaluate employee performance from spoken transcripts in personalized training platforms. Current LLMs reach only 63% accuracy in dynamic assessments, owing to biases, reasoning failures, and inefficiencies. We develop a systematic framework that balances precision, cost, and execution time through iterative evaluation refinement, corpus preparation, baseline selection, and phased experiments, including one-factor-at-a-time (OFAT) screening, multi-factor interaction studies, and parameter-efficient fine-tuning (PEFT). The key factors assessed include model scale, in-context learning, chain-of-thought (CoT), chain-of-density (CoD), self-correction, and agentic ensembles. Contributions comprise a replicable optimization pipeline and strategies to mitigate biases such as positional bias and literal-interpretation errors. Results show accuracy improving from 63% to 80%, along with higher F1-scores, enabling ethical, scalable AI-driven assessments for individualized enterprise learning.
dc.format
application/pdf
dc.publisher
Universitat Politècnica de Catalunya
dc.subject
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic
dc.subject
Deep learning (Machine learning)
dc.subject
Questions and answers
dc.subject
Models de llenguatge de gran escala
dc.subject
Resposta a preguntes d'opció múltiple
dc.subject
Avaluació del rendiment d'empleats
dc.subject
Transcripcions orals
dc.subject
Plataformes de formació personalitzada
dc.subject
Optimització de models
dc.subject
Precisió en avaluacions dinàmiques
dc.subject
Biaixos en models d'IA
dc.subject
Fallades de raonament
dc.subject
Marc sistemàtic
dc.subject
Refinament iteratiu d'avaluació
dc.subject
Preparació de corpus
dc.subject
Selecció de línia base
dc.subject
Experiments per fases
dc.subject
Cribratge d'un sol factor
dc.subject
Large language models
dc.subject
Multiple-choice question answering
dc.subject
Employee performance evaluation
dc.subject
Spoken transcripts
dc.subject
Personalized training platforms
dc.subject
Model optimization
dc.subject
Accuracy in dynamic assessments
dc.subject
AI model biases
dc.subject
Reasoning failures
dc.subject
Systematic framework
dc.subject
Iterative evaluation refinement
dc.subject
Corpus preparation
dc.subject
Baseline selection
dc.subject
Phased experiments
dc.subject
One-factor-at-a-time screening
dc.subject
Multi-factor interactions
dc.subject
Parameter-efficient fine-tuning
dc.subject
In-context learning
dc.subject
Chain-of-thought
dc.subject
Aprenentatge profund (Aprenentatge automàtic)
dc.subject
Preguntes i respostes
dc.title
Natural language models for learning assessment from unstructured data