Universitat Politècnica de Catalunya. Universitat Rovira i Virgili
Universitat Rovira i Virgili
Universitat de Barcelona
Moreno Ribas, Antonio
David Sánchez Ruenes, Josep Domingo i Ferrer
2026-01-27
Large language models fine-tuned on domain-specific data are vulnerable to membership inference attacks, which can reveal whether particular examples were used in training. While prior work has established that fine-tuned models exhibit higher vulnerability than pre-trained models, this research has focused almost exclusively on endpoint comparisons: it evaluates vulnerability after fine-tuning is complete, without examining how that vulnerability develops during training. This thesis investigates the progressive emergence of membership inference vulnerability across training epochs and its relationship with overfitting. We evaluate five membership inference attacks across five fine-tuning methods (full fine-tuning, LoRA, BitFit, adapter tuning, and prefix tuning), three model scales (1B, 6.9B, and 12B parameters), and five training epochs, yielding 375 attack evaluations. To ensure methodological rigor, we employ bag-of-words validation to verify that evaluation datasets are free from distribution artifacts that have confounded prior benchmarks. The central finding is a strong correlation between the training-validation loss gap, a standard measure of overfitting, and attack effectiveness across all experimental conditions. Pearson correlations range from 0.838 to 0.996 across attack methods, with all correlations statistically significant (p < 0.001). This relationship holds consistently across fine-tuning methods and model scales, suggesting that membership inference attacks primarily succeed when models are overfitted rather than by exploiting fundamental architectural vulnerabilities. Reference-based attacks, which compare the fine-tuned model's behavior against the original base model, show amplified sensitivity compared to attacks that examine only the fine-tuned model, achieving high effectiveness at lower overfitting levels. These findings suggest that standard generalization practices may reduce membership inference vulnerability alongside their benefits for model quality.
The loss gap, already monitored by practitioners for model selection, could serve as a practical privacy risk indicator during fine-tuning without requiring attack implementation. The core contributions of this thesis have been accepted for publication at RECSI 2026 (XVIII Reunión Española sobre Criptología y Seguridad de la Información).
Master thesis
English
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic; Àrees temàtiques de la UPC::Informàtica::Seguretat informàtica; Machine learning; Computer security; Atacs d'inferència de pertinença; Models de llenguatge grans; Ajust fi; Privacitat; Ajust fi eficient en paràmetres; Sobreajust; Membership inference attacks; Large language models; Fine-tuning; Privacy; Parameter-efficient fine-tuning; Overfitting; Aprenentatge automàtic; Seguretat informàtica
Universitat Politècnica de Catalunya
Open Access
Treballs acadèmics [82686]