Large-scale web tracking and cookie compliance: Evaluating one million websites under GDPR with AI categorization

Authors

Martínez Álvarez, David

Molero Grau, Aniol

Calle Ortega, Eusebi

Canals Ametller, Dolors

Jové, Albert

Publication date

2025-10



Abstract

With the increasing prevalence of web-tracking technologies, including tracking cookies, pixel tracking, and browser fingerprinting techniques, there is a pressing need to analyze their impact on user privacy. Despite growing interest in the scholarly literature, large-scale, fully automatic evaluations of website compliance with privacy regulations remain scarce. In this paper, we present new algorithms, methods, and an AI categorization model designed for massive, fully automatic analyses of web-tracking and cookie compliance and usage, with and without valid user consent. Built on the recently published Website Evidence Collector (WEC) software from the European Data Protection Supervisor (EDPS), these algorithms are applied to assess over one million websites from Tranco's top list under the European GDPR. A novel 22-category multilabel AI model for website categorization provides content-based context to the compliance results, achieving 96.56% accuracy and an F1 score of 0.963. Results reveal that nearly half of the websites use tracking cookies, while over half employ pixel tracking without user consent, and they highlight significant differences between website content categories. Additionally, our analysis demonstrates how web-tracking power is concentrated among just a few companies, with the top 10 tracking firms being responsible for most compliance violations related to obtaining valid user consent. This paper serves as a foundation for ongoing large-scale web-tracking analyses, which are essential for understanding trends over time and evaluating the effectiveness of privacy regulations.


Researchers at the University of Girona Institute of Informatics and Applications thank the Generalitat de Catalunya for its support through a Consolidated Research Group (2021 SGR 01125). David Martínez thanks the University of Girona for his FI fellowship (IFUdG 46 2022).


Open Access funding provided thanks to the CRUE-CSIC agreement with Elsevier.

Document type

Article
Published version
peer-reviewed

Language

English

Subjects and keywords

Data protection; Artificial intelligence; Internet -- Security measures

Published by

Elsevier

Related documents

info:eu-repo/semantics/altIdentifier/doi/10.1016/j.jnca.2025.104222

info:eu-repo/semantics/altIdentifier/issn/1084-8045

Rights

Attribution 4.0 International

http://creativecommons.org/licenses/by/4.0
