Large-scale web tracking and cookie compliance: Evaluating one million websites under GDPR with AI categorization

Authors

Martínez Álvarez, David

Molero Grau, Aniol

Calle Ortega, Eusebi

Canals Ametller, Dolors

Jové, Albert

Publication date

2025-10



Abstract

With the increasing prevalence of web-tracking technologies, including tracking cookies, pixel tracking, and browser fingerprinting, there is a pressing need to analyze their impact on user privacy. Despite growing interest in the scholarly literature, large-scale, fully automatic evaluations of website compliance with privacy regulations remain scarce. In this paper, we present new algorithms, methods, and an AI categorization model designed for massive, fully automatic analyses of web-tracking and cookie compliance and usage, with and without valid user consent. Using the recently published Website Evidence Collector (WEC) software from the European Data Protection Supervisor (EDPS), these algorithms are applied to assess over one million websites from Tranco's top list under the European GDPR. A novel 22-category multilabel AI model for website categorization provides content-based context for the compliance results, achieving 96.56% accuracy and an F1 score of 0.963. Results reveal that nearly half of the websites use tracking cookies and over half employ pixel tracking without user consent, with significant differences between website content categories. Additionally, our analysis demonstrates how web-tracking power is concentrated among just a few companies, with the top 10 tracking firms being responsible for most compliance violations related to obtaining valid user consent. This paper serves as a foundation for ongoing large-scale web-tracking analyses, which are essential for understanding trends over time and evaluating the effectiveness of privacy regulations.
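As an illustration of how accuracy and F1 can be computed for a multilabel categorizer like the one described in the abstract, here is a minimal Python sketch. This record does not state how the reported metrics are averaged, so the sketch assumes per-label accuracy and micro-averaged F1. The function name `multilabel_scores` and the three toy labels are hypothetical and are not the paper's 22 categories or its evaluation code.

```python
from typing import List, Set, Tuple


def multilabel_scores(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    """Return (per-label accuracy, micro-averaged F1) for multilabel predictions.

    Assumption: every (website, label) pair counts as one binary decision.
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        for label in labels:
            in_truth, in_pred = label in truth, label in pred
            if in_truth and in_pred:
                tp += 1  # label correctly assigned
            elif in_pred:
                fp += 1  # label assigned but not in ground truth
            elif in_truth:
                fn += 1  # label missed
            else:
                tn += 1  # label correctly left out
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy data: three websites, three content categories.
labels = ["news", "shopping", "health"]
y_true = [{"news"}, {"shopping", "health"}, {"health"}]
y_pred = [{"news"}, {"shopping"}, {"health", "news"}]

accuracy, f1 = multilabel_scores(y_true, y_pred, labels)
print(f"accuracy={accuracy:.3f}, micro-F1={f1:.3f}")
```

Micro-averaging pools all label decisions before computing the score, so frequent categories weigh more. A macro average, which computes F1 per category and then takes the mean, would be the alternative if each category should count equally.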


The researchers from the Institute of Informatics and Applications of the University of Girona thank the Generalitat de Catalunya for its support through a Consolidated Research Group (2021 SGR 01125). David Martínez thanks the University of Girona for his FI fellowship (IFUdG 46 2022).


Open Access funding provided thanks to the CRUE-CSIC agreement with Elsevier.

Document Type

Article
Published version
Peer-reviewed

Language

English

Subjects and keywords

Data protection; Artificial intelligence; Internet -- Security measures

Publisher

Elsevier

Related items

info:eu-repo/semantics/altIdentifier/doi/10.1016/j.jnca.2025.104222

info:eu-repo/semantics/altIdentifier/issn/1084-8045

Rights

Attribution 4.0 International

http://creativecommons.org/licenses/by/4.0
