Large-scale web tracking and cookie compliance: Evaluating one million websites under GDPR with AI categorization

dc.contributor.author
Martínez Álvarez, David
dc.contributor.author
Molero Grau, Aniol
dc.contributor.author
Calle Ortega, Eusebi
dc.contributor.author
Canals Ametller, Dolors
dc.contributor.author
Jové, Albert
dc.date.accessioned
2025-06-13T04:05:34Z
dc.date.available
2025-06-13T04:05:34Z
dc.date.issued
2025-10
dc.identifier
http://hdl.handle.net/10256/26902
dc.identifier.uri
http://hdl.handle.net/10256/26902
dc.description.abstract
With the increasing prevalence of web-tracking technologies, including tracking cookies, pixel tracking, and browser fingerprinting, there is a pressing need to analyze their impact on user privacy. Despite growing interest in the scholarly literature, large-scale, fully automatic evaluations of website compliance with privacy regulations remain scarce. In this paper, we present new algorithms, methods, and an AI categorization model designed for massive, fully automatic analyses of web-tracking and cookie compliance and usage, with and without valid user consent. Using the recently published Website Evidence Collector (WEC) software from the European Data Protection Supervisor (EDPS), we apply these algorithms to assess over one million websites from Tranco's top list under the European GDPR. A novel 22-category multilabel AI model for website categorization provides content-based context to the compliance results, achieving 96.56% accuracy and an F1 score of 0.963. The results reveal that nearly half of the websites use tracking cookies, while over half employ pixel tracking without user consent. They also highlight significant differences between website content categories. Additionally, our analysis demonstrates how web-tracking power is concentrated among just a few companies, with the top 10 tracking firms being responsible for most compliance violations related to obtaining valid user consent. This paper serves as a foundation for ongoing large-scale web-tracking analyses, which are essential for understanding trends over time and evaluating the effectiveness of privacy regulations.
dc.description.abstract
The researchers of the Institute of Informatics and Applications at the University of Girona thank the Generalitat de Catalunya for its support through a Consolidated Research Group (2021 SGR 01125). David Martínez thanks the University of Girona for his FI fellowship (IFUdG 46 2022).
dc.description.abstract
Open Access funding provided thanks to the CRUE-CSIC agreement with Elsevier.
dc.format
application/pdf
dc.language
eng
dc.publisher
Elsevier
dc.relation
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.jnca.2025.104222
dc.relation
info:eu-repo/semantics/altIdentifier/issn/1084-8045
dc.rights
Attribution 4.0 International
dc.rights
http://creativecommons.org/licenses/by/4.0
dc.rights
info:eu-repo/semantics/openAccess
dc.source
Journal of Network and Computer Applications, 2025, vol. 242, art. no. 104222
dc.source
Published articles (D-ATC)
dc.source
Martínez Álvarez, David; Molero Grau, Aniol; Calle Ortega, Eusebi; Canals Ametller, Dolors; Jové, Albert (2025). Large-scale web tracking and cookie compliance: Evaluating one million websites under GDPR with AI categorization. Journal of Network and Computer Applications, vol. 242, art. no. 104222
dc.subject
Data protection
dc.subject
Artificial intelligence
dc.subject
Internet -- Security measures
dc.title
Large-scale web tracking and cookie compliance: Evaluating one million websites under GDPR with AI categorization
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/publishedVersion
dc.type
peer-reviewed