On the privacy–utility trade-off in differentially private hierarchical text classification

dc.contributor
Universitat Politècnica de Catalunya. Departament d'Enginyeria Telemàtica
dc.contributor
Universitat Politècnica de Catalunya. SISCOM - Smart Services for Information Systems and Communication Networks
dc.contributor.author
Wunderlich, Dominik
dc.contributor.author
Bernau, Daniel
dc.contributor.author
Aldà, Francesco
dc.contributor.author
Parra Arnau, Javier
dc.contributor.author
Strufe, Thorsten
dc.date.issued
2022-11-04
dc.identifier
Wunderlich, D. [et al.]. On the privacy–utility trade-off in differentially private hierarchical text classification. "Applied sciences (Basel)", 4 November 2022, vol. 12, no. 11177, p. 1-21.
dc.identifier
2076-3417
dc.identifier
https://hdl.handle.net/2117/386158
dc.identifier
10.3390/app122111177
dc.description.abstract
Hierarchical text classification consists of classifying text documents into a hierarchy of classes and sub-classes. Although artificial neural networks have proven useful for this task, they can unfortunately leak information about their training data to adversaries because of training data memorization. Using differential privacy during model training can mitigate leakage attacks against trained models, enabling the models to be shared safely at the cost of reduced model accuracy. This work investigates the privacy–utility trade-off in hierarchical text classification with differential privacy guarantees, and it identifies neural network architectures that offer superior trade-offs. To this end, we use a white-box membership inference attack to empirically assess the information leakage of three widely used neural network architectures. We show that large differential privacy parameters already suffice to completely mitigate membership inference attacks, resulting in only a moderate decrease in model utility. More specifically, for large datasets with long texts, we observed Transformer-based models to achieve an overall favorable privacy–utility trade-off, while for smaller datasets with shorter texts, convolutional neural networks are preferable.
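The abstract refers to training with differential privacy guarantees, which is commonly realized via DP-SGD: clipping each example's gradient and adding calibrated Gaussian noise before the update. The sketch below is purely illustrative and not taken from the paper; the function name, the logistic-regression setting, and all parameter values are assumptions.

```python
import numpy as np

def dp_sgd_step(weights, X_batch, y_batch, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD step for logistic regression (illustrative sketch):
    clip each per-example gradient to clip_norm, sum, add Gaussian
    noise scaled by noise_multiplier * clip_norm, then average."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped_grads = []
    for x, y in zip(X_batch, y_batch):
        pred = 1.0 / (1.0 + np.exp(-x @ weights))   # sigmoid prediction
        g = (pred - y) * x                          # per-example gradient
        norm = np.linalg.norm(g)
        g = g / max(1.0, norm / clip_norm)          # clip to L2 norm <= clip_norm
        clipped_grads.append(g)
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=weights.shape)          # Gaussian mechanism
    noisy_avg = (np.sum(clipped_grads, axis=0) + noise) / len(X_batch)
    return weights - lr * noisy_avg
```

Larger noise multipliers correspond to smaller (stronger) privacy parameters; the paper's observation is that even comparatively large privacy parameters, i.e., relatively little noise, already defeat membership inference.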
dc.description.abstract
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 825333 (MOSAICROWN). The project that gave rise to these results received the support of a fellowship from “la Caixa” Foundation (ID 100010434) and from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 847648. The fellowship code is LCF/BQ/PR20/11770009. The work of Javier Parra-Arnau has been supported through an Alexander von Humboldt Postdoctoral Fellowship. This work was also supported by the Spanish Government under research project “Enhancing Communication Protocols with Machine Learning while Protecting Sensitive Data (COMPROMISE)” (PID2020-113795RB-C31/AEI/10.13039/501100011033).
dc.description.abstract
Postprint (published version)
dc.format
21 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Multidisciplinary Digital Publishing Institute
dc.relation
https://www.mdpi.com/2076-3417/12/21/11177
dc.relation
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113795RB-C31/ES/COMPROMISE. PRIVACIDAD DE DATOS PARA REDES DE COMUNICACIONES Y BASES DE DATOS DINAMICAS/
dc.rights
http://creativecommons.org/licenses/by/4.0/
dc.rights
Open Access
dc.rights
Attribution 4.0 International
dc.subject
Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Telemàtica i xarxes d'ordinadors
dc.subject
Neural networks (Computer science)
dc.subject
Machine learning
dc.subject
Text classification
dc.subject
Differential privacy
dc.subject
Membership inference
dc.subject
Aprenentatge automàtic
dc.subject
Xarxes neuronals (Informàtica)
dc.title
On the privacy–utility trade-off in differentially private hierarchical text classification
dc.type
Article

