On the privacy–utility trade-off in differentially private hierarchical text classification

dc.contributor
Universitat Politècnica de Catalunya. Departament d'Enginyeria Telemàtica
dc.contributor
Universitat Politècnica de Catalunya. SISCOM - Smart Services for Information Systems and Communication Networks
dc.contributor.author
Wunderlich, Dominik
dc.contributor.author
Bernau, Daniel
dc.contributor.author
Aldà, Francesco
dc.contributor.author
Parra Arnau, Javier
dc.contributor.author
Strufe, Thorsten
dc.date.issued
2022-11-04
dc.identifier
Wunderlich, D. [et al.]. On the privacy–utility trade-off in differentially private hierarchical text classification. "Applied sciences (Basel)", 4 November 2022, vol. 12, no. 11177, p. 1-21.
dc.identifier
2076-3417
dc.identifier
https://hdl.handle.net/2117/386158
dc.identifier
10.3390/app122111177
dc.description.abstract
Hierarchical text classification consists of classifying text documents into a hierarchy of classes and sub-classes. Although artificial neural networks have proven useful for this task, they can unfortunately leak information about their training data to adversaries because of training data memorization. Using differential privacy during model training can mitigate leakage attacks against trained models, enabling the models to be shared safely at the cost of reduced model accuracy. This work investigates the privacy–utility trade-off in hierarchical text classification with differential privacy guarantees, and it identifies neural network architectures that offer superior trade-offs. To this end, we use a white-box membership inference attack to empirically assess the information leakage of three widely used neural network architectures. We show that large differential privacy parameters already suffice to completely mitigate membership inference attacks, resulting in only a moderate decrease in model utility. More specifically, for large datasets with long texts, we observed Transformer-based models to achieve an overall favorable privacy–utility trade-off, while for smaller datasets with shorter texts, convolutional neural networks are preferable.
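The abstract refers to training with differential privacy guarantees, which is commonly realized via DP-SGD: clipping each example's gradient and adding calibrated Gaussian noise before the update. The sketch below is purely illustrative and not taken from the paper; the function name, the logistic-regression setting, and all parameter values are assumptions.

```python
import numpy as np

def dp_sgd_step(weights, X_batch, y_batch, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD step for logistic regression (illustrative sketch):
    clip each per-example gradient to clip_norm, sum, add Gaussian
    noise scaled by noise_multiplier * clip_norm, then average."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped_grads = []
    for x, y in zip(X_batch, y_batch):
        pred = 1.0 / (1.0 + np.exp(-x @ weights))   # sigmoid prediction
        g = (pred - y) * x                          # per-example gradient
        norm = np.linalg.norm(g)
        g = g / max(1.0, norm / clip_norm)          # clip to L2 norm <= clip_norm
        clipped_grads.append(g)
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=weights.shape)          # Gaussian mechanism
    noisy_avg = (np.sum(clipped_grads, axis=0) + noise) / len(X_batch)
    return weights - lr * noisy_avg
```

Larger noise multipliers correspond to smaller (stronger) privacy parameters; the paper's observation is that even comparatively large privacy parameters, i.e., relatively little noise, already defeat membership inference.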
dc.description.abstract
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 825333 (MOSAICROWN). The project that gave rise to these results received the support of a fellowship from “la Caixa” Foundation (ID 100010434) and from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 847648. The fellowship code is LCF/BQ/PR20/11770009. The work of Javier Parra-Arnau has been supported through an Alexander von Humboldt Postdoctoral Fellowship. This work was also supported by the Spanish Government under research project “Enhancing Communication Protocols with Machine Learning while Protecting Sensitive Data (COMPROMISE)” (PID2020-113795RB-C31/AEI/10.13039/501100011033).
dc.description.abstract
Postprint (published version)
dc.format
21 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Multidisciplinary Digital Publishing Institute
dc.relation
https://www.mdpi.com/2076-3417/12/21/11177
dc.relation
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113795RB-C31/ES/COMPROMISE. PRIVACIDAD DE DATOS PARA REDES DE COMUNICACIONES Y BASES DE DATOS DINAMICAS/
dc.rights
http://creativecommons.org/licenses/by/4.0/
dc.rights
Open Access
dc.rights
Attribution 4.0 International
dc.subject
Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Telemàtica i xarxes d'ordinadors
dc.subject
Neural networks (Computer science)
dc.subject
Machine learning
dc.subject
Text classification
dc.subject
Differential privacy
dc.subject
Membership inference
dc.subject
Aprenentatge automàtic
dc.subject
Xarxes neuronals (Informàtica)
dc.title
On the privacy–utility trade-off in differentially private hierarchical text classification
dc.type
Article

