Utility of Semantic Privacy Notions for Correlated Data

Other authors

Universitat Politècnica de Catalunya. Departament d'Enginyeria Telemàtica

Karlsruher Institut für Technologie

Parra Arnau, Javier

Guerra-Balboa, Patricia

Publication date

2024-07-04

Abstract

In research on data privacy, differential privacy (DP) is the method of choice to measure privacy leakage while conducting data analysis. A DP algorithm is an algorithm that limits the privacy leakage when its result is released publicly. However, privacy attacks are possible against DP algorithms if the underlying data is correlated -- even if the privacy leakage according to DP is low. As a response, researchers have proposed various alternative semantic privacy notions that measure privacy leakage while taking correlation into account. However, increasingly strict privacy measurements call into question whether this allows for sufficient utility for data analysis algorithms. The theoretical limits on their utility remain unclear. Therefore, this thesis investigates the utility of algorithms that limit privacy leakage according to Bayesian differential privacy (BDP), a common extension of DP. We directly prove statements about the utility of BDP algorithms, as well as how much the privacy leakage of a DP algorithm increases when measuring it with BDP. We present novel theoretical results for various correlation models of the data: arbitrary correlation, genomic data, multivariate Gaussian correlation and data following a Markov chain. We find that BDP algorithms cannot provide good utility when the correlation model is completely arbitrary. However, we also show that the increase in measured privacy leakage from DP to BDP is highly dependent on the specific correlation model. These results enable the creation of BDP algorithms with high utility for correlation models where the increase in privacy leakage is relatively small. For example, high utility BDP algorithms on data from a multivariate Gaussian correlation model are possible if the strength of correlation is sufficiently weak -- even if all records in the data are correlated with each other. Additionally, we empirically test the utility of BDP algorithms on real-world data sets and find improvements over the state of the art.

Document Type

Master thesis

Language

English

Publisher

Universitat Politècnica de Catalunya

Recommended citation

This citation was generated automatically.

Rights

S'autoritza la difusió de l'obra mitjançant la llicència Creative Commons o similar 'Reconeixement-NoComercial- SenseObraDerivada'

Open Access

This item appears in the following Collection(s)