Imputation under Differential Privacy

Soumojit Das; Jorg Drechsler; Keith Merrill; Shawn Merrill

doi:10.48550/arXiv.2206.15063

Preprint

arXiv

06/30/2022

DOI: https://doi.org/10.48550/arXiv.2206.15063

Abstract

Computer Science - Databases

The literature on differential privacy almost invariably assumes that the data to be analyzed are fully observed. In most practical applications this assumption is unrealistic. A popular strategy to address this problem is imputation, in which missing values are replaced by values estimated from the observed data. In this paper we evaluate various approaches to answering queries on an imputed dataset in a differentially private manner, and we discuss the trade-offs that arise from where along the pipeline privacy is considered. We show that if imputation is done without regard to privacy, the sensitivity of certain queries can increase linearly with the number of incomplete records. On the other hand, for a general class of imputation strategies, these worst-case scenarios can be greatly reduced by ensuring privacy during the imputation stage itself. We use a simulated dataset to demonstrate these results across a number of imputation schemes (both private and non-private) and examine their impact on the utility of a private query on the data.
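The linear growth in sensitivity mentioned in the abstract can be illustrated with a small toy example. The sketch below is not taken from the paper: the choice of mean imputation, the threshold counting query, and the helper names `mean_impute`, `count_above`, and `query_gap` are all assumptions made for illustration. It builds two datasets that differ in a single observed record and compares the query answer after non-private imputation.

```python
from typing import List, Optional


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def count_above(values: List[float], threshold: float = 0.5) -> int:
    """Counting query: number of records strictly above the threshold."""
    return sum(1 for v in values if v > threshold)


def query_gap(num_missing: int) -> int:
    """Change in the query answer when one observed record changes.

    Hypothetical setup: ten observed values in [0, 1] (five 0s, five 1s)
    plus `num_missing` incomplete records.
    """
    base: List[Optional[float]] = [0.0] * 5 + [1.0] * 5 + [None] * num_missing
    neighbor = list(base)
    neighbor[0] = 1.0  # neighboring dataset: one observed record differs
    return abs(count_above(mean_impute(neighbor)) - count_above(mean_impute(base)))


# The gap grows with the number of incomplete records.
gaps = {m: query_gap(m) for m in (0, 10, 100)}
print(gaps)
```

In this toy setup, changing one record moves the imputed mean from 0.5 to 0.6, so every imputed value crosses the threshold at once and the gap is one plus the number of missing records. On fully observed data the same counting query would change by at most 1, so noise calibrated to a sensitivity of 1 would presumably be too small once non-private imputation is applied first.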

Details

Title: Imputation under Differential Privacy
Creators: Soumojit Das; Jorg Drechsler; Keith Merrill; Shawn Merrill
Publisher: arXiv
Identifiers: 9924149260601921
Academic Unit: Department of Mathematics
Language: English
Resource Type: Preprint
