Meenatchi Sundaram Muthu Selva Annamalai (University College London), Borja Balle (Google DeepMind), Jamie Hayes (DeepMind), Emiliano De Cristofaro (UC Riverside)

The Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm supports the training of machine learning (ML) models with formal Differential Privacy (DP) guarantees. Traditionally, DP-SGD processes training data in batches using Poisson subsampling to select each batch at every iteration. More recently, shuffling has become a common alternative due to its better compatibility and lower computational overhead. However, computing tight theoretical DP guarantees under shuffling remains an open problem. As a result, models trained with shuffling are often evaluated as if Poisson subsampling were used, which might result in incorrect privacy guarantees.
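The difference between the two batching schemes can be illustrated with a minimal sketch (the function names and parameters are illustrative, not from the paper): under Poisson subsampling, each record joins each batch independently with some probability, so batch sizes vary; under shuffling, the data is permuted once per epoch and split into fixed-size batches, so every record appears exactly once per epoch.

```python
import random

def poisson_batches(data, sample_rate, steps, rng=None):
    # Poisson subsampling: each example is included in each batch
    # independently with probability `sample_rate`, so batch sizes
    # are random and an example may appear in zero or many batches.
    rng = rng or random.Random(0)
    for _ in range(steps):
        yield [x for x in data if rng.random() < sample_rate]

def shuffled_batches(data, batch_size, rng=None):
    # Shuffling: permute the dataset once, then take fixed-size
    # contiguous batches; every example appears exactly once per epoch.
    rng = rng or random.Random(0)
    data = list(data)
    rng.shuffle(data)
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]
```

The DP analysis of DP-SGD assumes the first scheme; the paper's point is that practitioners often run the second while reporting guarantees computed for the first.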

This raises a compelling research question: can we verify whether there are gaps between the theoretical DP guarantees reported by state-of-the-art models trained with shuffling and their actual privacy leakage? To do so, we define novel DP auditing procedures to analyze DP-SGD with shuffling and measure their ability to tightly estimate privacy leakage vis-à-vis batch sizes, privacy budgets, and threat models. We demonstrate that the privacy guarantees of DP models trained with shuffling have been considerably overestimated (by up to 4 times). However, we also find that the gap between the theoretical Poisson DP guarantees and the actual privacy leakage under shuffling is not uniform across parameter settings and threat models. Finally, we study two common variations of the shuffling procedure that result in even greater privacy leakage (up to 10 times). Overall, our work highlights the risks of using shuffling instead of Poisson subsampling in the absence of rigorous analysis methods.
