Jinghan Zhang (University of Virginia), Sharon Biju (University of Virginia), Saleha Muzammil (University of Virginia), Wajih Ul Hassan (University of Virginia)
We present MIRAGE, the first privacy-preserving Provenance-based Intrusion Detection System (PIDS) that integrates Federated Learning (FL) with graph representation learning to match centralized detection accuracy while preserving privacy and improving scalability. Building MIRAGE is non-trivial because graph-based models must be federated across clients with heterogeneous logs, inconsistent semantic encodings, and temporally misaligned data. To address these challenges, MIRAGE introduces a novel ensemble based on process entity categorization, in which specialized submodels learn distinct system behaviors and avoid aggregation errors. To enable privacy-preserving semantic alignment, MIRAGE designs a dual-server harmonization framework: one server issues encryption keys, and the other aggregates encrypted embeddings without accessing sensitive tokens. To remain robust to temporal misalignment across clients, MIRAGE employs inductive GNNs that eliminate the need for synchronized timestamps. Evaluations on DARPA datasets show that MIRAGE matches the detection accuracy of state-of-the-art PIDS while reducing network communication costs by 170× and processing datasets in minutes rather than hours.
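To make the categorization-based ensemble idea more concrete, the following is a minimal sketch of how a server might aggregate client updates separately for each process category, so that weights learned for one kind of behavior are never averaged with weights learned for another. It is an illustration under stated assumptions and not MIRAGE's actual implementation: the function name `aggregate_by_category`, the category labels `"browser"` and `"shell"`, the flat-list weight representation, and the use of sample-weighted federated averaging are all hypothetical choices made for this example. The encryption and dual-server steps are omitted entirely.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: a submodel is a flat list of weights.
Weights = List[float]
# One client update maps category -> (weights, number of local samples).
# Assumption: every reported sample count is positive.
ClientUpdate = Dict[str, Tuple[Weights, int]]


def aggregate_by_category(updates: List[ClientUpdate]) -> Dict[str, Weights]:
    """Sample-weighted averaging, computed independently per category."""
    sums: Dict[str, Weights] = {}
    counts: Dict[str, int] = defaultdict(int)
    for update in updates:
        for category, (weights, n) in update.items():
            if category not in sums:
                sums[category] = [0.0] * len(weights)
            # Accumulate each weight scaled by the client's sample count.
            for i, w in enumerate(weights):
                sums[category][i] += w * n
            counts[category] += n
    # Divide by the total sample count seen for that category only.
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


# Two hypothetical clients; the second has no "shell" submodel to report.
clients = [
    {"browser": ([1.0, 2.0], 10), "shell": ([0.0, 4.0], 30)},
    {"browser": ([3.0, 4.0], 30)},
]
global_models = aggregate_by_category(clients)
```

In this sketch a client that never observed a given category simply contributes nothing to that category's submodel, which is one plausible way to tolerate heterogeneous logs across clients.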