Building Next-Generation Datasets for Provenance-Based Intrusion Detection

Qizhi Cai (Zhejiang University), Lingzhi Wang (Northwestern University), Yao Zhu (Zhejiang University), Zhipeng Chen (Zhejiang University), Xiangmin Shen (Hofstra University), Zhenyuan Li (Zhejiang University)

In recent years, provenance-based intrusion detection and forensic systems have attracted significant attention, and related research has grown rapidly. However, progress in this area has been hindered by a long-standing lack of up-to-date datasets and benchmarks. Existing datasets suffer from several critical limitations, including outdated attack techniques, short temporal scales, and incomplete or fragmented attack chains. As a result, they fail to capture the characteristics of recent real-world Advanced Persistent Threat (APT) attacks. Moreover, the attack procedures underlying existing datasets are unclear and coarse-grained, which makes accurate labeling and reliable evaluation difficult. Consequently, the absence of a comprehensive, up-to-date dataset has become a major bottleneck for the field. To address this gap, we present our efforts in building a large-scale, diverse, and well-annotated dataset for provenance-based intrusion analysis. Our dataset is generated using an automated attack emulation framework that incorporates recent attack techniques and supports fine-grained ground-truth labeling. Using this dataset, we conduct a comprehensive evaluation of state-of-the-art provenance-based intrusion detection systems and reveal weaknesses that cannot be effectively benchmarked with existing datasets. Our results demonstrate the dataset’s value in enabling clearer, more informative evaluations and highlight its potential to advance future research in provenance-based intrusion detection and graph-based security analysis.
