Jiongchi Yu (Singapore Management University), Xiaofei Xie (Singapore Management University), Qiang Hu (Tianjin University), Yuhan Ma (Tianjin University), Ziming Zhao (Zhejiang University)

Insider threats can lead to unacceptable losses and are a widespread, significant security concern, which makes their detection essential. Recently, machine learning-based insider threat detection (ITD) methods have been proposed with promising results. Despite this success, a major challenge, the lack of sufficient data, limits the further development of these methods. The paradox is that enterprise internal data is highly sensitive and typically inaccessible, while public datasets are either limited in real-world coverage or, in the case of synthetic data, lack rich semantic information and realistic behavioral patterns. As a result, there is a crucial need to construct real-world insider threat datasets.

To address this challenge, we propose Chimera, the first large language model (LLM)-based multi-agent framework that automatically simulates both benign and malicious insider activities and collects logs across diverse enterprise environments. Based on an analysis of an organization's composition and structural characteristics, Chimera customizes each LLM agent to represent an individual employee through detailed role modeling, and couples the agents with modules for group meetings, pairwise interactions, and self-organized scheduling. In this way, Chimera can accurately reflect the complexities of real-world enterprise operations. The current version of Chimera covers 15 distinct types of manually abstracted insider attacks, such as intellectual property theft and system sabotage. Using Chimera, we simulate benign and attack activities across three typical data-sensitive organizational scenarios (a technology company, a finance corporation, and a medical institution) and generate a new dataset named ChimeraLog to facilitate the development of machine learning-based ITD methods.

To evaluate the quality and authenticity of ChimeraLog, we conduct comprehensive human studies and quantitative analyses. The results demonstrate both the diversity and realism of the dataset. Further expert analysis highlights the presence of realistic threat patterns as well as explainable activity traces. In addition, we evaluate the effectiveness of existing insider threat detection methods on ChimeraLog. The average F1-score achieved is 0.83, which is notably lower than the score of 0.99 observed on the baseline dataset CERT, thereby illustrating the greater difficulty posed by ChimeraLog for threat detection tasks.
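
For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below shows one way such a score could be computed and averaged. The function names, the choice of a simple arithmetic mean across detectors, and the confusion counts are all illustrative assumptions; they are not taken from the ChimeraLog evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts: harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined),
        # so F1 is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(scores: list[float]) -> float:
    """Simple arithmetic mean over per-detector F1 scores (an assumed averaging scheme)."""
    return sum(scores) / len(scores)


# Hypothetical counts for two detectors, for illustration only.
detector_a = f1_score(tp=8, fp=2, fn=2)
detector_b = f1_score(tp=5, fp=0, fn=5)
average = mean_f1([detector_a, detector_b])
```

A lower average on one dataset than on another, under the same detectors, indicates that the malicious activity in the former is harder to separate from benign behavior.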
