Xue Tan (Fudan University), Hao Luan (Fudan University), Mingyu Luo (Fudan University), Zhuyang Yu (Fudan University), Jun Dai (Worcester Polytechnic Institute), Xiaoyan Sun (Worcester Polytechnic Institute), Ping Chen (Fudan University)

With the rapid development of Large Language Models (LLMs), their applications have expanded across many aspects of daily life. Open-source LLMs in particular have gained popularity due to their accessibility, leading to widespread downloading and redistribution. The impressive capabilities of LLMs result from training on massive and often undisclosed datasets. This raises the question of whether sensitive content, such as copyrighted or personal data, is included in the training set, a question known as the membership inference problem. Existing methods rely mainly on model outputs and overlook the rich internal representations of the model; this limited use of internal information leads to suboptimal results, revealing a research gap for membership inference in open-source, white-box LLMs.

In this paper, we address the challenge of detecting the training data of open-source LLMs. To support this investigation, we introduce three dynamic benchmarks: WikiTection, NewsTection, and ArXivTection. We then propose a white-box approach for training data detection that analyzes the neural activations of LLMs. Our key insight is that the neuron activations across all layers of an LLM reflect the model's internal representation of knowledge related to the input data, and can therefore effectively distinguish training data from non-training data. Extensive experiments on these benchmarks demonstrate the strong effectiveness of our approach. For instance, on the WikiTection benchmark, our method achieves an AUC of around 0.98 across five LLMs: GPT2-xl, LLaMA2-7B, LLaMA3-8B, Mistral-7B, and LLaMA2-13B. Additionally, we conduct in-depth analyses of factors such as model size, input length, and text paraphrasing, further validating the robustness and adaptability of our method.
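The general idea of activation-based membership inference can be illustrated with a minimal sketch: summarize each layer's hidden activations into a feature vector, then train a probe to separate member (training) from non-member inputs. This is an illustrative assumption of how such a pipeline might look, not the paper's actual implementation; the feature choices (per-layer mean, standard deviation, max) and the logistic-regression probe are placeholders for whatever representation and classifier the authors use.

```python
# Illustrative sketch (NOT the paper's method): membership inference from
# layer-wise activation summaries using a simple logistic-regression probe.
import numpy as np


def layer_features(hidden_states):
    """Summarize per-layer activations into (mean, std, max-abs) features.

    hidden_states: list of arrays, one per layer, each shaped
    (seq_len, hidden_dim), e.g. as returned by a white-box model with
    output_hidden_states enabled. Returns a 1-D vector of length 3 * n_layers.
    """
    feats = []
    for h in hidden_states:
        feats.extend([h.mean(), h.std(), np.abs(h).max()])
    return np.array(feats)


def train_probe(member_feats, nonmember_feats, lr=0.1, steps=500):
    """Fit a logistic-regression probe: label 1 = member, 0 = non-member."""
    X = np.vstack([member_feats, nonmember_feats])
    y = np.concatenate([np.ones(len(member_feats)),
                        np.zeros(len(nonmember_feats))])
    mu, sd = X.mean(0), X.std(0) + 1e-8  # standardize features
    X = (X - mu) / sd
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):  # plain gradient descent on the logistic loss
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b, mu, sd


def membership_score(hidden_states, w, b, mu, sd):
    """Higher score -> input more likely seen during training."""
    f = (layer_features(hidden_states) - mu) / sd
    return 1.0 / (1.0 + np.exp(-(f @ w + b)))
```

In a real white-box setting, `hidden_states` would come from the open-source model itself (all layers, as the abstract emphasizes), and the probe would be evaluated by AUC over held-out member/non-member texts.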
