Causal-Guided Detoxify Backdoor Attack of Open-Weight LoRA Models

Linzhi Chen (ShanghaiTech University), Yang Sun (Independent Researcher), Hongru Wei (ShanghaiTech University), Yuqi Chen (ShanghaiTech University)

Low-Rank Adaptation (LoRA) has emerged as an efficient method for fine-tuning large language models (LLMs) and is widely adopted within the open-source community. However, the decentralized dissemination of LoRA adapters through platforms such as Hugging Face introduces novel security vulnerabilities: malicious adapters can be easily distributed and evade conventional oversight mechanisms. Despite these risks, backdoor attacks targeting LoRA-based fine-tuning remain relatively underexplored. Existing backdoor attack strategies are ill-suited to this setting, as they often rely on inaccessible training data, fail to account for the structural properties unique to LoRA, or suffer from high false trigger rates (FTR), thereby compromising their stealth.
To address these challenges, we propose Causal-Guided Detoxify Backdoor Attack (CBA), a novel backdoor attack framework specifically designed for open-weight LoRA models. CBA operates without access to original training data and achieves high stealth through two key innovations: (1) a coverage-guided data generation pipeline that synthesizes task-aligned inputs via behavioral exploration, and (2) a causal-guided detoxification strategy that merges poisoned and clean adapters by preserving task-critical neurons.
Unlike prior approaches, CBA enables post-training control over attack intensity through causal influence-based weight allocation, eliminating the need for repeated retraining. Evaluated across six LoRA models, CBA achieves high attack success rates while reducing FTR by 50–70% compared to baseline methods. Furthermore, it demonstrates enhanced resistance to state-of-the-art backdoor defenses, highlighting its stealth and robustness.

Paper

View More Papers

Enabling Research Extensions in Matter via Custom Clusters

Ravindra Mangar (Dartmouth College, Hanover), Jared Chandler (Dartmouth College, Hanover), Timothy J. Pierson (Dartmouth College, Hanover), David Kotz (Dartmouth College, Hanover)

SysArmor: The Practice of Integrating Provenance Analysis into Endpoint...

Shaofei Li (Peking University), Jiandong Jin (Peking University), Hanlin Jiang (Peking University), Yi Huang (Peking University), Yifei Bao (Jilin University), Yuhan Meng (Peking University), Fengwei Hong (Peking University), Zheng Huang (Peking University), Peng Jiang (Southeast University), Ding Li (Peking University)

VulSCA: A Community-Level SCA Approach for Accurate C/C++ Supply...

Yutao Hu (Huazhong University of Science and Technology), Chaofan Li (Huazhong University of Science and Technology), Yueming Wu (Huazhong University of Science and Technology), Yifeng Cai (Peking University), Deqing Zou (Huazhong University of Science and Technology)