Cache Me, Catch You: Cache Related Security Threats in LLM Serving Frameworks

XiangFan Wu (Ocean University of China; QI-ANXIN Technology Research Institute), Lingyun Ying (QI-ANXIN Technology Research Institute), Guoqiang Chen (QI-ANXIN Technology Research Institute), Yacong Gu (Tsinghua University; Tsinghua University-QI-ANXIN Group JCNS), Haipeng Qu (Department of Computer Science and Technology, Ocean University of China)

Large Language Models (LLMs) are rapidly reshaping digital interactions. Their performance and efficiency are critically dependent on advanced caching mechanisms, such as prefix caching and semantic caching.
However, these mechanisms introduce a new attack surface. Unlike prior work, which focused on LLM poisoning attacks during the training phase, this paper presents the first comprehensive investigation into cache-related security risks that arise at LLM inference time.

We conducted a systematic study of the cache implementations in mainstream LLM serving frameworks and identified six novel attack vectors, which fall into two categories: (1) User-oriented Fraud Attacks, which manipulate cache entries to deliver malicious content to users via prefix cache collisions and semantic fuzzy poisoning; and (2) System Integrity Attacks, which exploit cache vulnerabilities to bypass security checks, such as using block-wise or multimodal collisions to evade content moderation.
Our experiments on leading open-source frameworks validated these attack vectors and evaluated their impact and cost.
Furthermore, we proposed five multilayer defense strategies and assessed their effectiveness.
We responsibly disclosed our findings to affected vendors, including vLLM, SGLang, GPTCache, AIBrix, rtp-llm, and LMDeploy. All of them have acknowledged the vulnerabilities, and notably, vLLM, GPTCache, and AIBrix have adopted our proposed mitigation methods and fixed their vulnerabilities.
Our findings underscore the importance of securing the caching infrastructure in the rapidly expanding LLM ecosystem.
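
To make the idea of a prefix cache collision concrete, here is a minimal sketch in Python. It is not taken from the paper or from any of the frameworks named above: `ToyPrefixCache`, `weak_key`, `strong_key`, `find_colliding_block`, the block size, and the one-byte key truncation are all hypothetical and chosen only for illustration. The sketch assumes a cache that keys each block of tokens by a hash chained to the previous block's key, and shows how a deliberately weak key lets a different prompt hit another prompt's entry, while a full SHA-256 digest does not.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # tokens per cache block (illustrative value, not from the paper)

Block = Tuple[int, ...]
KeyFn = Callable[[bytes, Block], bytes]


def strong_key(parent: bytes, block: Block) -> bytes:
    """Chain the parent key and the block's tokens through full SHA-256."""
    data = parent + b"|" + ",".join(map(str, block)).encode()
    return hashlib.sha256(data).digest()


def weak_key(parent: bytes, block: Block) -> bytes:
    """Deliberately weak key: keep only 1 byte of the digest (assumption)."""
    return strong_key(parent, block)[:1]


class ToyPrefixCache:
    """Hypothetical block-wise prefix cache keyed by chained block hashes."""

    def __init__(self, key_fn: KeyFn) -> None:
        self.key_fn = key_fn
        self.store: Dict[bytes, str] = {}

    def _keys(self, tokens: List[int]) -> List[bytes]:
        # Only full blocks are cached; a trailing partial block is ignored.
        parent, keys = b"", []
        full = len(tokens) - len(tokens) % BLOCK_SIZE
        for i in range(0, full, BLOCK_SIZE):
            parent = self.key_fn(parent, tuple(tokens[i:i + BLOCK_SIZE]))
            keys.append(parent)
        return keys

    def put(self, tokens: List[int], payload: str) -> None:
        for key in self._keys(tokens):
            self.store.setdefault(key, payload)

    def lookup(self, tokens: List[int]) -> Optional[str]:
        # Return the payload of the longest cached prefix, if any.
        hit = None
        for key in self._keys(tokens):
            if key not in self.store:
                break
            hit = self.store[key]
        return hit


def find_colliding_block(target: Block, key_fn: KeyFn,
                         limit: int = 100_000) -> Optional[Block]:
    """Brute-force a different block whose key equals the target's key."""
    want = key_fn(b"", target)
    for i in range(limit):
        candidate = (i, 0, 0, 0)
        if candidate != target and key_fn(b"", candidate) == want:
            return candidate
    return None


victim = [101, 102, 103, 104]
attacker = list(find_colliding_block(tuple(victim), weak_key))

weak = ToyPrefixCache(weak_key)
weak.put(attacker, "attacker-controlled state")
strong = ToyPrefixCache(strong_key)
strong.put(attacker, "attacker-controlled state")

print(weak.lookup(victim))    # the victim's prompt hits the attacker's entry
print(strong.lookup(victim))  # no hit when the full digest is the key
```

In this toy setting the attacker needs on the order of a few hundred guesses because the key space has only 256 values. The cost against a real framework depends on its actual key function, which the abstract does not specify.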
