Marius Vangeli (KTH Royal Institute of Technology, Sweden), Joel Brynielsson (KTH Royal Institute of Technology, Sweden and FOI Swedish Defence Research Agency, Sweden), Mika Cohen (KTH Royal Institute of Technology, Sweden and FOI Swedish Defence Research Agency, Sweden), Farzad Kamrani (FOI Swedish Defence Research Agency, Sweden)

While large language model (LLM)-driven penetration testing is rapidly improving, autonomous agents still struggle with longer-duration multi-stage exploits. As agents perform reconnaissance, attempt exploits, and pivot through systems, the token context window fills up with exploration and failed attempts, degrading decision quality. We introduce context handoff for autonomous penetration testing (CHAP), a context-relay system for LLM-driven agents. CHAP enables agents to sustain long-running penetration tests by transferring accumulated knowledge as compact protocols to fresh agent instances.
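The handoff idea can be illustrated with a minimal sketch. This is not the authors' implementation: the names `HandoffProtocol`, `Agent`, and `relay`, the word-count token estimate, and the fixed token budget are all assumptions made for illustration. It shows only the general pattern of condensing accumulated findings and failed attempts into a compact record and seeding a fresh agent instance with it once the context grows too large.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffProtocol:
    """Compact summary passed to a fresh agent (hypothetical structure)."""
    objective: str
    findings: list[str] = field(default_factory=list)
    failed_attempts: list[str] = field(default_factory=list)

    def token_estimate(self) -> int:
        # Rough estimate: whitespace-separated words (an assumption, not a real tokenizer).
        parts = [self.objective, *self.findings, *self.failed_attempts]
        return sum(len(p.split()) for p in parts)


@dataclass
class Agent:
    """Stand-in for an LLM-driven agent; tracks only an approximate context size."""
    protocol: HandoffProtocol
    context_tokens: int = 0

    def record(self, text: str, is_finding: bool) -> None:
        # Every observation grows the context, whether or not it was useful.
        self.context_tokens += len(text.split())
        target = self.protocol.findings if is_finding else self.protocol.failed_attempts
        target.append(text)


def relay(agent: Agent, token_budget: int) -> Agent:
    """Return the same agent while under budget, else a fresh one seeded with a compact protocol."""
    if agent.context_tokens < token_budget:
        return agent
    # Deduplicate entries while preserving their original order.
    compact = HandoffProtocol(
        objective=agent.protocol.objective,
        findings=list(dict.fromkeys(agent.protocol.findings)),
        failed_attempts=list(dict.fromkeys(agent.protocol.failed_attempts)),
    )
    # The fresh agent starts with only the protocol in its context.
    return Agent(protocol=compact, context_tokens=compact.token_estimate())
```

In this sketch the fresh agent inherits what was learned and what already failed, but none of the raw exploration history that filled the previous context.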

We evaluate CHAP on an extended version of the AutoPenBench benchmark, targeting 11 real-world vulnerabilities. CHAP improved per-run success from 27.3% to 36.4% while reducing token expenditure by 32.4% compared to a baseline agent. We release our full implementation, benchmark enhancements, and a dataset of command logs with LLM reasoning traces.
