Jef Jacobs (DistriNet, KU Leuven), Jorn Lapon (DistriNet, KU Leuven), Vincent Naessens (DistriNet, KU Leuven)

Large Language Models (LLMs) are increasingly used as autonomous agents in domains such as cybersecurity and system administration. The performance of these agents depends heavily on their ability to interact effectively with operating systems, often through Bash commands. Current implementations primarily rely on proprietary cloud-based models, which raise privacy and data confidentiality concerns when deployed in real-world environments. Locally hosted open-source LLMs offer a promising alternative, but their performance for such tasks remains unclear.

This paper presents an empirical evaluation of 22 open-source language models (ranging from 1B to 32B parameters) on Natural Language–to–Bash (NL2Bash) translation tasks. We introduce an improved scoring system for assessing task success and analyze performance under 10 distinct prompting techniques. Our findings show that Qwen3 models achieve strong results on NL2Bash tasks, that role-play prompting significantly benefits most models, and that Chain-of-Thought and Retrieval-Augmented Generation (RAG) can, surprisingly, hurt local model performance if not carefully designed. We further observe that the impact of prompting strategies varies with model size.
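To make the role-play setup concrete, the following is a minimal, hypothetical sketch of how a role-play prompt for NL2Bash translation might be constructed before being sent to a locally hosted model. The persona wording and function name are illustrative assumptions, not the paper's actual templates.

```python
# Hypothetical sketch of role-play prompting for NL2Bash.
# The persona text below is an assumed example, not the paper's template.

def build_roleplay_prompt(nl_query: str) -> str:
    """Wrap a natural-language request in a role-play system persona."""
    persona = (
        "You are an expert Linux system administrator. "
        "Translate the user's request into a single Bash command. "
        "Respond with the command only, no explanation."
    )
    return f"{persona}\n\nRequest: {nl_query}\nBash:"

# Example usage: the resulting string would be passed to a local LLM.
prompt = build_roleplay_prompt("list all files larger than 10 MB in /var/log")
print(prompt)
```

In practice, such a prompt would be submitted to the local model's completion endpoint, and the returned command scored against the reference solution.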
