Linxi Jiang (The Ohio State University), Xin Jin (The Ohio State University), Zhiqiang Lin (The Ohio State University)

Function name inference in stripped binaries is an important yet challenging task for many security applications, such as malware analysis and vulnerability discovery, due to the need to grasp binary code semantics amidst diverse instruction sets, architectures, compiler optimizations, and obfuscations. While machine learning has made significant progress in this field, existing methods often struggle with unseen data, constrained by their reliance on a limited vocabulary-based classification approach. In this paper, we present SymGen, a novel framework employing an autoregressive generation paradigm powered by domain-adapted generative large language models (LLMs) for enhanced binary code interpretation. We have evaluated SymGen on a dataset comprising 2,237,915 binary functions across four architectures (x86-64, x86-32, ARM, MIPS) with four levels of optimizations (O0-O3) where it surpasses the state-of-the-art with up to 409.3%, 553.5%, and 489.4% advancement in precision, recall, and F1 score, respectively, showing superior effectiveness and generalizability. Our ablation and case studies also demonstrate the significant performance boosts achieved by our design, e.g., the domain adaptation approach, alongside showcasing SymGen’s practicality in analyzing real-world binaries, e.g., obfuscated binaries and malware executables.

View More Papers

Translating C To Rust: Lessons from a User Study

Ruishi Li (National University of Singapore), Bo Wang (National University of Singapore), Tianyu Li (National University of Singapore), Prateek Saxena (National University of Singapore), Ashish Kundu (Cisco Research)

Read More

Defending Against Membership Inference Attacks on Iteratively Pruned Deep...

Jing Shang (Beijing Jiaotong University), Jian Wang (Beijing Jiaotong University), Kailun Wang (Beijing Jiaotong University), Jiqiang Liu (Beijing Jiaotong University), Nan Jiang (Beijing University of Technology), Md Armanuzzaman (Northeastern University), Ziming Zhao (Northeastern University)

Read More

Truman: Constructing Device Behavior Models from OS Drivers to...

Zheyu Ma (Institute for Network Sciences and Cyberspace (INSC), Tsinghua University; EPFL; JCSS, Tsinghua University (INSC) - Science City (Guangzhou) Digital Technology Group Co., Ltd.), Qiang Liu (EPFL), Zheming Li (Institute for Network Sciences and Cyberspace (INSC), Tsinghua University; JCSS, Tsinghua University (INSC) - Science City (Guangzhou) Digital Technology Group Co., Ltd.), Tingting Yin (Zhongguancun…

Read More

ReDAN: An Empirical Study on Remote DoS Attacks against...

Xuewei Feng (Tsinghua University), Yuxiang Yang (Tsinghua University), Qi Li (Tsinghua University), Xingxiang Zhan (Zhongguancun Lab), Kun Sun (George Mason University), Ziqiang Wang (Southeast University), Ao Wang (Southeast University), Ganqiu Du (China Software Testing Center), Ke Xu (Tsinghua University)

Read More