Peiwei Hu (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China), Ruigang Liang (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China), Kai Chen (Institute of Information Engineering, Chinese Academy of Sciences, China)

Reverse engineering is essential in malware analysis, vulnerability discovery, etc. Decompilers assist the reverse engineers by lifting the assembly to the high-level programming language, which highly boosts binary comprehension. However, decompilers suffer from problems such as meaningless variable names, redundant variables, and lacking comments describing the purpose of the code. Previous studies have shown promising performance in refining the decompiler output by training the models with huge datasets containing various decompiler outputs. However, even datasets that take much time to construct cover limited binaries in the real world. The performance degrades severely facing the binary migration.

In this paper, we present DeGPT, an end-to-end framework aiming to optimize the decompiler output to improve its readability and simplicity and further assist the reverse engineers in understanding the binaries better. The Large Language Model (LLM) can mitigate performance degradation with its extraordinary ability endowed by large model size and training set containing rich multi-modal data. However, its potential is difficult to unlock through one-shot use. Thus, we propose the three-role mechanism, which includes referee (R_ref), advisor (R_adv), and operator (R_ope), to adapt the LLM to our optimization tasks. Specifically, R_ref provides the optimization scheme for the target decompiler output, while R_adv gives the rectification measures based on the scheme, and R_ope inspects whether the optimization changes the original function semantics and concludes the final verdict about whether to accept the optimizations. We evaluate DeGPT on the datasets containing decompiler outputs of various software, such as the practical command line tools, malware, a library for audio processing, and implementations of algorithms. The experimental results show that even on the output of the current top-level decompiler (Ghidra), DeGPT can achieve 24.4% reduction in the cognitive burden of understanding the decompiler outputs and provide comments of which 62.9% can provide practical semantics for the reverse engineers to help the understanding of binaries. Our user surveys also show that the optimizations can significantly simplify the code and add helpful semantic information (variable names and comments), facilitating a quick and accurate understanding of the binary.

View More Papers

coucouArray ( [post_type] => ndss-paper [post_status] => publish [posts_per_page] => 4 [orderby] => rand [tax_query] => Array ( [0] => Array ( [taxonomy] => category [field] => id [terms] => Array ( [0] => 104 ) ) ) [post__not_in] => Array ( [0] => 16906 ) )

LibAFL QEMU: A Library for Fuzzing-oriented Emulation

Romain Malmain (EURECOM), Andrea Fioraldi (EURECOM), Aurelien Francillon (EURECOM)

Read More

ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning

Linkang Du (Zhejiang University), Min Chen (CISPA Helmholtz Center for Information Security), Mingyang Sun (Zhejiang University), Shouling Ji (Zhejiang University), Peng Cheng (Zhejiang University), Jiming Chen (Zhejiang University), Zhikun Zhang (CISPA Helmholtz Center for Information Security and Stanford University)

Read More

Facilitating Non-Intrusive In-Vivo Firmware Testing with Stateless Instrumentation

Jiameng Shi (University of Georgia), Wenqiang Li (Independent Researcher), Wenwen Wang (University of Georgia), Le Guan (University of Georgia)

Read More

Compromising Industrial Processes using Web-Based Programmable Logic Controller Malware

Ryan Pickren (Georgia Institute of Technology), Tohid Shekari (Georgia Institute of Technology), Saman Zonouz (Georgia Institute of Technology), Raheem Beyah (Georgia Institute of Technology)

Read More

Privacy Starts with UI: Privacy Patterns and Designer Perspectives in UI/UX Practice

Anxhela Maloku (Technical University of Munich), Alexandra Klymenko (Technical University of Munich), Stephen Meisenbacher (Technical University of Munich), Florian Matthes (Technical University of Munich)

Vision: Profiling Human Attackers: Personality and Behavioral Patterns in Deceptive Multi-Stage CTF Challenges

Khalid Alasiri (School of Computing and Augmented Intelligence Arizona State University), Rakibul Hasan (School of Computing and Augmented Intelligence Arizona State University)

From Underground to Mainstream Marketplaces: Measuring AI-Enabled NSFW Deepfakes on Fiverr

Mohamed Moustafa Dawoud (University of California, Santa Cruz), Alejandro Cuevas (Princeton University), Ram Sundara Raman (University of California, Santa Cruz)