Zhenhao Luo (College of Computer, National University of Defense Technology), Pengfei Wang (College of Computer, National University of Defense Technology), Baosheng Wang (College of Computer, National University of Defense Technology), Yong Tang (College of Computer, National University of Defense Technology), Wei Xie (College of Computer, National University of Defense Technology), Xu Zhou (College of Computer, National University of Defense Technology), Danjun Liu (College of Computer, National University of Defense Technology), Kai Lu (College of Computer, National University of Defense Technology)

Code reuse is widespread in software development. It brings a heavy spread of vulnerabilities, threatening software security. Unfortunately, with the development and deployment of the Internet of Things (IoT), the harms of code reuse are magnified. Binary code search is a viable way to find these hidden vulnerabilities. Facing IoT firmware images compiled by different compilers with different optimization levels from different architectures, the existing methods are hard to fit these complex scenarios. In this paper, we propose a novel intermediate representation function model, which is an architecture-agnostic model for cross-architecture binary code search. It lifts binary code into microcode and preserves the main semantics of binary functions via complementing implicit operands and pruning redundant instructions. Then, we use natural language processing techniques and graph convolutional networks to generate function embeddings. We call the combination of a compiler, architecture, and optimization level as a file environment, and take a divide-and-conquer strategy to divide a similarity calculation problem of $C_N^2$ cross-file-environment scenarios into N-1 embedding transferring sub-problems. We propose an entropy-based adapter to transfer function embeddings from different file environments into the same file environment to alleviate the differences caused by various file environments. To precisely identify vulnerable functions, we propose a progressive search strategy to supplement function embeddings with fine-grained features to reduce false positives caused by patched functions. We implement a prototype named VulHawk and conduct experiments under seven different tasks to evaluate its performance and robustness. The experiments show VulHawk outperforms Asm2Vec, Asteria, BinDiff, GMN, PalmTree, SAFE, and Trex.

View More Papers

DARWIN: Survival of the Fittest Fuzzing Mutators

Patrick Jauernig (Technical University of Darmstadt), Domagoj Jakobovic (University of Zagreb, Croatia), Stjepan Picek (Radboud University and TU Delft), Emmanuel Stapf (Technical University of Darmstadt), Ahmad-Reza Sadeghi (Technical University of Darmstadt)

Read More

The Walls Have Ears: Gauging Security Awareness in a...

Gokul Jayakrishnan, Vijayanand Banahatti, Sachin Lodha (TCS Research Tata Consultancy Services Ltd.)

Read More

How to Count Bots in Longitudinal Datasets of IP...

Leon Böck (Technische Universität Darmstadt), Dave Levin (University of Maryland), Ramakrishna Padmanabhan (CAIDA), Christian Doerr (Hasso Plattner Institute), Max Mühlhäuser (Technical University of Darmstadt)

Read More

Drone Security and the Mysterious Case of DJI's DroneID

Nico Schiller (Ruhr-Universität Bochum), Merlin Chlosta (CISPA Helmholtz Center for Information Security), Moritz Schloegel (Ruhr-Universität Bochum), Nils Bars (Ruhr University Bochum), Thorsten Eisenhofer (Ruhr University Bochum), Tobias Scharnowski (Ruhr-University Bochum), Felix Domke (Independent), Lea Schönherr (CISPA Helmholtz Center for Information Security), Thorsten Holz (CISPA Helmholtz Center for Information Security)

Read More