Yun Zhang (Hunan University), Yuling Liu (Hunan University), Ge Cheng (Xiangtan University), Bo Ou (Hunan University)

In the field of computer security, binary code similarity detection is a crucial for identifying malicious software, copyright infringement, and software vulnerabilities. However, obfuscation techniques not only changes the structure and features of the code but also effectively conceal its potential malicious behaviors or infringing nature, thereby increasing the complexity of detection. Although methods based on graph neural networks have become the forefront technology for solving code similarity detection due to their effective processing and representation of code structures, they have limitations in dealing with obfuscated function matching, especially in scenarios involving control flow obfuscation. This paper proposes a method based on Graph Transformers aimed at improving the accuracy and efficiency of obfuscation-resilient binary code similarity detection. Our method utilizes Transformers to extract global information and employs three different encodings to determine the relative importance or influence of nodes in the CFG, the relative position between nodes, and the hierarchical relationships within the CFG. This method demonstrates significant adaptability to various obfuscation techniques and exhibits enhanced robustness and scalability when processing large datasets.

View More Papers

How Different Tokenization Algorithms Impact LLMs and Transformer Models...

Ahmed Mostafa, Raisul Arefin Nahid, Samuel Mulder (Auburn University)

Read More

The impact of data-heavy, post-quantum TLS 1.3 on the...

Panos Kampanakis and Will Childs-Klein (AWS)

Read More

QSynth – A Program Synthesis based approach for Binary...

Robin David (Quarkslab), Luigi Coniglio (University of Trento), Mariano Ceccato (University of Verona)

Read More