Hojjat Aghakhani (University of California, Santa Barbara), Fabio Gritti (University of California, Santa Barbara), Francesco Mecca (Università degli Studi di Torino), Martina Lindorfer (TU Wien), Stefano Ortolani (Lastline Inc.), Davide Balzarotti (Eurecom), Giovanni Vigna (University of California, Santa Barbara), Christopher Kruegel (University of California, Santa Barbara)

Machine learning techniques are widely used in addition to signatures and heuristics to increase the detection rate of anti-malware software, as they automate the creation of detection models, making it possible to handle an ever-increasing number of new malware samples. In order to foil the analysis of anti-malware systems and evade detection, malware uses packing and other forms of obfuscation. However, few realize that benign applications use packing and obfuscation as well, to protect intellectual property and prevent license abuse.

In this paper, we study how machine learning based on static analysis features operates on packed samples. Malware researchers have often assumed that packing would prevent machine learning techniques from building effective classifiers. However, both industry and academia have published results that show that machine-learning-based classifiers can achieve good detection rates, leading many experts to think that classifiers are simply detecting the fact that a sample is packed, as packing is more prevalent in malicious samples. We show that, different from what is commonly assumed, packers do preserve some information when packing programs that is “useful” for malware classification. However, this information does not necessarily capture the sample’s behavior. We demonstrate that the signals extracted from packed executables are not rich enough for machine-learning-based models to (1) generalize their knowledge to operate on unseen packers, and (2) be robust against adversarial examples. We also show that a naïve application of machine learning techniques results in a substantial number of false positives, which, in turn, might have resulted in incorrect labeling of ground-truth data used in past work.

View More Papers

CDN Judo: Breaking the CDN DoS Protection with Itself

Run Guo (Tsinghua University), Weizhong Li (Tsinghua University), Baojun Liu (Tsinghua University), Shuang Hao (University of Texas at Dallas), Jia Zhang (Tsinghua University), Haixin Duan (Tsinghua University), Kaiwen Sheng (Tsinghua University), Jianjun Chen (ICSI), Ying Liu (Tsinghua University)

Read More

Snappy: Fast On-chain Payments with Practical Collaterals

Vasilios Mavroudis (University College London), Karl Wüst (ETH Zurich), Aritra Dhar (ETH Zurich), Kari Kostiainen (ETH Zurich), Srdjan Capkun (ETH Zurich)

Read More

HYPER-CUBE: High-Dimensional Hypervisor Fuzzing

Sergej Schumilo (Ruhr-Universität Bochum), Cornelius Aschermann (Ruhr-Universität Bochum), Ali Abbasi (Ruhr-Universität Bochum), Simon Wörner (Ruhr-Universität Bochum), Thorsten Holz (Ruhr-Universität Bochum)

Read More

Compliance Cautions: Investigating Security Issues Associated with U.S. Digital-Security...

Rock Stevens (University of Maryland), Josiah Dykstra (Independent Security Researcher), Wendy Knox Everette (Leviathan Security Group), James Chapman (Independent Security Researcher), Garrett Bladow (Dragos), Alexander Farmer (Independent Security Researcher), Kevin Halliday (University of Maryland), Michelle L. Mazurek (University of Maryland)

Read More