René Helmke (Fraunhofer FKIE), Elmar Padilla (Fraunhofer FKIE, Germany), Nils Aschenbruck (University of Osnabrück)

Firmware corpora for vulnerability research should be textit{scientifically sound}. Yet, several practical challenges complicate the creation of sound corpora: Sample acquisition, e.g., is hard and one must overcome the barrier of proprietary or encrypted data. As image contents are unknown prior analysis, it is hard to select textit{high-quality} samples that can satisfy scientific demands.
Ideally, we help each other out by sharing data. But here, sharing is problematic due to copyright laws. Instead, papers must carefully document each step of corpus creation: If a step is unclear, replicability is jeopardized. This has cascading effects on result verifiability, representativeness, and, thus, soundness.

Despite all challenges, how can we maintain the soundness of firmware corpora? This paper thoroughly analyzes the problem space and investigates its impact on research: We distill practical binary analysis challenges that significantly influence corpus creation. We use these insights to derive guidelines that help researchers to nurture corpus replicability and representativeness. We apply them to 44 top tier papers and systematically analyze scientific corpus creation practices. Our comprehensive analysis confirms that there is currently no common ground in related work. It shows the added value of our guidelines, as they discover methodical issues in corpus creation and unveil miniscule step stones in documentation. These blur visions on representativeness, hinder replicability, and, thus, negatively impact the soundness of otherwise excellent work.

Finally, we show the feasibility of our guidelines and build a new corpus for large-scale analyses on Linux firmware: LFwC. We share rich meta data for good (and proven) replicability. We verify unpacking, deduplicate, identify contents, provide ground truth, and demonstrate LFwC's utility for research.

View More Papers

Siniel: Distributed Privacy-Preserving zkSNARK

Yunbo Yang (The State Key Laboratory of Blockchain and Data Security, Zhejiang University), Yuejia Cheng (Shanghai DeCareer Consulting Co., Ltd), Kailun Wang (Beijing Jiaotong University), Xiaoguo Li (College of Computer Science, Chongqing University), Jianfei Sun (School of Computing and Information Systems, Singapore Management University), Jiachen Shen (Shanghai Key Laboratory of Trustworthy Computing, East China Normal…

Read More

Eclipse Attacks on Monero's Peer-to-Peer Network

Ruisheng Shi (Beijing University of Posts and Telecommunications), Zhiyuan Peng (Beijing University of Posts and Telecommunications), Lina Lan (Beijing University of Posts and Telecommunications), Yulian Ge (Beijing University of Posts and Telecommunications), Peng Liu (Penn State University), Qin Wang (CSIRO Data61), Juan Wang (Wuhan University)

Read More

BrowserFM: A Feature Model-based Approach to Browser Fingerprint Analysis

Maxime Huyghe (Univ. Lille, Inria, CNRS, UMR 9189 CRIStAL), Clément Quinton (Univ. Lille, Inria, CNRS, UMR 9189 CRIStAL), Walter Rudametkin (Univ. Rennes, Inria, CNRS, UMR 6074 IRISA)

Read More

Fuzzing Space Communication Protocols

Stephan Havermans (IMDEA Software Institute), Lars Baumgaertner, Jussi Roberts, Marcus Wallum (European Space Agency), Juan Caballero (IMDEA Software Institute)

Read More