Friday, 27 February

  • 08:45 - 09:00
    Welcome and Opening Remarks
    Embarcadero
  • 09:00 - 10:00
    Keynote #1: Advancing from “What the fuzz?” to “All the Fuzz!” – Mathias Payer (EPFL)
    Embarcadero
  • 10:00 - 10:30
    Morning Break
    Pacific Ballroom D
  • 10:30 - 12:00
    Session 1
    Embarcadero
    • Kabilan Mahathevan (Department of Computer Science, Virginia Tech, Blacksburg), Yining Zhang (Department of Computer Science, Virginia Tech, Blacksburg), Muhammad Ali Gulzar (Department of Computer Science, Virginia Tech, Blacksburg), Kirshanthan Sundararajah (Department of Computer Science, Virginia Tech, Blacksburg)

      Sparse Tensor Compilers (STCs) have emerged as critical infrastructure for optimizing high-dimensional data analytics and machine learning workloads. STCs must synthesize complex, irregular control flow for various compressed storage formats directly from high-level declarative specifications, making them highly susceptible to subtle correctness defects. Existing testing frameworks, which rely on mutating computation graphs restricted to a standard vocabulary of operators, fail to exercise the arbitrary loop synthesis capabilities of these compilers. Furthermore, generic grammar-based fuzzers struggle to generate valid inputs due to the strict rules governing how indices are reused across multiple tensors.

      In this paper, we present TENSURE, the first extensible blackbox fuzzing framework specifically designed for the testing of STCs. TENSURE leverages Einstein Summation (Einsum) notation as a general input abstraction, enabling the generation of complex, unconventional tensor contractions that expose corner cases in the code-generation phases of STCs. We propose a novel constraint-based generation algorithm that guarantees 100% semantic validity of synthesized kernels, significantly outperforming the ∼3.3% validity rate of baseline grammar fuzzers. To enable metamorphic testing without a trusted reference, we introduce a set of semantic-preserving mutation operators that exploit algebraic commutativity and heterogeneity in storage formats. Our evaluation on two state-of-the-art systems, TACO and Finch, reveals widespread fragility, particularly in TACO, where TENSURE exposed crashes or silent miscompilations in a majority of generated test cases. These findings underscore the critical need for specialized testing tools in the sparse compilation ecosystem.
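      As a rough illustration (hypothetical code, not TENSURE's actual algorithm), the index-reuse rule behind the validity gap can be sketched in Python: a specification in Einsum notation such as "ij,jk->ik" is valid only if every output index also appears in some input operand, and drawing output indices only from indices already used by the inputs yields validity by construction rather than by filtering.

```python
import random

def einsum_spec_is_valid(spec: str) -> bool:
    """Basic semantic-validity rule for an Einsum specification:
    every output index must appear in at least one input operand."""
    inputs, _, output = spec.partition("->")
    input_indices = set(inputs.replace(",", ""))
    return all(idx in input_indices for idx in output)

def generate_valid_contraction(rng: random.Random, num_operands: int = 2,
                               rank: int = 2) -> str:
    """Constraint-based generation (illustrative): pick output indices
    only from indices the inputs already use, so the generated spec is
    valid by construction."""
    alphabet = "ijklmn"
    operands = ["".join(rng.choice(alphabet) for _ in range(rank))
                for _ in range(num_operands)]
    used = sorted(set("".join(operands)))
    output = "".join(rng.sample(used, min(rank, len(used))))
    return ",".join(operands) + "->" + output
```

      A generator that picks output indices independently of the inputs satisfies this rule only by chance, which is consistent with the low baseline validity rate reported above.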

    • Kai Feng (School of Computing Science, University of Glasgow), Jeremy Singer (School of Computing Science, University of Glasgow), Angelos K Marnerides (Dept. of Electrical & Computer Engineering, KIOS CoE, University of Cyprus)

      Fuzzing firmware on microcontrollers (MCUs) is hard to scale. Rehosting is an ideal way to achieve this, but it often loses fidelity and can be slow, while on-device tracing support is limited. Standard coverage-guided fuzzing relies on software instrumentation, which is costly for MCUs and gives only control-flow signals that arrive late for complex checks.

      We present Hardfuzz, an on-device fuzzer that uses definition-use (def-use) chains to guide exploration. Hardfuzz performs offline static analysis to extract def-use pairs from the binary, then runs directly on the device and uses the debug unit’s hardware breakpoints to observe when definitions and their uses execute. Two small bitmaps in shared memory record (i) which definitions execute and (ii) which def-use pairs execute, giving richer feedback than basic-block coverage alone. A lightweight scheduler prioritises definitions with many uses and adapts to the few hardware breakpoints available on MCUs.

      We evaluate Hardfuzz against another hardware breakpoint-based solution, GDBFuzz. In emulation, Hardfuzz achieves higher basic-block coverage on most targets and progresses faster in the early hours. On hardware, it covers 14-40% more basic blocks after 24 hours across three programs with known faults. These results show that def-use guidance is practical on MCUs and improves exploration over control-flow-only feedback.
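      The two shared-memory bitmaps can be sketched as follows (a simplified Python model; identifiers, sizes, and the pair hash are illustrative, not Hardfuzz's actual layout):

```python
# Simplified model of def-use feedback: one bitmap records which
# definitions executed, the other records which (definition, use) pairs
# executed, keyed by a simple hash. Pair coverage is a strictly richer
# signal than recording executed locations alone.

BITMAP_SIZE = 1 << 12  # shared-memory bitmaps are kept small on MCUs

def_bitmap = bytearray(BITMAP_SIZE)
pair_bitmap = bytearray(BITMAP_SIZE)

last_def_for_var = {}  # variable -> id of the last definition that executed

def on_definition(def_id: int, var: str) -> None:
    """Called when a hardware breakpoint fires on a definition site."""
    def_bitmap[def_id % BITMAP_SIZE] = 1
    last_def_for_var[var] = def_id

def on_use(use_id: int, var: str) -> bool:
    """Called on a use site; returns True if a new def-use pair was seen."""
    def_id = last_def_for_var.get(var)
    if def_id is None:
        return False  # no definition of this variable observed yet
    slot = (def_id * 31 + use_id) % BITMAP_SIZE
    new = pair_bitmap[slot] == 0
    pair_bitmap[slot] = 1
    return new
```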

    • Tobias Holl (Ruhr University Bochum), Leon Weiß (Ruhr University Bochum), Kevin Borgolte (Ruhr University Bochum)

      Fuzzing is one of the most successful techniques to test software and discover vulnerabilities. Due to its effectiveness and ease of scaling, it is often run in parallel on hundreds to thousands of CPU cores, and improving fuzzers’ efficiency, efficacy, and performance has become a major research area. This research typically focuses on enhancing fuzzing itself, such as through better input generation or optimized instrumentation that executes the program more frequently, with the goal of finding more bugs or flaws in less time. In contrast, optimizing the performance of the target program, which the fuzzer executes billions of times, specifically for fuzzing has received little attention.

      We introduce Pogofuzz, a novel approach to improving fuzzing performance that is fuzzer-agnostic and target-agnostic. We leverage the insight that the inputs used for future mutations are known, and apply compiler-based profile-guided optimization (PGO) to optimize the target program specifically for these future inputs. By regularly creating new profiles based on the next inputs, recompiling the target program with new optimizations, and replacing the target in the fuzzing process in situ with its newly optimized version, Pogofuzz improves the fuzzing performance of the state-of-the-art fuzzer AFL++.

      We provide preliminary results for Pogofuzz in different realistic experimental setups, comparing it to AFL++ on four software projects from the FuzzBench suite for 1–6 physical CPU cores per fuzzer, to demonstrate Pogofuzz’s advantages. Our preliminary results show that our approach has the potential to improve fuzzing throughput, despite incurring additional optimization and recompilation costs. Pogofuzz, as a fuzzer-target-agnostic approach, is a significant departure from traditional improvements in fuzzing, which are fuzzer-specific and/or target-specific, providing the opportunity for new, general performance improvements for large-scale, extended fuzzing.

      To encourage adoption and reproducibility of our research, we will make Pogofuzz publicly available as open source before or with the publication of the extended paper.
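      The profile-recompile-swap loop described above can be sketched as follows (a minimal Python model with placeholder callables, not Pogofuzz's implementation):

```python
# Skeleton of a PGO-in-the-loop fuzzing driver. The callables are
# placeholders: `execute` runs one input against the target,
# `profile_and_recompile` builds a PGO-optimized binary from the inputs
# about to be mutated, and `swap_binary` replaces the target in situ.

def fuzz_with_pgo(next_batches, execute, profile_and_recompile,
                  swap_binary, interval=10_000):
    """Run batches of inputs; every `interval` executions, rebuild the
    target with PGO based on the current batch and swap it in.
    Returns the number of swaps performed."""
    executions = 0
    swaps = 0
    for batch in next_batches:
        for inp in batch:
            execute(inp)
            executions += 1
            if executions % interval == 0:
                swap_binary(profile_and_recompile(batch))
                swaps += 1
    return swaps
```

      The interval trades the cost of recompilation against how closely the optimized binary tracks the evolving input corpus.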

    • Tobias Holl (Ruhr University Bochum), Leon Weiß (Ruhr University Bochum), Kevin Borgolte (Ruhr University Bochum)

      Current best practice recommendations for fuzzing research rely on the use of standardized evaluation frameworks. While many aspects of these frameworks have been scrutinized, the coverage collection and evaluation remain blind spots. We present two examples of discrepancies in coverage reported by FuzzBench, highlighting a gap in our understanding of the evaluation frameworks we depend on. In this work, we close this gap for FuzzBench by systematically examining its coverage analysis. We find that there are issues in every stage of the analysis pipeline that can potentially influence reported coverage.

      We propose a thorough experimental design to assess the impact of each individual flaw, and their combined influence on the evaluation results. We will contextualize our results within the existing literature and discuss implications for our trust in previous fuzzing evaluations. With this work, we will improve confidence in the reliability of standardized fuzzing benchmarks and inform future research on how to improve their evaluation.

      We will open-source our code after completing all experiments.

    • Tobias Wienand (Ruhr-Universität Bochum), Lukas Bernhard (Ruhr-Universität Bochum), Flavio Toffalini (Ruhr-Universität Bochum)

      JavaScript (JS) engines apply heavy code optimizations to the executed JS code through Just-in-Time (JIT) compilation. Incorrectly handling JS types during JIT compilation can lead to exploitable bugs in the engine. Current fuzzing techniques for JS engines rely solely on code coverage as the dominant feedback mechanism. However, code coverage primarily captures control-flow diversity rather than data-flow diversity. This limitation is crucial for JS engines, where runtime type information drives JIT compiler optimization decisions.

      In this work, we investigate whether type coverage can improve bug-finding effectiveness over traditional code coverage in JS engines. Our prototype, TYPEFUZZ, tracks heap object types at optimization-sensitive locations during JIT compilation and directs fuzzing exploration toward under-tested type locations. We have implemented TYPEFUZZ on top of Fuzzilli and instrumented V8’s Maglev and TurboFan compilers to track 463 type-sensitive locations. Our preliminary evaluation demonstrates that type coverage increases data-flow diversity during JIT compilation by 37.5% compared to code coverage alone, effectively exploring substantially more type-sensitive compiler states. In our preliminary campaign, we discovered four bugs in non-experimental features of V8. All bugs were discoverable with both metrics in this preliminary evaluation, yet the substantial increase in type-diverse states explored suggests potential for discovering type-specific bugs with extended campaigns, enhanced bug oracles (differential testing), and cross-engine evaluation on JavaScriptCore.
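      The distinction between the two feedback signals can be modeled in a few lines (illustrative Python, not TYPEFUZZ's instrumentation):

```python
# Illustrative model of why type coverage is finer-grained than code
# coverage at JIT-relevant locations: code coverage records only that a
# location executed, while type coverage records which object type was
# observed there, so one location can contribute many feedback states.

code_coverage = set()   # location ids
type_coverage = set()   # (location id, observed type) pairs

def observe(location: int, obj: object) -> bool:
    """Record feedback at a type-sensitive location; return True if the
    input reached a previously unseen (location, type) state."""
    code_coverage.add(location)
    state = (location, type(obj).__name__)
    new = state not in type_coverage
    type_coverage.add(state)
    return new
```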

    • Yingnan Zhou (Nankai University), Yuhao Liu (Nankai University), Hanfeng Zhang (Nankai University), Yan Jia (Nankai University), Sihan Xu (Nankai University), Zhiyuan Jiang (National University of Defense Technology), Zheli Liu (Nankai University)

      Flight control software for unmanned aerial vehicles (UAVs) offers numerous configuration parameters. However, their complexity raises the risk of incorrect configurations, leading to mission failures or crashes. Although fuzzing is effective for discovering software vulnerabilities, its application to UAV configuration is hindered by the need to obtain physical states (e.g., position and altitude) from a time-consuming simulator. Furthermore, machine learning-based acceleration methods often suffer from limited generalizability due to their reliance on flight logs as training data. To address these challenges, we propose UAVConfigFuzzer, a novel fuzzing tool that accelerates configuration testing via setpoint-estimation-guided fuzzing. In flight control software, setpoints are the calculated target values that guide the UAV’s movement based on configurations. UAVConfigFuzzer leverages the native setpoint generation module to generate setpoints, which serve as estimates of the UAV’s physical states to rapidly quantify the severity of anomalies. Guided by this efficient and accurate feedback, UAVConfigFuzzer steers the mutation process toward anomaly-inducing configurations without relying on simulators or extensive flight logs. We evaluate UAVConfigFuzzer on PX4, a widely used open-source UAV flight control software; the results demonstrate that the feedback achieves an average runtime of 27 milliseconds. The estimated states maintain high fidelity, with a mean position error below 6.92 cm and a velocity error below 0.13 m/s. Leveraging this rapid feedback, UAVConfigFuzzer detects 14 incorrect configurations. These issues were validated on real UAV hardware and have been acknowledged by the community maintainers for remediation.
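      As a rough sketch of the idea, estimated states derived from generated setpoints can be scored against safety limits without running a simulator (hypothetical Python; the names, limits, and scoring rule are illustrative, not the paper's actual severity metric):

```python
# Hypothetical severity feedback: score how far the estimated trajectory
# (setpoints produced by the flight controller under a candidate
# configuration) exceeds safety limits. Anomaly-inducing configurations
# get higher scores and can be prioritized by the mutator.

def anomaly_score(setpoints, limits):
    """`setpoints` is a list of (position, velocity) targets; `limits`
    maps 'max_position'/'max_velocity' to safety bounds."""
    score = 0.0
    for position, velocity in setpoints:
        score += max(0.0, abs(position) - limits["max_position"])
        score += max(0.0, abs(velocity) - limits["max_velocity"])
    return score

def is_anomalous(setpoints, limits, threshold=1.0):
    return anomaly_score(setpoints, limits) > threshold
```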

  • 12:00 - 13:30
    Lunch
    Loma Vista Terrace and Harborside
  • 13:30 - 14:30
    Keynote #2: Where the Fuzz Are We Going? – Sergej Dechand (Code Intelligence, MPI-SP)
    Embarcadero
  • 14:30 - 15:00
    Afternoon Break
    Pacific Ballroom D
  • 15:00 - 15:45
    Panel Discussion
    Embarcadero
  • 15:45 - 16:55
    Session 2
    Embarcadero
    • Zhaobin Li, Chenchong Du, Xiaoyi Duan

    • Davide Rusconi (University of Milan), Osama Yousef (University of Milan), Mirco Picca (University of Milan), Danilo Bruschi (University of Milan), Flavio Toffalini (Ruhr-Universität Bochum), Andrea Lanzi (University of Milan)

      In this paper, we present E-FuzzEdge, a novel fuzzing architecture that improves the throughput of fuzzing campaigns in contexts where scalability is unavailable. E-FuzzEdge addresses the inefficiencies of hardware-in-the-loop fuzzing for microcontrollers by optimizing execution speed. We evaluated our system against both real-world embedded libraries and state-of-the-art benchmarks, demonstrating significant performance improvements. A key advantage of the E-FuzzEdge architecture is its compatibility with other embedded fuzzing techniques that perform on-device testing instead of firmware emulation. This means that the broader embedded fuzzing community can integrate E-FuzzEdge into their workflows to enhance overall testing efficiency.

    • Nelum Attanayake (School of Computer Science, University of Sydney), Danushka Liyanage (School of Computer Science, University of Sydney), Clement Canonne (School of Computer Science, University of Sydney), Suranga Seneviratne (School of Computer Science, University of Sydney), Rahul Gopinath (School of Computer Science, University of Sydney)

      Background: Fuzzing campaigns require accurate estimation of maximum reachable coverage to ensure that resources are not wasted. However, adaptive bias due to the use of coverage feedback in modern fuzzers prevents accurate statistical estimation of maximum reachable coverage. Recent work hypothesizes that adaptive bias is minimized when singleton species, observed exactly once, equal doubletons, observed exactly twice. Rigorous evaluation of this hypothesis has been hindered by the lack of ground truth.

      Objective: This work evaluates whether maximum reachable coverage estimates are reliable when adaptive bias is minimized, using two complementary approaches: (1) mitigating the lack of ground truth and (2) establishing ground truth.

      Methods: First, we compare maximum reachable coverage estimates between coverage-guided and purely random fuzzers on real-world benchmarks. Since random fuzzers lack coverage feedback, they exhibit no adaptive bias. If the singleton-doubleton equilibrium criterion reliably indicates minimal adaptive bias, the coverage-guided fuzzer should reach maximum reachable coverage estimates comparable to the random fuzzer at this equilibrium point. Second, we validate estimates using synthetic programs with known maximum reachable coverage, where complex control flows mimic real-world complexity while providing objective ground truth.

      Results: These complementary studies will determine whether maximum reachable coverage estimates are reliable when the singleton-doubleton equilibrium criterion is satisfied, validating or refuting its use as a stopping criterion for fuzzing campaigns.
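      The singleton-doubleton criterion above comes from species-richness estimation; a minimal sketch using the classical Chao1 estimator (a common choice in this literature, though not necessarily the authors' exact estimator):

```python
from collections import Counter

def chao1_estimate(observations):
    """Estimate total species richness (here: maximum reachable
    coverage elements) from observed coverage, using the classical
    Chao1 estimator: S_obs + f1^2 / (2 * f2)."""
    counts = Counter(observations)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)  # singletons
    f2 = sum(1 for c in counts.values() if c == 2)  # doubletons
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2  # bias-corrected variant
    return s_obs + f1 * f1 / (2 * f2)

def at_equilibrium(observations) -> bool:
    """The hypothesized low-adaptive-bias point: singletons == doubletons."""
    counts = Counter(observations)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return f1 == f2
```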

    • Fannv He (National Computer Network Intrusion Protection Center, University of Chinese Academy of Sciences, China, and School of Cyberspace Security, Hainan University, China), Yuan Liu (School of Cyber Engineering, Xidian University, China), Jice Wang (School of Cyberspace Security, Hainan University, China), Baiquan Wang (School of Cyberspace Security, Hainan University, China), Zezhong Ren (National Computer Network Intrusion Protection Center, University of Chinese Academy of Sciences, China), Yuqing Zhang (National Computer Network Intrusion Protection Center, University of Chinese Academy of Sciences, China; School of Cyberspace Security, Hainan University, China, and School of Cyber Engineering, Xidian University, China)

      Fuzzing fundamentally relies on crash observability to guide its search. This paper breaks this premise by introducing MES, a novel anti-fuzzing system designed to make crashes unobservable. MES employs a compile-time address masking technique that instruments all memory accesses, ensuring they always refer to valid regions, thereby systematically suppressing memory-error crashes at their root. Our design stems from a validated foundational premise: invalid data accesses constitute the vast majority of crashes. Thus, a data-flow-centric suppression strategy offers the most effective defense. We evaluate MES through a three-pillar methodology: validating the premise via precise analysis of Binutils 2.13; assessing real-world efficacy against state-of-the-art fuzzers using the UNIFUZZ benchmark; and quantifying overhead/deployment scope with SPEC CPU 2017. MES is implemented as an LLVM compiler pass and a custom loader. Based on the experimental data obtained to date, MES demonstrates a strong capability to suppress memory-error crashes, with current results indicating a suppression rate exceeding 97% in our tests, which significantly impedes fuzzing progress. Preliminary performance measurements show that its overhead remains manageable within a well-defined operational envelope, supporting its promising potential as a practical defense in scenarios where crash suppression is critical. The full evaluation is ongoing to solidify these findings.
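      The address-masking idea can be modeled conceptually as follows (Python arithmetic only; MES itself is an LLVM compiler pass, and the region layout here is illustrative):

```python
# Conceptual model of compile-time address masking: every memory access
# is rewritten so the effective address always falls inside a known-valid
# region, so an out-of-bounds access reads or writes valid (if wrong)
# memory instead of raising a crash the fuzzer could observe.

REGION_BASE = 0x10000
REGION_SIZE = 0x1000   # power of two, so masking is a single AND
MASK = REGION_SIZE - 1

def mask_address(addr: int) -> int:
    """Force any address into [REGION_BASE, REGION_BASE + REGION_SIZE)."""
    return REGION_BASE + (addr & MASK)
```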

  • 16:35 - 16:40
    Concluding Remarks
    Embarcadero