Hanke Kimm (Stony Brook University, NY, USA), Sagar Mishra (Stony Brook University, NY, USA), R. Sekar (Stony Brook University, NY, USA)
Advanced Persistent Threats (APTs) are long-running, stealthy attack campaigns that routinely evade modern defense mechanisms. As a result, defenders rely heavily on post-compromise forensic analysis to understand attack progression and scope. System call provenance derived from audit logs is widely regarded as the most comprehensive source for reconstructing host activity and “connecting the dots” of multi-stage attacks. In practice, however, audit-derived provenance can be incomplete or erroneous because of missing context, gaps in event capture or reporting, or incorrect dependency tracking. These errors undermine the central goal of audit data collection, namely, serving as the complete and authoritative source for attack investigation. In this paper, we demonstrate such problems across multiple DARPA Transparent Computing datasets. We discuss the common underlying causes of these errors and then present a new system, Improv, to address them. Improv post-processes audit data at user level, querying the OS to add context and provenance information. It builds on eAudit [1] and adds a modest 2.4% additional overhead.