Abstract
System call failures present significant challenges for operating system (OS) users: the failures are often cryptic and difficult to diagnose because error codes are limited and documentation is missing. As a result, software developers struggle to use system calls effectively, and power users have difficulty configuring the OS and resolving environment problems. Existing automatic root-cause analysis tools are inadequate, primarily because they depend on comparative analysis, which requires similar successful executions that are often unavailable.
In this paper, we present a tracing and root-cause analysis solution that addresses these limitations. We enable comparative analysis by using symbolic execution to generate analogous successful executions when no actual ones are available. Furthermore, to address the limited availability and shortcomings of hardware-based control-flow tracing, we propose probe-point-based tracing of the entire control flow. Using these techniques, we develop DeepErr, a system call analyzer that identifies the precise predicate responsible for a failure. We evaluate DeepErr on 100 tests from the Linux Test Project: it pinpoints the root cause in 91% of the scenarios and identifies the failing function in an additional 7% of cases.
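To make the idea of comparative analysis concrete, the following is a minimal sketch, not DeepErr's implementation. It assumes a control-flow trace is simply a list of probe-point labels, and that the predicate of interest is the last label the failing and successful traces share before they diverge. The function name `diverging_predicate` and every label in the example traces are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def diverging_predicate(
    failing: Sequence[str], passing: Sequence[str]
) -> Optional[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Compare two control-flow traces and locate the branch where they split.

    Returns (last_shared_label, next_in_failing, next_in_passing), or None
    when the two traces are identical. A next-label of None means that
    trace ended at the split point.
    """
    # Walk both traces in lockstep until the labels differ.
    i = 0
    while i < len(failing) and i < len(passing) and failing[i] == passing[i]:
        i += 1

    if i == len(failing) and i == len(passing):
        return None  # no divergence: the traces are the same

    last_shared = failing[i - 1] if i > 0 else None
    next_failing = failing[i] if i < len(failing) else None
    next_passing = passing[i] if i < len(passing) else None
    return last_shared, next_failing, next_passing


# Hypothetical traces: the failing run leaves "check_len" on the error path,
# while the (e.g. symbolically generated) successful run continues.
failing = ["entry", "check_flags", "check_len", "fail_EINVAL"]
passing = ["entry", "check_flags", "check_len", "map_pages", "return_ok"]

print(diverging_predicate(failing, passing))
```

In this toy setting the shared label just before the split, `check_len`, stands in for the predicate responsible for the failure; a real analysis would operate on kernel control flow rather than hand-written label lists.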
Award: This paper received the Best Paper Award at SysTor 2025.