Off CPU Analysis

From http://www.brendangregg.com/offcpuanalysis.html

Off-CPU analysis is a performance methodology where high-resolution off-CPU time is measured and studied. This reveals which code paths led to time spent waiting (blocked), and can be a quick and generic way to root-cause a wide range of performance issues.

Studying off-CPU time differs from traditional profiling, which often samples the activity of threads at a given interval, and (usually) only examines threads if they are executing work on-CPU. Here, the targets are threads that, while processing a workload request, have blocked and been context-switched off-CPU. This method also differs from tracing techniques that instrument the various application functions that commonly block: it targets the kernel functions that perform the blocking, and so doesn't rely on the foresight to instrument all the right places in the application.
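To make the distinction concrete, here is a minimal user-space sketch that estimates off-CPU time for a single call as wall-clock time minus the calling thread's CPU time. It is only an approximation for illustration, not the kernel-level tracing described here: it says how long the thread was off-CPU, but not which code path blocked. The names measure and blocking_work are hypothetical, and time.thread_time() is assumed to be available (Python 3.7+ on platforms with a per-thread CPU clock).

```python
import time

def measure(fn):
    """Return (wall, on_cpu, off_cpu) in seconds for one call of fn.

    off_cpu is estimated as wall-clock time minus this thread's CPU time.
    """
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()   # CPU time consumed by the current thread
    fn()
    on_cpu = time.thread_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, on_cpu, max(wall - on_cpu, 0.0)

def blocking_work():
    # Stand-in for a blocking operation (I/O, lock wait, timer).
    time.sleep(0.2)

if __name__ == "__main__":
    wall, on_cpu, off_cpu = measure(blocking_work)
    print(f"wall={wall:.3f}s on-CPU={on_cpu:.3f}s off-CPU={off_cpu:.3f}s")
```

An on-CPU sampling profiler would attribute almost nothing to blocking_work, even though nearly all of its latency is spent off-CPU.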

Threads can leave the CPU for a number of reasons, including waiting for file system or network I/O, acquiring a synchronization lock, paging/swapping (virtual memory), explicit timers, and signals. They can also leave the CPU for reasons somewhat unrelated to the current thread's execution, including involuntary context switching due to high demand for CPU resources, and interrupts. Whatever the reason, if this occurs during a workload request (a synchronous code path), then it introduces latency.
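The voluntary and involuntary cases can be told apart, at a coarse level, from counters the operating system already keeps. The sketch below assumes a Unix-like system where Python's resource module is available and where getrusage() reports ru_nvcsw (voluntary) and ru_nivcsw (involuntary) context switches; some platforms or sandboxes may report zeros. The helper names context_switches, switches_during, and sleep_repeatedly are hypothetical. The counters show how often the process left the CPU, not for how long or from which code path.

```python
import resource
import time

def context_switches():
    """Return (voluntary, involuntary) context-switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw

def switches_during(fn):
    """Run fn and return how many switches of each kind occurred meanwhile."""
    vol_before, invol_before = context_switches()
    fn()
    vol_after, invol_after = context_switches()
    return vol_after - vol_before, invol_after - invol_before

def sleep_repeatedly(times=5, seconds=0.01):
    # Each sleep blocks the thread, which the kernel normally counts
    # as a voluntary context switch.
    for _ in range(times):
        time.sleep(seconds)

if __name__ == "__main__":
    voluntary, involuntary = switches_during(sleep_repeatedly)
    print(f"voluntary={voluntary} involuntary={involuntary}")
```

A workload dominated by voluntary switches is mostly blocking on something (I/O, locks, timers), while a high involuntary count suggests contention for CPU.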

I'll summarize profiling techniques for off-CPU analysis, and introduce off-CPU as a metric. I'll then use DTrace to measure it, with MySQL as an example target to study. This methodology is suitable for any profiler that can trace kernel events, including perf, SystemTap, and ktap on Linux.