Tracking down a performance regression in a video player

Comparing current and historical measurements.

Last Significant Update:

2024-06-24

Status:

Draft

Comments to:

A Stuttering Video Player

植芝さん (Ueshiba-san) is concerned that end-users have been reporting poor-quality video playback in a recent software release of his video player.

Ueshiba-さん wants to understand when and why his player drops frames, as he prefers that his software run harmoniously.

Luckily, with Mensurae, Ueshiba-さん has the ability to capture a trace of system behavior:

% measurements capture-trace -s $TRACE_SPEC -o $TRACE_FILE -- \
  video-player -f test-video.mpg

He also has traces captured when prior releases of his video player rendered the same test video.

Ueshiba-さん can now (conceptually) ‘diff’ his old and new traces to look for significant differences in behavior.

% measurements diff-trace -s $DIFF_SPEC $OLD_TRACE $NEW_TRACE

A trace file would contain a lot of information: hardware counter measurements, thread context switch information, application defined events, etc., all interleaved in some arbitrary fashion.
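As a rough illustration of what such a 'diff' might compute, the sketch below sums interleaved records by kind and reports the kinds whose totals changed noticeably between two traces. The record layout `(timestamp_ns, kind, value)`, the sample data, and the names `totals`, `diff_traces` and `threshold` are assumptions made for this example; they do not describe Mensurae's trace format or the behavior of `diff-trace`.

```python
from collections import defaultdict

# Hypothetical trace records: (timestamp_ns, kind, value).
# The sample values below are invented for illustration only.
OLD_TRACE = [
    (100, "instructions", 5_000),
    (150, "cache_misses", 40),
    (200, "instructions", 5_200),
    (250, "cache_misses", 42),
]
NEW_TRACE = [
    (100, "instructions", 5_100),
    (150, "cache_misses", 90),
    (200, "instructions", 5_150),
    (250, "cache_misses", 95),
]


def totals(trace):
    """Sum the values of interleaved records, grouped by record kind."""
    sums = defaultdict(int)
    for _timestamp, kind, value in trace:
        sums[kind] += value
    return dict(sums)


def diff_traces(old, new, threshold=0.10):
    """Return {kind: (old_total, new_total)} for kinds present in both
    traces whose relative change exceeds `threshold`."""
    old_totals, new_totals = totals(old), totals(new)
    significant = {}
    for kind in sorted(old_totals.keys() & new_totals.keys()):
        before, after = old_totals[kind], new_totals[kind]
        if before == 0:
            changed = after != 0
        else:
            changed = abs(after - before) / before > threshold
        if changed:
            significant[kind] = (before, after)
    return significant


print(diff_traces(OLD_TRACE, NEW_TRACE))
```

With this sample data only the cache-miss total crosses the 10% threshold; the instruction counts differ by well under 1% and are not reported.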

Ueshiba-さん starts by crafting queries to test his hypotheses about the causes of the performance regression:

Was the total number of instructions executed to render the video different between the releases?
Was the cache behavior, as measured by a count of cache misses, any different?
Which parts of the code were driving poor memory behavior?
For how long did a specific latency-sensitive thread in his application end up waiting for locks?
How often did the kernel schedule that latency-sensitive thread on a lower-performance but power-efficient CPU?

…etc.
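Two of the questions above (lock wait time, and scheduling on power-efficient CPUs) can be sketched as plain functions over a list of events. The `Event` type, its field names, the event kinds, and the sample data are all hypothetical; they stand in for whatever a real trace would record and are not Mensurae's query notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical trace event (not Mensurae's actual schema)."""
    timestamp_ns: int
    thread: str
    kind: str            # "lock_wait_begin", "lock_wait_end" or "scheduled"
    cpu_class: str = ""  # "performance" or "efficiency", for "scheduled"


EVENTS = [
    Event(0, "render", "scheduled", "performance"),
    Event(100, "render", "lock_wait_begin"),
    Event(400, "render", "lock_wait_end"),
    Event(450, "render", "scheduled", "efficiency"),
    Event(500, "decode", "scheduled", "efficiency"),
    Event(700, "render", "lock_wait_begin"),
    Event(750, "render", "lock_wait_end"),
    Event(800, "render", "scheduled", "performance"),
]


def lock_wait_ns(events, thread):
    """Total time `thread` spent between lock_wait_begin and lock_wait_end."""
    total, began = 0, None
    for event in sorted(events, key=lambda e: e.timestamp_ns):
        if event.thread != thread:
            continue
        if event.kind == "lock_wait_begin":
            began = event.timestamp_ns
        elif event.kind == "lock_wait_end" and began is not None:
            total += event.timestamp_ns - began
            began = None
    return total


def efficiency_fraction(events, thread):
    """Fraction of `thread`'s scheduling events that landed on an
    'efficiency' CPU; 0.0 if the thread was never scheduled."""
    scheduled = [e for e in events
                 if e.thread == thread and e.kind == "scheduled"]
    if not scheduled:
        return 0.0
    slow = sum(1 for e in scheduled if e.cpu_class == "efficiency")
    return slow / len(scheduled)


print(lock_wait_ns(EVENTS, "render"))
print(efficiency_fraction(EVENTS, "render"))
```

Running the same functions over an old and a new trace would give the per-release numbers to compare.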

Using queries like these, Ueshiba-さん was able to track down two significant changes:

A change in the size of a data structure in libmalloc was causing more cache traffic.
But more importantly, an architectural change within his application had increased the number of concurrently runnable threads. This in turn was causing the kernel scheduler to schedule work on the slower CPUs in his system (his system used a mix of ‘high-performance’ and ‘power-efficient’ CPUs); and occasionally latency-critical code would execute on a slow CPU.

Design Considerations

Analysis of Released Software

Ueshiba-さん needs to measure the behavior of ‘as-released’ software without needing to use special ‘debug’ builds. This implies that Mensurae's tools need to be able to work on released code.

Data Formats

For capturing information from long measurement runs, Ueshiba-さん would need:

Space-efficient data formats, for storing large traces.
Backward-compatibility, allowing later versions of his tools to process prior traces.
Forward-compatibility where possible, to allow traces from more recent collection software to be processed gracefully by older software.
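One common way to approach these three goals together is a compact binary stream of self-describing records, where each record carries a type tag and a length so that a reader can skip record types it does not recognize. The sketch below illustrates that general technique only; the header layout, the type numbers, and the payload shapes are assumptions and say nothing about the format Mensurae actually uses.

```python
import struct

# Hypothetical record layout (an assumption for this sketch):
#   1-byte type tag, 2-byte little-endian payload length, then the payload.
HEADER = struct.Struct("<BH")
TYPE_COUNTER = 1          # payload: 8-byte unsigned counter value
TYPE_CONTEXT_SWITCH = 2   # payload: 4-byte thread id, 2-byte CPU number


def encode(record_type, payload):
    """Prefix `payload` with its type tag and length."""
    return HEADER.pack(record_type, len(payload)) + payload


def decode(buffer, known_types):
    """Yield (type, payload) for known record types.

    Records with an unknown type are skipped using their length field,
    which is what lets an older reader process a newer trace."""
    offset = 0
    while offset + HEADER.size <= len(buffer):
        record_type, length = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        payload = buffer[offset:offset + length]
        offset += length
        if record_type in known_types:
            yield record_type, payload


# A trace containing one record type (99) that this reader predates.
trace = (
    encode(TYPE_COUNTER, struct.pack("<Q", 12345))
    + encode(99, b"added by a newer collector")
    + encode(TYPE_CONTEXT_SWITCH, struct.pack("<IH", 42, 3))
)
records = list(decode(trace, {TYPE_COUNTER, TYPE_CONTEXT_SWITCH}))
print([record_type for record_type, _ in records])
```

Backward compatibility in this scheme amounts to never reusing or redefining an existing type tag; forward compatibility comes from the length-based skipping.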

Hardware Support

Ueshiba-さん needs to run measurements on the specific hardware that his customers are using. This means that Mensurae needs to be available on the devices that Ueshiba-さん is interested in.

Query Language

Ueshiba-さん wants to craft his queries using a concise notation.

He finds SQL-based approaches (like that used by Perfetto) to be unsuitable for data that essentially has graph structure. He prefers to write scripts in a graph query framework (like Gremlin, from Apache TinkerPop) that is “embedded” into his favorite programming language.
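To give a feel for what an embedded, graph-style query might look like, here is a minimal sketch of a fluent traversal written directly in the host language. The `Traversal` class, the `V` helper, the step names, and the sample graph are all hypothetical; they loosely imitate the step-chaining style of graph traversal languages but are neither Gremlin's API nor a Mensurae interface.

```python
class Traversal:
    """A tiny fluent traversal over a dict-based graph (hypothetical API)."""

    def __init__(self, graph, vertices):
        self.graph = graph
        self.vertices = list(vertices)

    def has(self, key, value):
        """Keep only vertices whose property `key` equals `value`."""
        keep = [v for v in self.vertices
                if self.graph["props"][v].get(key) == value]
        return Traversal(self.graph, keep)

    def out(self, label):
        """Follow outgoing edges with the given label."""
        targets = [dst for v in self.vertices
                   for (src, edge_label, dst) in self.graph["edges"]
                   if src == v and edge_label == label]
        return Traversal(self.graph, targets)

    def values(self, key):
        """Return property `key` of each current vertex."""
        return [self.graph["props"][v][key] for v in self.vertices]


# Invented sample data: threads, CPUs, and "ran_on" edges between them.
GRAPH = {
    "props": {
        "render": {"type": "thread", "name": "render"},
        "decode": {"type": "thread", "name": "decode"},
        "cpu0": {"type": "cpu", "name": "cpu0", "class": "performance"},
        "cpu4": {"type": "cpu", "name": "cpu4", "class": "efficiency"},
    },
    "edges": [
        ("render", "ran_on", "cpu0"),
        ("render", "ran_on", "cpu4"),
        ("decode", "ran_on", "cpu4"),
    ],
}


def V(graph):
    """Start a traversal at every vertex of `graph`."""
    return Traversal(graph, graph["props"])


# "On which power-efficient CPUs did the render thread run?"
slow_cpus = (V(GRAPH)
             .has("type", "thread").has("name", "render")
             .out("ran_on").has("class", "efficiency")
             .values("name"))
print(slow_cpus)
```

Because the query is ordinary host-language code, intermediate traversals can be stored in variables, wrapped in functions, and combined with the rest of a script.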