Profiling everything at one shot.

Last Significant Update:

2024-07-05

Status:

Draft

Comments to:

mensurae@jkoshy.net

Sauron has just obtained a new Palantir model and is currently playing with it. He is very happy with it, but then notices that NetBSD™ running on the Palantir[1] is somehow immune to his all-seeing eye.

Sauron fires up Mensurae as a workaround:

// Count instructions for all threads and processes.
$ measurements count -t 1s --event instructions-retired --all

The resulting output is a listing, once a second, of every active thread in the system (daemon or otherwise), with the number of instructions executed by that thread.[2]

Sauron then asks for the instruction counts at every thread context switch.[3]

// Log instructions at every context switch to 'mensurae.out'.
$ measurements count --csw --event instructions-retired

He then visualizes the resulting data on a timeline.

But Sauron also wants a “code-centric view” of the activity on his system.  For this he uses Mensurae's 'sampling' feature to capture a callstack on every CPU in the Palantir once every 1 million instructions:

// Writes to 'mensurae.out'.
$ measurements callstacks --event instructions-retired -c 1M -t 1m

Then Sauron invokes a web-based viewer on the collected profile information.

// Reads from 'mensurae.out'.
$ measurements web
http://mordor:9000/

This viewer shows him the list of the most active program images and shared libraries on his system.

Where symbol and debug information is present, the viewer shows Sauron the names of the functions that were the most active during sampling, along with their corresponding filenames and line numbers.  Where source code to these functions is available, the viewer shows him a combined source-cum-disassembly view.

Sauron notes that he is also able to view the collected data in other ways (e.g., as flamegraphs).
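
As an illustration of what sits behind a flamegraph view, the sketch below collapses callstack samples into the "folded" one-line-per-stack form that flamegraph renderers commonly consume. The sample data and the function name fold_stacks are hypothetical; nothing here describes Mensurae's actual file format or API.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse callstack samples into folded lines: 'root;...;leaf count'."""
    counts = Counter(";".join(stack) for stack in samples)
    # Most frequent stacks first; ties are broken by name for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{stack} {n}" for stack, n in ordered]


# Hypothetical samples, each listed from outermost to innermost frame.
samples = [
    ["main", "forge_ring", "hammer"],
    ["main", "forge_ring", "hammer"],
    ["main", "watch_shire"],
]

for line in fold_stacks(samples):
    print(line)
```

Each output line pairs one distinct callstack with the number of samples in which it was seen, which is all a renderer needs to size the boxes in the graph.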

Sauron is pleased to discover that Mensurae’s query language can be used to isolate specific callstacks for further study.  Sauron is also pleased to note that Mensurae’s callstack views traverse native execution and interpreted language execution seamlessly.

Design Considerations

Measuring Running Processes

Mensurae should be able to attach to already running processes without disrupting their operation.  Both current and future threads of execution should be covered.

Interrupt-handler threads should similarly be instrumentable (on OSes that use these).

Stability of Operation

Mensurae’s operation should not cause the system to become unstable.

In the event of a resource shortage, the measurement process should degrade gracefully.
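
One possible reading of "degrade gracefully" is that, when buffer space runs out, new samples are discarded and counted rather than allowed to stall or destabilize the system. The sketch below shows that policy under stated assumptions: the class SampleBuffer and its methods are hypothetical, and a real implementation would live in the kernel or a low-level agent rather than in Python.

```python
from collections import deque


class SampleBuffer:
    """Fixed-capacity sample buffer that drops new samples when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = deque()
        self.dropped = 0  # kept so that gaps in the data can be reported

    def push(self, sample):
        """Store a sample, or count it as dropped if the buffer is full."""
        if len(self.samples) >= self.capacity:
            self.dropped += 1
            return False
        self.samples.append(sample)
        return True

    def drain(self):
        """Hand all buffered samples to the consumer and empty the buffer."""
        out = list(self.samples)
        self.samples.clear()
        return out


buf = SampleBuffer(capacity=2)
results = [buf.push(s) for s in ("a", "b", "c")]
print(results, buf.dropped)
```

Keeping a drop counter means the loss of data is visible in the final report instead of silently skewing the profile.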

Low Overheads

Measurements should be low in overhead.

Cross-Language Backtraces

(Note: This feature was not implemented by FreeBSD’s PmcTools.)

Many applications use a mix of native code and interpreted languages (e.g., applications built with Node.js, applications embedding Python, etc.), with execution crossing two or more language runtimes. For example, a callstack from an Android™ device may interleave Java, native (C/C++) code and JavaScript (embedded V8) execution.

We would like backtraces to traverse diverse language runtimes seamlessly.
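
One way such a combined backtrace could be assembled is to walk the native stack and substitute each interpreter evaluation-loop frame with the interpreted function it was executing. The sketch below illustrates the idea only; the frame names, the function merge_stacks, and the assumption of one evaluation-loop frame per interpreted call are all hypothetical, and runtimes with JIT compilation or inlined calls would need a different approach.

```python
def merge_stacks(native_frames, interp_frames, is_eval_frame):
    """Replace interpreter eval-loop frames in a native stack with interpreted frames.

    Both frame lists run from outermost to innermost.
    """
    interp = iter(interp_frames)
    merged = []
    for frame in native_frames:
        if is_eval_frame(frame):
            # Substitute the interpreted function this eval-loop call was running;
            # fall back to the native frame if the interpreter stack runs out.
            merged.append(next(interp, frame))
        else:
            merged.append(frame)
    return merged


# Hypothetical stacks for a program embedding an interpreter.
native = ["main", "run_script", "eval_loop", "c_extension_call", "eval_loop", "memcpy"]
interpreted = ["py:handler", "py:callback"]

merged = merge_stacks(native, interpreted, lambda f: f == "eval_loop")
print(merged)
```

The merged stack keeps the native frames that surround each interpreted call, so the transitions between runtimes remain visible.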


1. NetBSD™ runs on all sorts of hardware, apparently.
2. Sauron could have performed an equivalent measurement with FreeBSD’s HWPMC too.
3. See PMC_F_LOG_PROCCSW in FreeBSD's HWPMC.