Profiling Python

Speeding up a Python application.

Last Significant Update: 2024-06-23
Status: Draft
Comments to:

Saraswati is a natural language researcher specializing in the ancient languages and scripts of South-East Asia.

Saraswati needs to process sizable natural language corpora for her work, and she uses Python to do so. She often wishes that her Python scripts would run faster on her laptop.

One day, midway through another hour-long analysis run, Saraswati decides to use Mensurae to profile her Python scripts and find out where her machine’s cycles are being spent.

Saraswati instructs Mensurae’s profiler to capture, for 60 seconds, the cache misses being incurred by her currently running analysis program.

# Writes to 'mensurae.out' by default.
$ measurements profile -t 60s -e cache-misses -p PID-OR-NAME

Then she invokes a web-based front-end to explore the captured data:

# Reads from 'mensurae.out' by default.
$ measurements web
https://angsa:9000/

Pointing her browser at the specified URL shows her the source for her scripts, annotated with the data captured during the Mensurae measurement run.

Annotated Python Source Code

#count    source-line
 20.4M    if isinstance(ex, AllExpression):
 10.2M      cjts = [
167.6M        ex.term.replace(ex.variable, VariableExpression(d))
              for d in doms
            ]
102.3M      cjts = [self.replace_quants(c, dom) for c in cjts]
 41.0M    return reduce(lambda x, y: x & y, cjts)

This view is also a code browser: Saraswati can click on the functions and methods in view to jump to their (annotated) definitions.
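
An annotated listing of this kind can be produced by joining per-line event counts with the source text. The following is a minimal sketch only: the helper names `format_count` and `annotate`, and the mapping from line number to count, are assumptions made for illustration and are not part of Mensurae.

```python
def format_count(n):
    """Abbreviate a raw event count, e.g. 20_400_000 becomes '20.4M'."""
    for limit, suffix in ((1_000_000_000, "G"), (1_000_000, "M"), (1_000, "K")):
        if n >= limit:
            return f"{n / limit:.1f}{suffix}"
    return str(n)


def annotate(source_lines, counts):
    """Prefix each source line with the count recorded for its 1-based line number.

    `counts` is a hypothetical dict mapping line numbers to event counts;
    lines without a recorded count get an empty count column.
    """
    rows = ["#count    source-line"]
    for lineno, text in enumerate(source_lines, start=1):
        label = format_count(counts[lineno]) if lineno in counts else ""
        rows.append(f"{label:>6}    {text}")
    return "\n".join(rows)


listing = annotate(["a = 1", "b = 2"], {2: 1500})
```

Only lines that were observed during the capture carry a count, which matches the blank count columns in the listing above.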

She can also determine which Python classes incurred the most cache misses during the minute of data capture, and which methods in those classes were the most expensive.
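
The per-class and per-method rankings amount to summing sample counts under two different keys. A rough sketch follows, assuming each sample has already been attributed to a class and a method; the record layout, the names and the numbers below are invented for illustration.

```python
from collections import Counter

# Hypothetical samples: (class name, method name, cache-miss count).
# Neither the layout nor the values come from a real Mensurae capture.
samples = [
    ("ModelBuilder", "replace_quants", 50),
    ("ModelBuilder", "replace_quants", 30),
    ("ModelBuilder", "build", 15),
    ("Expression", "replace", 5),
]


def rank(samples):
    """Return (classes, methods), each sorted from most to least expensive."""
    by_class = Counter()
    by_method = Counter()
    for cls, method, count in samples:
        by_class[cls] += count
        by_method[(cls, method)] += count
    return by_class.most_common(), by_method.most_common()


classes, methods = rank(samples)
```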

Using these insights, Saraswati is able to dramatically speed up her analysis runs, leaving her more time for the rest of her research.

Design Considerations

Attaching To Live Processes

Mensurae should be able to profile applications without needing them to be restarted or recompiled.^[1]

Light-Weight Self-Measurements For Interpreters

Mensurae’s measurement technique relies on machine code (in this case, the machine code of the Python interpreter) being able to measure its own behavior on hardware cheaply, using instructions such as the x86 RDPMC (Read Performance-Monitoring Counters) instruction.^[1]

The Python interpreter itself would need to be augmented to use these instructions to measure its own behavior on hardware. Such augmentation could either be built-in, or could be injected into its process at runtime.
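
The idea of charging counter deltas to the code being interpreted can be illustrated from inside Python with `sys.settrace`. This is only an analogy under stated assumptions: pure Python cannot issue RDPMC, so the hypothetical `read_counter` below reads a clock instead of a hardware counter, and a trace hook costs far more than the in-interpreter instrumentation described above.

```python
import sys
import time
from collections import defaultdict


def read_counter():
    # Stand-in for a hardware counter read such as RDPMC (an assumption:
    # a nanosecond clock is used because Python cannot execute RDPMC).
    return time.perf_counter_ns()


def profile_lines(func, *args, **kwargs):
    """Run func and charge counter deltas to the line that was executing."""
    costs = defaultdict(int)
    last_line = None
    last_stamp = 0

    def tracer(frame, event, arg):
        nonlocal last_line, last_stamp
        now = read_counter()
        if last_line is not None:
            costs[last_line] += now - last_stamp
        # After a return, execution resumes on the caller's current line.
        if event == "return" and frame.f_back is not None:
            frame = frame.f_back
        last_line = (frame.f_code.co_filename, frame.f_lineno)
        # Re-read the counter so the hook's own work is mostly excluded.
        last_stamp = read_counter()
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)
    return result, dict(costs)


def work(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


result, costs = profile_lines(work, 1000)
```

Here `costs` maps `(filename, line number)` pairs to accumulated counter deltas. The measured values still include some overhead from the hook itself.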

Profiling Mixed Code

Some Python classes and functions are actually implemented using native code.

Mensurae should hence handle both native and interpreted code when profiling Python.
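
CPython already exposes the distinction between interpreted and native callees to profiling hooks: a function installed with `sys.setprofile` receives `call` events for Python functions and `c_call` events for functions implemented in C. The sketch below uses that standard hook to count both kinds. The helper name `count_calls` is an assumption, and the code shows the distinction only, not how Mensurae would attribute hardware events.

```python
import sys
from collections import Counter


def count_calls(func, *args, **kwargs):
    """Run func and count calls to interpreted and to native functions."""
    calls = Counter()

    def profiler(frame, event, arg):
        if event == "call":
            # A Python-level function: the frame identifies the code object.
            calls[("python", frame.f_code.co_name)] += 1
        elif event == "c_call":
            # A native function: arg is the C function object being called.
            calls[("native", getattr(arg, "__name__", repr(arg)))] += 1

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        # The call that removes the hook may itself be recorded as native.
        sys.setprofile(previous)
    return result, calls


def mixed(words):
    lengths = []
    for w in words:
        lengths.append(len(w))   # len and list.append are native
    return sorted(lengths)       # sorted is native


result, calls = count_calls(mixed, ["a", "bbb", "cc"])
```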

Data Format

The format used by Mensurae for its measurement data needs to be well documented, to allow the development of third-party analysis tools.

1. Its predecessor PmcTools offered this capability.