Profiling Python
Speeding up a Python application.
Last Significant Update: 2024-06-23
Status: Draft
Comments to:
Saraswati is a natural language researcher specializing in the ancient languages and scripts of South-East Asia.
Her work requires processing sizable natural language corpora, which she does with Python scripts. She often wishes that these scripts would run faster on her laptop.
One day, midway through another hour-long analysis run, Saraswati decides to use Mensurae to profile her Python scripts and find out where her machine’s cycles are being spent.
Saraswati instructs Mensurae’s profiler to capture, for 60 seconds, the cache misses being incurred by her currently running analysis program.
// Writes to 'mensurae.out' by default.
$ measurements profile -t 60s -e cache-misses -p PID-OR-NAME
Then she invokes a web-based front-end to explore the data captured:
// Reads from 'mensurae.out' by default.
$ measurements web
https://angsa:9000/
Pointing her browser at the specified URL shows her the source for her scripts, annotated with the data captured during the Mensurae measurement run.
#count   source-line
 20.4M   if isinstance(ex, AllExpression):
 10.2M       cjts = [
167.6M           ex.term.replace(ex.variable, VariableExpression(d))
                 for d in doms
             ]
102.3M       cjts = [self.replace_quants(c, dom) for c in cjts]
 41.0M       return reduce(lambda x, y: x & y, cjts)
This view is also a code browser: Saraswati can click on the functions and methods in view to jump to their (annotated) definitions.
She can also determine which Python classes incurred the most cache misses during the minute of data capture, and which methods in those classes were the most expensive.
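The per-class and per-method views described above amount to summing per-line counts under coarser keys. The following is a minimal sketch of that roll-up, not Mensurae's implementation: the sample tuples, class names, line numbers, and helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical samples: (class name, method name, line number, cache misses).
# These values are illustrative only; they are not real Mensurae output.
SAMPLES = [
    ("Prover", "replace_quants", 20, 20_400_000),
    ("Prover", "replace_quants", 22, 167_600_000),
    ("Prover", "replace_quants", 25, 102_300_000),
    ("Expression", "replace", 88, 55_000_000),
    ("Expression", "simplify", 140, 12_000_000),
]


def totals_by_class(samples):
    """Sum the cache-miss counts attributed to each class."""
    totals = Counter()
    for cls, _method, _line, count in samples:
        totals[cls] += count
    return totals


def totals_by_method(samples, cls_name):
    """Sum the cache-miss counts of each method within one class."""
    totals = Counter()
    for cls, method, _line, count in samples:
        if cls == cls_name:
            totals[method] += count
    return totals


by_class = totals_by_class(SAMPLES)
hottest_class, _ = by_class.most_common(1)[0]
hottest_methods = totals_by_method(SAMPLES, hottest_class).most_common()

print(hottest_class, hottest_methods)
```

Sorting the class totals first and then drilling into the methods of the top class mirrors the order in which the story explores the data.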
Using these insights, Saraswati is able to dramatically speed up her analysis runs, leaving her more time for the rest of her research.
Design Considerations
Attaching To Live Processes
Mensurae should be able to profile applications without needing them to be restarted or recompiled.[1]
Light-Weight Self-Measurements For Interpreters
Mensurae’s measurement technique relies on machine code (in this case the machine code for the Python interpreter) being able to measure its own behavior cheaply on hardware, using instructions like the x86 RDPMC (Read Performance-Monitoring Counters) instruction.[1]
The Python interpreter itself would need to be augmented to use these instructions to measure its own behavior on hardware. Such augmentation could either be built in, or be injected into the interpreter's process at runtime.
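The idea of an interpreter reading a counter around the code it executes, and charging the difference to the current source line, can be illustrated in pure Python. This is only a sketch under stated assumptions: Python code cannot issue RDPMC, so `time.perf_counter_ns()` stands in for the hardware counter, and CPython's `sys.settrace` hook stands in for instrumentation that would really live in the interpreter's native code. The names `read_counter`, `profile_lines`, and `work` are hypothetical.

```python
import sys
import time
from collections import Counter


def read_counter():
    """Stand-in for a hardware counter read (real code might use RDPMC)."""
    return time.perf_counter_ns()


def profile_lines(func, *args):
    """Run func(*args), charging counter deltas to the lines of func."""
    costs = Counter()
    target = func.__code__
    last = None  # (line number, counter value) for the line now executing

    def local_trace(frame, event, arg):
        nonlocal last
        now = read_counter()
        if last is not None:
            # Charge the elapsed count to the line that was executing.
            costs[last[0]] += now - last[1]
        last = None if event == "return" else (frame.f_lineno, now)
        return local_trace

    def global_trace(frame, event, arg):
        # Trace only frames running func; callees are charged to the caller.
        return local_trace if frame.f_code is target else None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, costs


def work(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


result, costs = profile_lines(work, 1000)
for line, cost in sorted(costs.items()):
    print(f"line {line}: {cost} counter units")
```

Tracing at this level is far more expensive than the cheap self-measurement the section calls for; the sketch shows only the attribution logic, not the cost profile.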
Profiling Mixed Code
Some Python classes and functions are actually implemented using native code.
Mensurae should hence handle both native and interpreted code when profiling Python.
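The split between native and interpreted code is visible from within Python itself. The sketch below uses the standard `inspect` module to classify a few callables; the helper `implementation_kind` is hypothetical and is only a rough heuristic, not a description of how Mensurae distinguishes the two.

```python
import inspect
import json


def implementation_kind(obj):
    """Roughly classify a callable as 'python', 'native', or 'unknown'."""
    if inspect.isfunction(obj) or inspect.ismethod(obj):
        # Defined with 'def' or 'lambda'; executed as bytecode.
        return "python"
    if inspect.isbuiltin(obj) or inspect.ismethoddescriptor(obj):
        # Built-in functions and C-level methods such as str.join.
        return "native"
    return "unknown"


kinds = {
    "len": implementation_kind(len),
    "str.join": implementation_kind(str.join),
    "json.dumps": implementation_kind(json.dumps),
}
print(kinds)
```

A single call such as `json.dumps` may start in Python code and then cross into a native accelerator module, which is why a profiler would need to follow execution across both kinds of frames.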
Data Format
The format used by Mensurae for its measurement data needs to be well documented, to allow the development of third-party analysis tools.
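To illustrate what a documented format makes possible, the sketch below defines a purely hypothetical fixed-size sample record and shows a third-party reader decoding it from the written layout alone. The magic value, field order, and field widths are assumptions invented for this example; Mensurae's actual format is not described here.

```python
import struct

# Hypothetical layout, little-endian and unpadded:
#   magic   4 bytes   identifies the record type
#   version uint16    format version
#   line    uint32    source line number
#   count   uint64    event count attributed to that line
RECORD = struct.Struct("<4sHIQ")
MAGIC = b"MSRE"  # made-up marker, for illustration only


def pack_sample(line, count, version=1):
    """Encode one sample as a fixed-size binary record."""
    return RECORD.pack(MAGIC, version, line, count)


def unpack_sample(data):
    """Decode one record, rejecting data with the wrong marker."""
    magic, version, line, count = RECORD.unpack(data)
    if magic != MAGIC:
        raise ValueError("not a sample record")
    return version, line, count


encoded = pack_sample(22, 167_600_000)
decoded = unpack_sample(encoded)
print(len(encoded), decoded)
```

Carrying an explicit marker and version in every record is one common way to let independent tools detect formats they do not understand rather than misread them.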