Handling Time
Last Significant Update: 2024-06-25
Status: Draft
Comments to:
The Issue
There is no guarantee that the machines in a multi-machine deployment agree on the “current time”; typically each machine’s clock differs slightly from those of the others.
Even within a single machine, on some multi-processor designs the in-CPU timestamp counters can hold different values at the same instant.
These differences make it unreliable to merge events from different event streams into a single time order.
Possible Fixes
There are a few ways that we could cope with this:
- Use application-specific knowledge to compensate for the skew.
  For example, the response to an RPC with a specific tag can only happen after that RPC was issued. Such causal constraints could be used to correct skewed timestamps in traces.
- Implement a periodic request/response round trip that exchanges timestamps between the system being measured and the system doing the measurement.
  These timestamps could drive a phase-locked loop that tracks the timestamp skew for each measurement source.
- Study how monitoring frameworks (like OpenTelemetry) cope with this issue.
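
As a rough illustration of the first option, the sketch below turns the "a response can only follow its request" rule into bounds on the clock offset between two machines. Everything in it is hypothetical: the `RpcObservation` record, its field names, and the assumption that the offset stays constant over the window being analysed (no drift) are made up for this example and are not part of any existing trace format.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class RpcObservation:
    """One tagged RPC seen from both sides (hypothetical record layout)."""
    client_send: float  # client clock: request issued
    server_seen: float  # server clock: request handled
    client_recv: float  # client clock: response received


def offset_bounds(observations: Iterable[RpcObservation]) -> Tuple[float, float]:
    """Bound offset = server_clock - client_clock using causality alone.

    The server handles a request after it was sent and before the response
    arrives, so in client time:
        client_send <= server_seen - offset <= client_recv
    which rearranges to:
        server_seen - client_recv <= offset <= server_seen - client_send
    Intersecting these intervals over all RPCs tightens the bounds.
    Assumes the offset is constant across the observations.
    """
    obs = list(observations)
    if not obs:
        raise ValueError("need at least one observation")
    lo = max(o.server_seen - o.client_recv for o in obs)
    hi = min(o.server_seen - o.client_send for o in obs)
    if lo > hi:
        raise ValueError("observations are inconsistent with a constant offset")
    return lo, hi


def to_client_time(server_timestamp: float, bounds: Tuple[float, float]) -> float:
    """Map a server timestamp onto the client clock using the interval midpoint."""
    lo, hi = bounds
    return server_timestamp - (lo + hi) / 2.0
```

The midpoint is an arbitrary choice here; any value inside the interval respects the causal constraints, and RPCs with short round trips narrow the interval the most.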
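
The second option could be sketched as follows. The offset calculation uses the standard four-timestamp round-trip formula (the one NTP uses), which assumes the network delay is the same in both directions. The tracking loop is a simple proportional-integral filter (an alpha-beta filter) standing in for a phase-locked loop; the `SkewTracker` name and the gain values are assumptions chosen for illustration, not a tuned design.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def round_trip_offset(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    """Estimate (offset, delay) from one request/response exchange.

    t0: request sent      (measuring system's clock)
    t1: request received  (measured system's clock)
    t2: response sent     (measured system's clock)
    t3: response received (measuring system's clock)
    offset is measured-clock minus measuring-clock, assuming symmetric delay.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


@dataclass
class SkewTracker:
    """Tracks offset and drift for one measurement source (illustrative gains)."""
    gain_offset: float = 0.5   # proportional gain on the offset error
    gain_drift: float = 0.1    # integral gain feeding the drift estimate
    offset: float = 0.0        # current offset estimate, seconds
    drift: float = 0.0         # current drift estimate, seconds per second
    last_time: Optional[float] = None

    def update(self, local_time: float, measured_offset: float) -> float:
        """Fold in one measurement and return the updated offset estimate."""
        if self.last_time is None:
            # First sample: adopt the measurement directly.
            self.offset = measured_offset
            self.last_time = local_time
            return self.offset
        dt = local_time - self.last_time
        if dt <= 0:
            raise ValueError("measurements must arrive in increasing local time")
        predicted = self.offset + self.drift * dt
        error = measured_offset - predicted
        self.offset = predicted + self.gain_offset * error
        self.drift += self.gain_drift * error / dt
        self.last_time = local_time
        return self.offset

    def to_local_time(self, remote_timestamp: float) -> float:
        """Map a timestamp from the measured system onto the local clock."""
        return remote_timestamp - self.offset
```

One `SkewTracker` would be kept per measurement source and fed from each periodic exchange. Lower gains smooth out network jitter at the cost of reacting more slowly to real changes in skew, and samples with an unusually large `delay` could be discarded before they reach the tracker.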