Last Significant Update:

2024-06-25

Status:

Draft

Comments to:

mensurae@jkoshy.net

The Issue

There’s no guarantee that the notion of the “current time” will be uniform across the machines in a multi-machine deployment. Typically, each machine’s clock will differ slightly from those of the others.

On some multi-processor system designs, the per-CPU timestamp counters may hold different values at the same instant.

These differences cause problems when merging events from different event streams into a single time-ordered sequence.
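To make the ordering problem concrete, here is a minimal Python sketch (the note does not name a language, so Python is an assumption). The `Event` record, the `merge_streams` helper, the source names and the `offsets` table are all hypothetical. Each stream is assumed to be sorted by its own local clock, and `offsets` is assumed to hold a per-source correction obtained somehow, for instance by one of the fixes below.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Event(NamedTuple):
    """Hypothetical trace record; timestamp is in seconds on the source's own clock."""
    timestamp: float
    source: str
    name: str


def merge_streams(streams: List[Iterable[Event]],
                  offsets: Dict[str, float]) -> Iterator[Event]:
    """Merge per-source streams (each already sorted by local timestamp) into one
    time-ordered stream.

    offsets[source] is an assumed estimate of (reference clock - source clock);
    adding it maps a local timestamp onto the shared reference timeline.
    Sources missing from `offsets` are left uncorrected.
    """
    def corrected(stream: Iterable[Event]) -> Iterator[Event]:
        for e in stream:
            yield e._replace(timestamp=e.timestamp + offsets.get(e.source, 0.0))

    # Adding a constant per source keeps each stream sorted, as heapq.merge requires.
    return heapq.merge(*(corrected(s) for s in streams), key=lambda e: e.timestamp)
```

For example, if machine "a" runs 0.5 s fast, a request it sends at local time 10.6 s appears to arrive at machine "b" (local time 10.3 s) before it was sent. Passing `{"a": -0.5}` as `offsets` restores the causal order.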

Possible Fixes

There are a couple of ways that we could cope with this:

  1. Use application-specific knowledge to compensate for this wobble.

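    For example, the server can only receive an RPC with a specific tag after the client has issued it, and the client can only receive the response after the server has sent it. A trace that shows otherwise reveals skew between the two machines’ clocks, so such application-specific knowledge could be used to adjust skewed timestamps in traces.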

  2. Periodically exchange timestamps between the system being measured and the system doing the measurement, using a request/response round trip.

    These timestamps could be used to drive a phase-locked loop that tracks the timestamp skew for each measurement source.
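One possible way to apply the first approach (an assumption; the note does not prescribe a method) is to turn the RPC ordering rule into bounds on the offset between a client’s and a server’s clocks. The sketch below assumes the offset stays constant over the trace window (drift is ignored). `RpcTimestamps` and `offset_bounds` are hypothetical names, not part of any existing tool.

```python
from typing import Iterable, NamedTuple, Tuple


class RpcTimestamps(NamedTuple):
    """Hypothetical record for one tagged RPC, matched across client and server traces."""
    client_send: float  # client clock
    server_recv: float  # server clock
    server_send: float  # server clock
    client_recv: float  # client clock


def offset_bounds(rpcs: Iterable[RpcTimestamps]) -> Tuple[float, float]:
    """Bound theta = (server clock - client clock) using causality alone.

    The request cannot arrive before it is sent:
        server_recv - theta >= client_send  =>  theta <= server_recv - client_send
    The response cannot arrive before it is sent:
        client_recv >= server_send - theta  =>  theta >= server_send - client_recv
    Each additional RPC can only tighten the interval.
    """
    lo, hi = float("-inf"), float("inf")
    for r in rpcs:
        hi = min(hi, r.server_recv - r.client_send)
        lo = max(lo, r.server_send - r.client_recv)
    if lo > hi:
        # No constant offset satisfies every RPC: the clocks drifted, or RPCs were mismatched.
        raise ValueError("inconsistent causality constraints")
    return lo, hi
```

Taking the midpoint `(lo + hi) / 2` as the estimate and subtracting it from the server’s timestamps would put both traces on the client’s timeline. Any offset in the interval keeps every matched RPC causally ordered.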
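For the second approach, a minimal sketch under stated assumptions might look like the code below. Each round trip is reduced to an offset sample using the NTP-style four-timestamp formulas. A simple software stand-in for the phase-locked loop then tracks the skew: a second-order proportional-integral loop (equivalent to an alpha-beta filter) that estimates both the offset and its drift rate. `round_trip_sample`, `SkewTracker` and the gains `kp`/`ki` are hypothetical choices, not a prescribed design.

```python
from typing import Optional, Tuple


def round_trip_sample(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """NTP-style offset/delay estimate from one request/response exchange.

    t1: request sent      (measuring system's clock)
    t2: request received  (measured system's clock)
    t3: response sent     (measured system's clock)
    t4: response received (measuring system's clock)
    Returns (offset, round_trip_delay). The offset approximates
    (measured clock - measuring clock); if the two network legs are
    asymmetric, its error is at most round_trip_delay / 2.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


class SkewTracker:
    """Second-order (proportional-integral) tracking loop for one source's clock.

    Tracks both the offset and its rate of change (frequency error), in the
    manner of an alpha-beta filter. The default gains are illustrative only.
    """

    def __init__(self, kp: float = 0.5, ki: float = 0.1) -> None:
        self.kp = kp
        self.ki = ki
        self.offset: Optional[float] = None
        self.freq = 0.0    # estimated drift: seconds of offset per second
        self.last_t = 0.0  # local time of the most recent sample

    def update(self, t: float, measured_offset: float) -> float:
        """Feed one offset measurement taken at local time t; return the new estimate."""
        if self.offset is None:
            # First sample: lock onto it directly, assuming zero drift.
            self.offset, self.last_t = measured_offset, t
            return self.offset
        dt = t - self.last_t
        if dt <= 0:
            raise ValueError("samples must be in increasing time order")
        predicted = self.offset + self.freq * dt
        error = measured_offset - predicted  # phase error
        self.offset = predicted + self.kp * error
        self.freq += self.ki * error / dt    # integral term corrects the drift estimate
        self.last_t = t
        return self.offset

    def offset_at(self, t: float) -> float:
        """Extrapolate the offset to local time t, e.g. to correct an event timestamp."""
        if self.offset is None:
            raise ValueError("no samples yet")
        return self.offset + self.freq * (t - self.last_t)
```

Because the loop is second order, it tracks a clock that drifts at a constant rate with no steady-state error. One would keep a `SkewTracker` per measurement source, and could discard samples whose round-trip delay is unusually large, since those bound the offset least tightly.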

TODO
  • Study how monitoring frameworks (like OpenTelemetry) cope with this issue.