-
Notifications
You must be signed in to change notification settings - Fork 835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(tracing): make spans resilient to performance clock drift #3384
Conversation
Co-authored-by: Marc Pichler <[email protected]>
Co-authored-by: Gerhard Stöbich <[email protected]>
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #3384 +/- ##
==========================================
- Coverage 93.27% 92.57% -0.71%
==========================================
Files 247 132 -115
Lines 7348 3152 -4196
Branches 1512 656 -856
==========================================
- Hits 6854 2918 -3936
+ Misses 494 234 -260
|
Made a sandbox with this patch, most frequently used otel web instrumentations (document-load, user-interaction, xhr and fetch) and an artificial time drift simulator: https://codesandbox.io/s/modest-fermat-4eu09e?file=/src/otel-fixes.ts It doesn't improve this test case a lot (user interaction's in right time scale, as it's auto + auto), pretty much all cases of performance timestamps are still affected by drift
|
These instrumentations all use auto start and manual end times.
If the duration is negative it should be getting the drift offset correction applied, not just set to 0. If you look at the diagram in the linked comment there is no situation where it should be set to 0, so there must be some bad code path. |
Updated codesandbox with pr changes as of now
Yep fixed on auto start + perf end, however span events (input: perf timestamp) also need correcting (now they end up outside the span w/ drift)
document-load spans don't, as they use manual (performance API) times for both start and end, and these still get drifted: |
I'm not sure what we can do there. If the user provides manual start and end times we can't correct drift because they may be intentionally describing a span that happened in the past. |
There is currently an assumption that any numeric value lower than timeOrigin is a performance timestamp:
So it's actually impossible to create spans in the past (w/ accounting for drift) anyway: |
We can document how we create timestamps to allow users to keep in sync. Maybe provide some helper APIs. |
Hmm so there is. That is disappointing and actually strikes me as a bug. The time code in core attempts to do way too much magic IMO and really limits us. That specific one can probably be fixed by changing it so that the heuristic is
I agree here documentation is a bare minimum and providing a helper API of some kind to correct drift would be a good idea |
@@ -148,21 +170,19 @@ export class Span implements api.Span, ReadableSpan { | |||
api.diag.warn('Dropping extra events.'); | |||
this.events.shift(); | |||
} | |||
if (isTimeInput(attributesOrStartTime)) { | |||
if (typeof startTime === 'undefined') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
undefined was actually impossible here
const now = Date.now(); | ||
this._performanceStartTime = otperformance.now(); | ||
this._performanceOffset = now - (this._performanceStartTime + otperformance.timeOrigin); | ||
|
||
// if startTime is a number smaller than the start of the process | ||
// assume it is a performance API timestamp and apply correction as needed | ||
if (typeof startTime === 'number' && startTime < otperformance.timeOrigin) { | ||
startTime += this._performanceOffset; | ||
} | ||
|
||
this.startTime = timeInputToHrTime(startTime ?? now); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC @t2t2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated codesandbox with these changes, looks like all are working properly now 👍
Oh it might be better to replace both references to otperformance.timeOrigin
with getTimeOrigin from otel/core because it has additional backwards compat checks for older browsers (IE, safari <15)
function getTimeOrigin(): number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. also had to export that function
|
||
// if startTime is a number smaller than the start of the process | ||
// assume it is a performance API timestamp and apply correction as needed | ||
if (typeof startTime === 'number' && startTime < getTimeOrigin()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if startTime
is an HrTime? IIUC it should be compared to the time origin too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only 2 ways I can think of for the user to provide a HrTime
- They use our
core.hrTime()
function to get a time - They use
process.hrTime()
in node
I think we should discourage (1) and maybe even mark it deprecated. We know it is going to be shifted in the past in many cases. (2) seems extremely unlikely.
The reason I push back is because I'm actually currently reworking the startTime to check for < performance.now()
instead of < getTimeOrigin()
. We know all performance entries should be in the past, and it is a much more restrictive test. It would also allow creation of spans in the past if the user provides a timestamp from the past (as long as it isn't too close to jan 1 1970).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL here #3279 (comment)
if (provided != null) { | ||
const duration = hrTimeDuration(this.startTime, provided); | ||
if (duration[0] < 0) { | ||
return addHrTimes(provided, [0, this._performanceOffset * 1000000]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return addHrTimes(provided, [0, this._performanceOffset * 1000000]); | |
return addHrTimes(provided, [0, this._performanceOffset * 1e6]); |
or
return addHrTimes(provided, [0, this._performanceOffset * 1000000]); | |
return addHrTimes(provided, [0, this._performanceOffset * MILLISECOND_TO_NANOSECONDS]); |
} | ||
|
||
const msDuration = otperformance.now() - this._performanceStartTime; | ||
return addHrTimes(this.startTime, [0, msDuration * 1000000]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return addHrTimes(this.startTime, [0, msDuration * 1000000]); | |
return addHrTimes(this.startTime, [0, msDuration * 1e6]); |
Or
return addHrTimes(this.startTime, [0, msDuration * 1000000]); | |
return addHrTimes(this.startTime, [0, msDuration * MILLISECOND_TO_NANOSECONDS]); |
this.name = spanName; | ||
this._spanContext = spanContext; | ||
this.parentSpanId = parentSpanId; | ||
this.kind = kind; | ||
this.links = links; | ||
this.startTime = timeInputToHrTime(startTime ?? clock.now()); | ||
|
||
const now = Date.now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think AnchoredClock
can still be applied here to get a high-resolution start time by default.
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. |
Superseded by #3434. |
Relates to #3279 particularly #3279 (comment)
Fixes #3355