-
Notifications
You must be signed in to change notification settings - Fork 825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch Instrumentation Start and End Times Can Be Off By Up to a Few Milliseconds #3719
Comments
I'm mostly unsure if this is an appropriate change (or like there's some reason Date.now() is being used in the fetch instrumentation) If not just lmk and I can see about trying to make a PR for this |
Actually that may not be the only thing involved, as I see the end of the span is actually off by several milliseconds |
For the end time, I wonder if https://github.com/open-telemetry/opentelemetry-js/blob/main/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts#L348 is involved, as it seems to imply that reader.read() returns a Promise, which would mean the resolution of the monkeypatched fetch promise would happen first (https://github.com/open-telemetry/opentelemetry-js/blob/main/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts#L367) and then it would come back and run endSpanOnSuccess after finishing reading the body, which eventually sets the end time on the span |
Maybe it has something to do with the autoinstrumentation library using its own Tracer opentelemetry-js/experimental/packages/opentelemetry-instrumentation/src/instrumentation.ts Line 57 in 369b07e
Though I don't see how that would affect things this much |
I'm still not quite sure how to investigate this further, but I'm wondering now if the auto-instrumentation's tracer is somehow shifted so all it's times are a few milliseconds off. I'm not seeing evidence of Deno's performance.timeOrigin drifting or anything though |
I'm not sure why this would matter, but putting the |
I haven't read all the messages here but I guess #3434 (and linked issues) might be of interest for you. With that PR a move from hrtime to Date.now() was done because to solve problems with hrtime drifting in some environments. |
This is most likely it, in your example code you're not actually doing anything with the response, which basically means js event loop ends up being like:
did a small change to your example: const handler = async (request: Request): Promise<Response> => {
// This call will be autoinstrumented
const otherSpan = tracer.startSpan('manualFetch');
const res = await fetch("http://www.example.com");
+ await res.text(); // Also wait for response body to be piped through
otherSpan.end();
const span = tracer.startSpan(`constructBody`);
const body = `Your user-agent is:\n\n${request.headers.get("user-agent") ?? "Unknown"}`;
span.end();
return new Response(body, { status: 200 });
}; And I got timestamps that make more sense (manualFetch duration: 178600, HTTP GET duration: 173000) Could argue that recording span end time in onSuccess would be better |
Yup I'm seeing the traces in Honeycomb look more as expected with your edit. That plus the registerInstrumentations positioning plus the use of Date.now instead of performance.now I think explains everything I've come across. If there's anything else I can do to help with this (e.g. trying to PR the end time change) just let me know. And thank you to everyone who's looked at this!! |
FYI I still don't have a guess for why the start of the fetch autoinstrumented span is so many milliseconds ahead of the start of the parent span. The call to startSpan seems to happen pretty quickly after the promise is created, and startSpan didn't seem to be doing too much work on the face. |
Posted in the linked website doc issue, but cross-posting here for visibility: @Grunet I suspect what you're experiencing is latency added by the proxy tracer provider in the API. If you acquire a tracer before registering the TracerProvider you receive a ProxyTracer object which no-ops by default. If you then register a real TracerProvider, the ProxyTracer will proxy calls to the real Tracer returned by the TracerProvider. The advantage of this is that any instrumentation which acquires a Tracer before your SDK is loaded will become active after the SDK is loaded. The disadvantage is that these early-loaded tracers will have a very slight performance penalty due to the proxy function overhead. Registering the provider before registering instrumentations avoids this overhead because the instrumentations get real tracers and you avoid the proxy. |
@t2t2 do you mean ending the span before reading the body? Wouldn't this cause the span to always end too early? I'm not sure I understand what you're arguing for here.
@Grunet absolutely if you think something can be improved here then a PR would be welcome. |
Nah, just recording the timestamp, like is already done to not include ResourceObserver timeout. But I think I'd actually retract that opinion after re-reading fetch documentation:
So changing it to when fetch promise resolves would record span end on headers start, not response end, which I think would be wrong. So I think this issue is more of a user misunderstanding - the parent span did technically end before the response of child request ended (even if it's only due to how js is a single threaded event loop) |
Ok glad we're of the same mind here |
Yup I think that makes sense to me now with all of the context. Still a touch counterintuitive personally, but all the context fixes that. I think all of my original concerns were addressed so I'll go ahead and close this issue. Thank you everyone for the help! |
What happened?
Steps to Reproduce
I tried using the FetchInstrumentation in Deno, but found that the fetch call I was making was lasting longer than it parent span.
deno run -A index.ts
curl http://localhost:8080/
Expected Result
The span created by fetch autoinstrumentation should end before its parent span ends
Actual Result
Here is what this looks like in Honeycomb today. I added in a manual span called "manualFetch" to compare to the autogenerated "HTTP GET" span to see the difference between the two
Additional Details
There are 2 places in code I think performance.now() could be used instead of Date.now() to fix this
OpenTelemetry Setup Code
package.json
Relevant log output
The text was updated successfully, but these errors were encountered: