[Profiling] Can we reduce server-side processing time for flamegraphs? #176208
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)
EDIT: for more up-to-date feedback, see #176208 (comment)

This is some initial feedback, purely based on a code audit and some offline discussion with @danielmitterdorfer.

The flame-graph server-side code in Kibana uses the ES-client (lines 126 to 149 in 72a377d). It does not specify the `asStream` option. I am pretty sure that when this is not set to `true`, the client deserializes the ES response and Kibana then re-serializes it before sending it on. When setting it to `true`, the raw ES response body can be passed through unchanged. To do this, you must ensure the `content-*` headers match. To see an example from the current Kibana code-base, please check the MVT tile-routes:

- set `asStream: true` (optionally, you can gzip here)
- pass on the raw response through the Kibana-server response object. Ensure the `content-*` headers match.
kibana/x-pack/plugins/maps/server/mvt/mvt_routes.ts (lines 258 to 272 in b22dd68):
```ts
if (tileStream) {
  // use the content-encoding and content-length headers from elasticsearch if they exist
  const { 'content-length': contentLength, 'content-encoding': contentEncoding } = headers;
  return response.ok({
    body: tileStream,
    headers: {
      'content-disposition': 'inline',
      ...(contentLength && { 'content-length': contentLength }),
      ...(contentEncoding && { 'content-encoding': contentEncoding }),
      'Content-Type': 'application/x-protobuf',
      'Cache-Control': cacheControl,
      'Last-Modified': lastModified,
    },
  });
} else {
```
Note: `tileStream` is the raw ES-response body:

```ts
return { stream: tile.body as Stream, headers: tile.headers, statusCode: tile.statusCode };
```
So, again, just from a code audit, I think we should make two changes:

1. Use `asStream` in the ES-client here (lines 146 to 147 in 72a377d; see the sketch after this list):

   ```ts
   signal: controller.signal,
   meta: true,
   ```

2. Pass on the raw `resp.body`, but match the `content-*` headers (this would be here (????)):

   ```ts
   return response.ok({ body });
   ```
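As a rough illustration of change (1), here is a minimal sketch of what the request options could look like with `asStream` enabled. It assumes the surrounding call uses `esClient.transport.request` with the options shown above; `flamegraphRequestBody` is a placeholder, not the actual code:

```ts
// Sketch only: option names follow the @elastic/elasticsearch TransportRequestOptions type.
const resp = await esClient.transport.request(
  {
    method: 'POST',
    path: '/_profiling/flamegraph',
    body: flamegraphRequestBody, // placeholder for the existing request body
  },
  {
    signal: controller.signal,
    meta: true,
    // New: skip body parsing in the transport and hand back the raw response stream.
    asStream: true,
  }
);

// With meta: true and asStream: true, resp.body is the raw (possibly still
// gzip-compressed) response stream and resp.headers contains the original
// content-* headers from Elasticsearch, which the route can forward.
```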
Following up on the earlier comment.

**Problem: the Elasticsearch client re-encodes the response.** The ES-client parses the response and re-encodes it. See here: https://github.com/elastic/elastic-transport-js/blob/main/src/Transport.ts#L525

**Resolution:** use `asStream: true` in the request options, alongside the existing options:

```ts
    signal: controller.signal,
    meta: true,
  }
);
```
In order for this to work, the ES-response handling needs to change (see below).

**Is `totalSeconds` really needed?**

Line 75 in 72a377d:

```ts
return { ...flamegraph, TotalSeconds: totalSeconds };
```

This adds a new property, `TotalSeconds`, wrapping the flamegraph in a new response object. This would prevent streaming back the result. However, it is only used for tooltips, and it seems this value is optional (?):

Line 85 in b3064ba:

```ts
const comparisonImpactEstimates =
```
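Purely as an illustration of why the wrapping matters: spreading `...flamegraph` requires the parsed object, so any extra field forces deserialization. If the value really is only needed for tooltips, one hypothetical way to keep the body streamable would be to move it out of the JSON body entirely, e.g. into a response header (the header name and variable names below are made up for the sketch):

```ts
// Hypothetical sketch: return the untouched ES stream and carry totalSeconds
// out-of-band in a custom header instead of spreading it into the body.
return response.ok({
  body: rawFlamegraphStream, // raw ES response body, not parsed
  headers: {
    'x-profiling-total-seconds': String(totalSeconds), // made-up header name
  },
});
```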
**Ensure the response object has correct content headers**

Ensure `content-*` encodings are transferred from the ES-response headers.

kibana/x-pack/plugins/observability_solution/profiling/server/routes/flamechart.ts (line 66 in 72a377d):

```ts
return response.ok({ body: flamegraph });
```
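Modeled on the MVT tile route quoted earlier, a hedged sketch of what the flamechart route could do instead, assuming the ES call now uses `asStream: true` and exposes `body`/`headers` (variable names are placeholders):

```ts
// Sketch: forward the raw stream and the relevant content-* headers from ES.
const { body: flamegraphStream, headers: esHeaders } = esResponse; // placeholder name
const { 'content-length': contentLength, 'content-encoding': contentEncoding } = esHeaders;

return response.ok({
  body: flamegraphStream,
  headers: {
    'content-type': 'application/json',
    // only forward these if ES actually set them
    ...(contentLength && { 'content-length': contentLength }),
    ...(contentEncoding && { 'content-encoding': contentEncoding }),
  },
});
```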
**Adjust the client in the browser**

The client currently expects plain JSON:

```ts
return createFlameGraph(baseFlamegraph, showErrorFrames);
```

This will need to be adjusted to handle a stream.
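For the browser side, a rough sketch under the assumption that the route now streams the raw ES JSON body with correct `content-type`/`content-encoding` headers; the browser decompresses transparently, so the simplest adjustment may be to read the whole stream and parse it once (the URL, `params`, and the `BaseFlameGraph` type are assumptions for illustration):

```ts
// Sketch only: consume the streamed response and parse it once on the client.
const res = await fetch('/internal/profiling/flamechart', {
  // hypothetical path
  method: 'POST',
  headers: { 'content-type': 'application/json', 'kbn-xsrf': 'true' },
  body: JSON.stringify(params),
});

// res.json() reads the (already decompressed) stream to completion and parses it.
const baseFlamegraph = (await res.json()) as BaseFlameGraph;
return createFlameGraph(baseFlamegraph, showErrorFrames);
```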
^ this is a broad investigation. I may have missed some intricacies, so it would be good to get some additional feedback from Obs-team on this as well.
Thanks for the investigation @thomasneirynck and @danielmitterdorfer!
The ES response content is gzip compressed while the Kibana response is brotli compressed. (From what I remember, in my hotel room on a slow network I had to wait ~50s for a gzip-compressed flamegraph and only ~25s when brotli-compressed. But FYI, the compression rate also depends on the payload, and the response content format has changed since then.) Another caveat with simply transferring the headers is that you cannot assume that the browser (or client) negotiates exactly the same content-* fields with Kibana server as Kibana negotiates with ES. So I would vote for re-compression in case the content-encoding isn't the same on both sides. This is a simple decompression and compression without parsing, and should be fast enough. A possible future follow-up could be to allow brotli and/or zstd on the ES side (zstd has been included in ES recently, so it's just a matter of time/effort until we can use it).
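To make the "decompression + compression without parsing" idea concrete, here is a small sketch using Node's built-in zlib, assuming ES returned gzip and the browser negotiated brotli with Kibana:

```ts
import { createGunzip, createBrotliCompress } from 'zlib';
import type { Readable } from 'stream';

// Sketch: re-encode the raw ES body from gzip to brotli without ever parsing the JSON.
function recompressGzipToBrotli(esBody: Readable): Readable {
  return esBody.pipe(createGunzip()).pipe(createBrotliCompress());
}

// The resulting stream would be returned with 'content-encoding': 'br' and without a
// content-length header, since the length is unknown after re-encoding.
```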
The `totalSeconds` value is derived from user settings (UI) only. It is …
Hi all, thanks for raising this. Is anyone able to summarise the kind of speeds we're seeing and the benefit we could gain? Trying to get a sense of the impact on customers.
Hey @roshan-elastic! Sure, Kibana adds around 2 seconds of overhead. So in the scenario that I've tested, we would reduce the response time from roughly 6 to roughly 4 seconds.
Thanks @danielmitterdorfer |
FYI I've added this as a top-20 issue in the backlog |
Kibana version: 8.12.0
Elasticsearch version: 8.12.0
Server OS version: Linux (ESS)
Browser version: N/A (issue is browser-independent)
Browser OS version: N/A (issue is browser-independent)
Original install method (e.g. download page, yum, from source, etc.): ESS
Describe the feature:
When we render a flamegraph in Universal Profiling, an Elasticsearch API sends the response already in a format that is suitable for rendering directly in the browser. Looking at APM, we can see that we still spend some time in Kibana server even though it basically "only" needs to pass the response from Elasticsearch to the browser as is. In the example APM trace below, that time is around 2 seconds (the white gap after `POST /_profiling/flamegraph` ends):

If we profile what Kibana server is doing, it seems that it is deserializing the response from Elasticsearch and then serializing it again. However, I believe we can eliminate that step because the response should already be in the exact format that the browser expects (if not, I propose we adapt the ES API). Below is an annotated flamegraph that shows where Kibana server spends time serializing and deserializing the response from Elasticsearch: