
[Profiling] Can we reduce server-side processing time for flamegraphs? #176208

Open
danielmitterdorfer opened this issue Feb 5, 2024 · 9 comments
Labels: bug, needs-refinement, performance, Team:obs-ux-infra_services

Comments

danielmitterdorfer (Member) commented Feb 5, 2024

Kibana version: 8.12.0

Elasticsearch version: 8.12.0

Server OS version: Linux (ESS)

Browser version: N/A (issue is browser-independent)

Browser OS version: N/A (issue is browser-independent)

Original install method (e.g. download page, yum, from source, etc.): ESS

Describe the feature:

When we render a flamegraph in Universal Profiling, the Elasticsearch API sends the response already in a format that is suitable for rendering directly in the browser. Looking at APM, we can see that we still spend some time in the Kibana server even though it basically "only" needs to pass the response from Elasticsearch through to the browser as-is. In the example APM trace below, that time is around 2 seconds (the white gap after POST /_profiling/flamegraph ends):

[image: apm-timeline-flamegraph]

If we profile what the Kibana server is doing, it seems that it deserializes the response from Elasticsearch and then serializes it again. However, I believe we can eliminate that step because the response should already be in the exact format that the browser expects (if not, I propose we adapt the ES API). Below is an annotated flamegraph that shows where the Kibana server spends time serializing and deserializing the response from Elasticsearch:

[image: flamegraph-kibana]

danielmitterdorfer added the Team:obs-ux-infra_services label Feb 5, 2024
elasticmachine (Contributor) commented

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

thomasneirynck (Contributor) commented Apr 3, 2024

EDIT: for more up-to-date feedback, see #176208 (comment)


This is some initial feedback, purely based on a code audit and some offline discussion with @danielmitterdorfer.


The flamegraph server-side code in Kibana uses the ES client:

```ts
return esClient.transport.request(
  {
    method: 'POST',
    path: encodeURI('/_profiling/flamegraph'),
    body: {
      query,
      sample_size: sampleSize,
      requested_duration: durationSeconds,
      co2_per_kwh: co2PerKWH,
      per_core_watt_x86: pervCPUWattX86,
      per_core_watt_arm64: pervCPUWattArm64,
      datacenter_pue: datacenterPUE,
      aws_cost_factor: awsCostDiscountRate,
      cost_per_core_hour: costPervCPUPerHour,
      azure_cost_factor: azureCostDiscountRate,
      indices,
      stacktrace_ids_field: stacktraceIdsField,
    },
  },
  {
    signal: controller.signal,
    meta: true,
  }
);
```

It does not specify the asStream option:

https://github.com/elastic/elasticsearch-js/blob/4aa00e03e1fac5385387dfa7cfd6e811d5307028/docs/examples/asStream.asciidoc

I am pretty sure that when this is not set to true, the client will unpack the result.

When setting it to true, you can stream the response back through the Kibana server.

To do this, you must ensure the content-* headers match the output from Elasticsearch.
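
A minimal sketch of what that could look like for the flamegraph request above (same variables as in the existing snippet; only the request options change):

```ts
// Sketch only: identical to the existing call, except for asStream.
const esResponse = await esClient.transport.request(
  {
    method: 'POST',
    path: encodeURI('/_profiling/flamegraph'),
    body: { /* same body as above */ },
  },
  {
    signal: controller.signal,
    meta: true,
    // With asStream, the client skips JSON parsing and hands back the raw,
    // still-encoded response body as a readable stream.
    asStream: true,
  }
);
```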


To see an example from the current Kibana code-base, please check the tile routes:

- set asStream = true (optionally, you can gzip here)
- pass the raw response on through the Kibana server response object, ensuring the content-* headers match:

```ts
if (tileStream) {
  // use the content-encoding and content-length headers from elasticsearch if they exist
  const { 'content-length': contentLength, 'content-encoding': contentEncoding } = headers;
  return response.ok({
    body: tileStream,
    headers: {
      'content-disposition': 'inline',
      ...(contentLength && { 'content-length': contentLength }),
      ...(contentEncoding && { 'content-encoding': contentEncoding }),
      'Content-Type': 'application/x-protobuf',
      'Cache-Control': cacheControl,
      'Last-Modified': lastModified,
    },
  });
} // else-branch omitted
```

Note: tileStream is the raw ES response body:

```ts
return { stream: tile.body as Stream, headers: tile.headers, statusCode: tile.statusCode };
```


So, again, just from the code audit, I think we should make two changes (see the combined sketch after this list):

- use asStream in the ES client here
- pass on the raw resp.body, but match the content-* headers

(this would be here (????))
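
A sketch of how the flamegraph route handler could pass the streamed body through, modeled on the tile-route snippet above (the function shape and parameter names are illustrative, not the actual Profiling route code):

```ts
import type { Stream } from 'stream';

// Hypothetical helper: `esResponse` is the result of the asStream request
// sketched earlier; `response` is the Kibana route response factory.
function passThroughFlamegraph(
  esResponse: { body: Stream; headers: Record<string, string | undefined> },
  response: any
) {
  const { 'content-length': contentLength, 'content-encoding': contentEncoding } =
    esResponse.headers;

  return response.ok({
    // the raw, still-encoded JSON payload from Elasticsearch
    body: esResponse.body,
    headers: {
      'content-type': 'application/json',
      // forward the encoding headers so the browser can decode the payload
      ...(contentLength && { 'content-length': contentLength }),
      ...(contentEncoding && { 'content-encoding': contentEncoding }),
    },
  });
}
```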

thomasneirynck (Contributor) commented Apr 3, 2024

Following up on the earlier comment.

Problem: the Elasticsearch client re-encodes the response.

The ES-client parses the response and re-encodes it. See here:

https://github.com/elastic/elastic-transport-js/blob/main/src/Transport.ts#L525
https://github.com/elastic/elastic-transport-js/blob/main/src/Serializer.ts#L62

Resolution: use asStream

The ideal solution would be to use the asStream option to stream the result back.

Add an asStream option to the request here. In order for this to work, the handling of the ES response needs to change (see below).

Is totalSeconds really needed?

The current code adds a new property, totalSeconds, wrapping the ES response in a new response object. This would prevent streaming the result back.

However, it is only used for tooltips, and it seems this value is optional (?)

Ensure the response object has the correct content headers

Ensure the content-* headers are transferred from the ES response headers.

Adjust the client in the browser

The client currently expects plain JSON:

```ts
return createFlameGraph(baseFlamegraph, showErrorFrames);
```

This will need to be adjusted to handle a stream.
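
A sketch of what that adjustment could look like, assuming the route now streams the raw payload (the fetch URL and the fetchFlamegraph name are illustrative; only createFlameGraph comes from the existing code):

```ts
// The wire format is still JSON, so the simplest client-side change is to
// consume the stream and parse it once it has fully arrived.
async function fetchFlamegraph(params: URLSearchParams, showErrorFrames: boolean) {
  const res = await fetch(`/internal/profiling/flamechart?${params}`);
  // res.json() drains the response stream and parses the payload
  const baseFlamegraph = await res.json();
  return createFlameGraph(baseFlamegraph, showErrorFrames);
}
```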


^ This is a broad investigation. I may have missed some intricacies, so it would be good to get some additional feedback from the Obs team on this as well.

smith added the bug and performance labels Apr 4, 2024
rockdaboot (Contributor) commented

Thanks for the investigation @thomasneirynck and @danielmitterdorfer!

Ensure content-* encodings are transferred from the ES-response headers.

The ES response content is gzip-compressed while the Kibana response is br (brotli) compressed. We did this to reduce the content size by ~50% and to reduce the transfer time to the Kibana client side. See #142334

(From what I remember, in my hotel room with a slow network I had to wait ~50s for a gzip-compressed flamegraph and only ~25s when brotli-compressed. But FYI, the compression rate also depends on the payload, and the response content format has changed since then.)

Another caveat with simply transferring the headers is that you cannot assume that the browser (or client) negotiates exactly the same content-* fields with the Kibana server as the Kibana server and ES negotiate.
Sure enough, most browsers are lenient enough to accept a gzip content-encoding even when brotli has been negotiated. But "most" is not "all", and we possibly don't want to run into support issues here.

So I would vote for re-compression in case the content-encoding isn't the same on both sides. This is a simple decompression and re-compression without parsing, and it should be fast enough.
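
A minimal sketch of that re-compression path using Node's built-in zlib streams (the function name and shape are hypothetical, not existing Kibana code):

```ts
import { createGunzip, createBrotliCompress } from 'zlib';
import type { Readable } from 'stream';

// Re-encode the raw ES body only when the client negotiated a different
// content-encoding; otherwise pass it through untouched. No JSON parsing.
function recodeBody(esBody: Readable, esEncoding: string | undefined, clientAccepts: string) {
  if (esEncoding === 'gzip' && clientAccepts.includes('br')) {
    // decompress gzip from ES, recompress as brotli for the browser
    const recoded = esBody.pipe(createGunzip()).pipe(createBrotliCompress());
    return { body: recoded, encoding: 'br' };
  }
  return { body: esBody, encoding: esEncoding };
}
```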

A possible future follow-up could be to allow brotli and/or zstd on the ES side (zstd has been included in ES recently, so that's just a matter of time/effort until we can use it).

rockdaboot (Contributor) commented Apr 15, 2024

is totalSeconds really needed?

The totalSeconds value is derived from user settings (UI) only: it is timeTo - timeFrom, both of which are request params. So it can be calculated on the client side as well (alternatively, it could be added to the flamegraph response if that is a better option).
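
A one-line sketch of the client-side calculation, assuming timeFrom and timeTo are the request params mentioned above, as epoch seconds:

```ts
// Hypothetical request params (epoch seconds).
const timeFrom = 1_700_000_000;
const timeTo = 1_700_000_600;
// Derived locally instead of being shipped in the response body.
const totalSeconds = timeTo - timeFrom; // 600
```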

roshan-elastic commented

Hi all,

Thanks for raising this. Is anyone able to summarise the kind of speeds we're seeing and the benefit we could gain?

Trying to get a sense of the impact on customers.

danielmitterdorfer (Member, Author) commented

Thanks for raising this. Is anyone able to summarise the kind of speeds we're seeing and the benefit we could gain?

Hey @roshan-elastic! Sure, Kibana adds around 2 seconds of overhead. So in the scenario that I've tested, we would reduce the response time from roughly 6 to roughly 4 seconds.

roshan-elastic commented

Thanks @danielmitterdorfer

roshan-elastic added the needs-refinement label Aug 19, 2024
roshan-elastic commented

FYI, I've added this as a top-20 issue in the backlog.
