
[Profiling] Query in parallel only if beneficial #103061

Merged

Conversation

danielmitterdorfer
Member

With this commit we check index allocation before we do key-value lookups. To reduce latency, key-value lookups are done in parallel for multiple slices of data. However, on nodes with spinning disks, parallel accesses are harmful. Therefore, we check whether any index is allocated either to the warm or cold tier (which are usually on spinning disks) and disable parallel key-value lookups. This has improved latency on the warm tier by about 10% in our experiments.

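Conceptually, the change boils down to collapsing the slice count to one whenever the data may sit on spinning disks. A minimal sketch in plain Java of that idea (the `fetchSlice` and `partition` helpers are illustrative stand-ins, not the actual Elasticsearch code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

// Illustrative sketch of "query in parallel only if beneficial"; not the actual Elasticsearch code.
final class AdaptiveSlicedLookup {

    static Map<String, String> lookup(
        List<String> keys,
        int desiredSlices,
        boolean anyIndexOnWarmOrColdTier, // warm/cold tiers often sit on spinning disks
        ExecutorService executor,
        Function<List<String>, Map<String, String>> fetchSlice // stands in for an mget-style bulk lookup
    ) {
        // Parallel reads hurt on spinning disks, so fall back to a single slice there.
        int sliceCount = anyIndexOnWarmOrColdTier ? 1 : desiredSlices;

        Map<String, String> results = new ConcurrentHashMap<>();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (List<String> slice : partition(keys, sliceCount)) {
            futures.add(CompletableFuture.runAsync(() -> results.putAll(fetchSlice.apply(slice)), executor));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return results;
    }

    // Split the keys into at most sliceCount contiguous slices.
    private static List<List<String>> partition(List<String> keys, int sliceCount) {
        List<List<String>> slices = new ArrayList<>();
        int sliceSize = Math.max(1, (keys.size() + sliceCount - 1) / sliceCount);
        for (int from = 0; from < keys.size(); from += sliceSize) {
            slices.add(keys.subList(from, Math.min(keys.size(), from + sliceSize)));
        }
        return slices;
    }
}
```

With a slice count of 1 the loop degenerates to a single sequential bulk lookup, which is the behavior behind the roughly 10% warm-tier latency improvement reported above.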
@danielmitterdorfer added the >bug, auto-backport-and-merge, :UniversalProfiling/Application (Elastic Universal Profiling REST APIs and infrastructure), and v8.12.0 labels on Dec 6, 2023
@elasticsearchmachine added the Team:obs-knowledge (Meta label for Observability Knowledge team) label on Dec 6, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/obs-knowledge-team (Team:obs-knowledge)

@elasticsearchmachine
Collaborator

Hi @danielmitterdorfer, I've created a changelog YAML for you.

@danielmitterdorfer
Member Author

@elasticmachine merge upstream

ClusterState clusterState = clusterService.state();
List<Index> indices = resolver.resolve(clusterState, "profiling-stacktraces", responseBuilder.getStart(), responseBuilder.getEnd());
// Avoid parallelism if there is potential we are on spinning disks (frozen tier uses searchable snapshots)
int sliceCount = IndexAllocation.isAnyOnWarmOrColdTier(clusterState, indices) ? 1 : desiredSlices;
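For context on the `IndexAllocation.isAnyOnWarmOrColdTier` call above: the question it answers is roughly "do any of these indices live on a node with a warm or cold data role?". A self-contained conceptual sketch of that idea, using plain role-name strings rather than the actual cluster-state types (the real check takes the `ClusterState` and resolved indices, as in the excerpt above):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Conceptual sketch only; not the actual IndexAllocation implementation.
final class TierCheck {

    // rolesByIndex: index name -> node role names of the nodes holding that index's shards
    static boolean isAnyOnWarmOrColdTier(Map<String, Set<String>> rolesByIndex, List<String> indices) {
        for (String index : indices) {
            Set<String> roles = rolesByIndex.getOrDefault(index, Set.of());
            // data_warm and data_cold nodes back the warm/cold tiers, which often use spinning disks
            if (roles.contains("data_warm") || roles.contains("data_cold")) {
                return true;
            }
        }
        return false;
    }
}
```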
Contributor


Just a question: did you test with 2 slices on the warm tier as well? If so, what was the approximate impact?

Member Author


I just dug through my notes. I believe I did run such a test, but the superior alternative in all cases was a single slice (apparently I did not even bother to write down the results for anything except the default value of 16 slices and a single slice...).

@rockdaboot
Contributor


👍

@danielmitterdorfer danielmitterdorfer merged commit a721ed8 into elastic:main Dec 7, 2023
15 checks passed
@danielmitterdorfer danielmitterdorfer deleted the adaptive-slicing branch December 7, 2023 15:45
@elasticsearchmachine
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

There are no branches to backport to. Aborting.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 103061

danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this pull request Dec 7, 2023
With this commit we check index allocation before we do key-value lookups. To
reduce latency, key-value lookups are done in parallel for multiple slices of
data. However, on nodes with spinning disks, parallel accesses are harmful.
Therefore, we check whether any index is allocated either to the warm or
cold tier (which are usually on spinning disks) and disable parallel
key-value lookups. This has improved latency on the warm tier by about
10% in our experiments.
@danielmitterdorfer
Member Author

I'm manually backporting this in #103144.

elasticsearchmachine pushed a commit that referenced this pull request Dec 8, 2023
With this commit we check index allocation before we do key-value lookups. To
reduce latency, key-value lookups are done in parallel for multiple slices of
data. However, on nodes with spinning disks, parallel accesses are harmful.
Therefore, we check whether any index is allocated either to the warm or
cold tier (which are usually on spinning disks) and disable parallel
key-value lookups. This has improved latency on the warm tier by about
10% in our experiments.

Co-authored-by: Elastic Machine <[email protected]>
danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this pull request Jan 22, 2024
In order to take advantage of inherent parallelism of modern SSDs, we
slice keys and issue multiple mgets concurrently. In elastic#103061 we have
introduced an additional heuristic to disable that behavior on the warm
and cold tier which usually use spinning disks. We have unintentionally
also disabled the behavior on content nodes, i.e. on any clusters which
do not use data tiers. With this commit we explicitly exclude content
nodes from the heuristic so they can benefit from speedups due to
concurrent mgets.

Relates elastic#103061
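A conceptual sketch of the refined heuristic described here, again using plain role-name strings rather than the actual Elasticsearch implementation: a node counts toward the "possibly on spinning disks" signal only if it carries a warm or cold data role, while content nodes are explicitly excluded so that tierless clusters keep the concurrent mgets.

```java
import java.util.Set;

// Conceptual sketch of the follow-up heuristic; not the actual Elasticsearch implementation.
final class RefinedTierCheck {

    // A node suggests spinning disks only if it is a warm/cold node and not a content node,
    // so content-only (tierless) deployments keep the concurrent mget path.
    static boolean suggestsSpinningDisks(Set<String> nodeRoles) {
        boolean warmOrCold = nodeRoles.contains("data_warm") || nodeRoles.contains("data_cold");
        boolean content = nodeRoles.contains("data_content");
        return warmOrCold && !content;
    }
}
```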
danielmitterdorfer added a commit that referenced this pull request Jan 22, 2024
In order to take advantage of inherent parallelism of modern SSDs, we
slice keys and issue multiple mgets concurrently. In #103061 we have
introduced an additional heuristic to disable that behavior on the warm
and cold tier which usually use spinning disks. We have unintentionally
also disabled the behavior on content nodes, i.e. on any clusters which
do not use data tiers. With this commit we explicitly exclude content
nodes from the heuristic so they can benefit from speedups due to
concurrent mgets.

Relates #103061
danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this pull request Jan 22, 2024
In order to take advantage of inherent parallelism of modern SSDs, we
slice keys and issue multiple mgets concurrently. In elastic#103061 we have
introduced an additional heuristic to disable that behavior on the warm
and cold tier which usually use spinning disks. We have unintentionally
also disabled the behavior on content nodes, i.e. on any clusters which
do not use data tiers. With this commit we explicitly exclude content
nodes from the heuristic so they can benefit from speedups due to
concurrent mgets.

Relates elastic#103061
elasticsearchmachine pushed a commit that referenced this pull request Jan 22, 2024
In order to take advantage of inherent parallelism of modern SSDs, we
slice keys and issue multiple mgets concurrently. In #103061 we have
introduced an additional heuristic to disable that behavior on the warm
and cold tier which usually use spinning disks. We have unintentionally
also disabled the behavior on content nodes, i.e. on any clusters which
do not use data tiers. With this commit we explicitly exclude content
nodes from the heuristic so they can benefit from speedups due to
concurrent mgets.

Relates #103061
Labels: >bug, Team:obs-knowledge (Meta label for Observability Knowledge team), :UniversalProfiling/Application (Elastic Universal Profiling REST APIs and infrastructure), v8.12.0, v8.13.0