Cache numDeletesToMerge in MergePolicy #35594

Conversation
Pinging @elastic/es-distributed
I benchmarked this in terms of impact on total_time in merges and refreshes. I also measured the impact on indexing throughput, even though the PR is about the merge policy, but as expected there was no impact there. All this was done on the AWS environment we used for most of the benchmarks in #31717. Used a subset of http_logs ( The indexing scenario is with 25% conflicts, with a bias towards more recent ids ( ES master commit
LGTM.
To get a better understanding of the impact on an update workload (index on conflict) we benchmarked ES master at this commit: which contains 3fb5a12, including the PRs apache/lucene-solr#521 and apache/lucene-solr#522 with specific improvements for updates. The http_logs track was used in a 1-node scenario and we compared performance with
Also captured the total counts for merge/refresh/count (from the
With soft_deletes enabled: Rally command:
Additional parameters used (for contender "index.soft_deletes.enabled" was changed to true):
@dliappis I take from these numbers that we are more memory efficient now with soft-deletes, which I expected, and which caused fewer refreshes. We are refreshing about 56% of the time compared to hard deletes, which means we are filling up buffers quite heavily, causing more load per refresh and resulting in the higher time spent in refresh. I personally think we can close this PR since the numbers look much better now. Yet there is one more experiment that I would like to run if possible. I would love to see how the numbers behave if we set
@s1monw I guess the setting value should be
@dliappis I meant 1d, to make sure we don't mark it search-idle, so we have a number that is based on refreshing every second.
Thanks for the clarification @s1monw
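For reference, the setting discussed above (`index.search.idle.after`, set to `1d` so the shard never goes search-idle and the default 1s refresh interval stays in effect) could be applied like this; the index name is illustrative, not from the benchmark:

```console
PUT /my-index/_settings
{
  "index.search.idle.after": "1d"
}
```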
Use cumulative instead of total for summary reports and comparisons and clarify that min/median/max are across primary shards. Relates: elastic/elasticsearch#35594 (comment)
Due to #608 it's likely we need to benchmark scenarios without using the node-stats telemetry device. At the same time we want to get a general idea of how many refreshes/merges/flushes happened (in total) by accessing the index stats. Add total counts for merges/refresh/flush in the summary output; these are collected from `_all/primaries` in `_stats`. Also clarify that these values are cumulative across primary shards, and clarify the description for min/median/max in the summary report. Finally, fix a bug where index stats with time/count == 0 got skipped from the summary. Closes #614 Relates: #615 Relates: elastic/elasticsearch#35594 (comment)
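The collection described above can be sketched as follows; a minimal illustration assuming the usual `_stats` response shape (the function name is ours, not Rally's):

```python
def cumulative_counts(stats: dict) -> dict:
    """Pull cumulative merge/refresh/flush counts for primary shards
    out of an Elasticsearch index-stats (``_stats``) response body,
    i.e. the ``_all`` -> ``primaries`` section."""
    primaries = stats["_all"]["primaries"]
    return {
        "merges": primaries["merges"]["total"],
        "refresh": primaries["refresh"]["total"],
        "flush": primaries["flush"]["total"],
    }
```

Because these come from `primaries`, they are cumulative across primary shards only, matching what the summary report now states.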
Benchmark results using (Note: the names of the columns contain the fields renamed in elastic/rally#615 but still display the same fields as in previous results).
Refresh/merge/flush counts: baseline:
contender:
With update- and refresh-heavy use cases we might execute a lot of queries, given we have a retention policy, whenever we flush or refresh in order to make the right decision in the MergePolicy. Having values that are not absolutely correct isn't crucial for the merge policy to be accurate, yet computing them can cause significant slowdowns in indexing speed. In order to find a middle ground, this change caches the results of the merge policy for 1 minute by default.
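The caching idea can be illustrated with a small sketch (in Python for brevity; the actual change lives in Elasticsearch's Java MergePolicy wrapper, and all names here are ours):

```python
import time


class CachedNumDeletesToMerge:
    """Sketch: recompute an expensive numDeletesToMerge-style value at
    most once per TTL (1 minute by default), serving the possibly
    slightly stale cached value in between."""

    def __init__(self, compute, ttl_seconds=60.0, clock=time.monotonic):
        self._compute = compute      # the expensive per-segment computation
        self._ttl = ttl_seconds      # cache period
        self._clock = clock          # injectable clock, for testing
        self._cached = None
        self._computed_at = None     # None means "never computed"

    def num_deletes_to_merge(self):
        now = self._clock()
        if self._computed_at is None or now - self._computed_at >= self._ttl:
            self._cached = self._compute()
            self._computed_at = now
        return self._cached
```

The trade-off is exactly the one described above: callers during the TTL window may see a stale count, which is acceptable because the merge policy does not need perfectly up-to-date values, while the expensive computation no longer runs on every refresh/flush.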