Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not generate empty buckets for the date histogram #89070

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Aug 3, 2022

If the date histogram interval is large and the 'fixed_interval'
parameter is very small we might end up with a large number of
buckets in the resulting histogram, in case we also generate empty
buckets. Roughly (max date - min date) / fixed_interval > 65536.

Here we set minDocCount to 1 so to avoid generation of empty buckets.
In the test the maximum value for 'docCount' is 9000 which means,
in the worst case, we generate 9000 documents, each belonging to a
different bucket. In the worst case we would have 9000 buckets
maximum which is well below the maximum number of buckets
allowed by default (65536).

Resolves #88800.

If the date histogram interval is large and the 'fixed_interval'
parameter is very small we might end up with a large number of
buckets in the resulting histogram, in case we also generate empty
buckets. As a result of this we might generate too many buckets
(max date - min date) / fixed_interval > 65536 (roughly)..

Here we set minDocCount to 1 so to avoid generation of empty buckets.
In the test the maximum value for 'docCount' is 9000 which means,
in the worsta case we generate 9000 documents, each belonging to a
different bucket. In this case we would have 9000 buckets maximum
which is well below the default maximum number of buckets allowed by
default.
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.5.0 labels Aug 3, 2022
@salvatore-campagna salvatore-campagna added v8.4.0 :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data :StorageEngine/TSDB You know, for Metrics >bug and removed needs:triage Requires assignment of a team area label v8.5.0 labels Aug 3, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 3, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @salvatore-campagna, I've created a changelog YAML for you.

@salvatore-campagna
Copy link
Contributor Author

I tested this running the test locally "until failure". After more than 1000 executions I don't see any failure. Before the patch I could see the failure fairly quickly after a few tens executions (depending on random values).

@salvatore-campagna
Copy link
Contributor Author

@elasticsearchmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticsearchmachine update branch

Copy link
Contributor

@csoulios csoulios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I left a comment about labelling this PR. You do not need another review after fixing this

docs/changelog/89070.yaml Outdated Show resolved Hide resolved
@salvatore-campagna salvatore-campagna added >test-failure Triaged test failures from CI and removed >bug labels Aug 3, 2022
@salvatore-campagna
Copy link
Contributor Author

@elasticsearchmachine test this please

@salvatore-campagna salvatore-campagna changed the title fix: do not generate empty buckets for the date histogram Do not generate empty buckets for the date histogram Aug 4, 2022
@salvatore-campagna
Copy link
Contributor Author

@elasticsearchmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-2

@salvatore-campagna salvatore-campagna merged commit 36c4a17 into elastic:main Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >test-failure Triaged test failures from CI v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] RollupActionSingleNodeTests testRollupDatastream failing
3 participants