Added some ESQL queries to elastic/logs
#466
Merged
This dataset is of interest to ESQL, particularly as we're targeting observability use cases. ESQL is not yet mature enough to replace the workflows themselves, but it can be used in the Discover dashboard. The queries chosen reflect possible usage in that dashboard and also investigate the impact of multiple grouping keys on similar aggregations.
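To make the "multiple grouping keys on similar aggregations" idea concrete, here is a minimal sketch of how such a query family could be generated. It is not the set of queries added by this PR: the field names and the `logs-*` index pattern are hypothetical placeholders, and the helper functions are invented for illustration. It relies only on the `FROM ... | STATS ... BY ...` form of ESQL and on the `{"query": ...}` request body.

```python
import json

# Hypothetical grouping keys; the fields used by the PR's queries may differ.
GROUPING_KEYS = ["agent.name", "host.name", "log.level"]


def build_stats_query(index_pattern, grouping_keys):
    """Build an ESQL query that counts documents per combination of keys."""
    query = f"FROM {index_pattern} | STATS count = COUNT(*)"
    if grouping_keys:
        query += " BY " + ", ".join(grouping_keys)
    return query


def query_family(index_pattern, keys):
    """Return the same aggregation grouped by 1, 2, ..., len(keys) keys."""
    return [build_stats_query(index_pattern, keys[:n]) for n in range(1, len(keys) + 1)]


if __name__ == "__main__":
    # Print one request body per query, in the shape the ESQL query API expects.
    for esql in query_family("logs-*", GROUPING_KEYS):
        print(json.dumps({"query": esql}))
```

Keeping the aggregation fixed and varying only the number of keys isolates the cost of each additional grouping key when the queries are benchmarked side by side.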
craigtaverner
commented
Oct 6, 2023
The original parameters left every index completely empty (zero docs). Changing `start_date` and `end_date` to `bulk_start_date` and `bulk_end_date` resulted in only two indices, redis and k8s, getting data. Adding client settings and increasing the end date and `max_generated_corpus_size` results in all indices getting data, and reducing `raw_data_volume_per_day` speeds up data generation. These settings were chosen through trial and error to get the ESQL queries to actually run. Any smaller data size results in a `ValueSource mismatch` exception, likely because some shards are missing data.
Because the tests use a different challenge for index setup than for querying, the parameters can stay much closer to the original. All we really needed was to index a full minute instead of just 2s.
Some of the changes were useful only for local testing, so this removes them.
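The "full minute instead of just 2s" change amounts to widening the interval between `start_date` and `end_date`. The sketch below shows one way to compute such a window. The start timestamp and the date format are assumptions chosen for illustration, and `interval_params` is a hypothetical helper, not part of the track.

```python
import json
from datetime import datetime, timedelta

# Assumed timestamp format; check the track's README for the format it expects.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def interval_params(start, seconds):
    """Return start_date/end_date track params spanning `seconds` seconds."""
    end = start + timedelta(seconds=seconds)
    return {
        "start_date": start.strftime(DATE_FORMAT),
        "end_date": end.strftime(DATE_FORMAT),
    }


if __name__ == "__main__":
    start = datetime(2020, 1, 1)  # hypothetical start of the test window
    print(json.dumps(interval_params(start, 2)))   # the too-short 2s window
    print(json.dumps(interval_params(start, 60)))  # the full-minute window
```

A dictionary like this can be saved as JSON and passed to Rally through `--track-params`.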
gbanasiak
approved these changes
Oct 9, 2023
LGTM.
The `<start_date, end_date>` IT test interval was increased to work around elastic/elasticsearch#100438. Once the issue is addressed, we can revert this increase to reduce test duration.
gbanasiak
added a commit
to gbanasiak/rally-tracks
that referenced
this pull request
Oct 18, 2023
This reverts commit 249332b.
gbanasiak
added a commit
that referenced
this pull request
Oct 18, 2023
This reverts commit 249332b for compatibility with pre-8.11 releases.
inqueue
pushed a commit
to inqueue/rally-tracks
that referenced
this pull request
Dec 6, 2023
* Added some ESQL queries to `elastic/logs`

  This dataset is of interest to ESQL, particularly as we're targeting observability use cases. Currently ESQL is not mature enough to replace the workflows themselves, but can be used in the Discover dashboard, and the queries chosen reflect possible usage in that dashboard, as well as investigating the impact of multiple grouping keys on similar aggregations.

* Change test parameters to actually generate data

  The original parameters resulted in all indices being completely empty (zero docs). Changing `start_date` and `end_date` to `bulk_start_date` and `bulk_end_date` resulted in only two indices getting data, the redis and k8s indices. Adding client settings and increasing the end date and `max_generated_corpus_size` results in all indices getting data, and reducing `raw_data_volume_per_day` increases data generation performance. These settings were chosen through trial and error to get the ESQL queries to actually run. Any smaller data size results in a `ValueSource mismatch` exception, likely due to some shards missing data.

* Added one more ESQL query from observability set

* Partial revert of index setup

  Because the tests use a different challenge for index setup than for querying, the parameters can stay much closer to the original. All we really needed was to index a full minute instead of just 2s.

* Minimise changes to logging-querying.json

  Some of the changes were useful only for local testing, so removing them.
Currently the intention is to include all new ESQL queries within the `logging-querying` challenge, since that generates and benchmarks the kind of data we're interested in. When running locally, using track.params that are based on the nightlies configuration, but scaled down by a factor of ten, we see the following:

So these queries do not impact the total benchmark run time by much at all.