Added some ESQL queries to elastic/logs #466

Merged: 5 commits, Oct 9, 2023

craigtaverner
Contributor

This dataset is of particular interest to ESQL because we're targeting observability use cases. ESQL is not yet mature enough to replace the workflows themselves, but it can be used in the Discover dashboard. The queries chosen reflect possible usage in that dashboard, and also investigate the impact of multiple grouping keys on similar aggregations.
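To make the "multiple grouping keys" idea concrete, here is a minimal sketch of how a family of similar aggregations with one, two and three grouping keys could be generated. The field names, the `logs-*` index pattern and the helper names are assumptions for illustration only; they are not the queries added in this PR. The request body shape (`{"query": ...}`) is the one accepted by the ES|QL `_query` endpoint.

```python
import json

# Hypothetical grouping fields; the PR's actual queries may use different ones.
GROUPING_KEYS = ["agent.version", "agent.type", "agent.hostname"]


def stats_by(index_pattern, keys):
    """Build an ES|QL query that counts rows grouped by the given keys."""
    return f"FROM {index_pattern} | STATS c = COUNT(*) BY {', '.join(keys)}"


def esql_request_body(query):
    """Wrap a query string in the JSON body expected by POST /_query."""
    return json.dumps({"query": query})


# One query per number of grouping keys: 1, 2, then 3.
queries = [
    stats_by("logs-*", GROUPING_KEYS[:n])
    for n in range(1, len(GROUPING_KEYS) + 1)
]

for q in queries:
    print(esql_request_body(q))
```

Comparing the latencies of such a family is one way to see how each extra grouping key affects an otherwise identical aggregation.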

Currently the intention is to include all new ESQL queries within the logging-querying challenge, since that generates and benchmarks the kind of data we're interested in. When running locally, using track.params that are based on the nightlies configuration, but scaled down by a factor of ten, we see the following:

  • Data generation takes about 30 minutes
  • Normal logging-querying takes about 60 minutes
  • New ESQL queries take about 4 minutes

So these queries do not impact the total benchmark run time by much at all.
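As a rough check of that claim, the overhead can be worked out from the three timings above. This sketch assumes the data generation and logging-querying phases are separate and run back to back, which the description does not state explicitly.

```python
# Approximate local timings from the description, in minutes.
data_generation_min = 30
logging_querying_min = 60
esql_queries_min = 4

# Assumption: the existing run is data generation followed by logging-querying.
baseline_min = data_generation_min + logging_querying_min

# Extra run time added by the new ESQL queries, relative to the existing run.
overhead_pct = 100 * esql_queries_min / baseline_min

print(f"baseline: {baseline_min} min, ESQL overhead: {overhead_pct:.1f}%")
```

Under that assumption the new queries add a little under 5% to the local run.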

Contributor

@gbanasiak gbanasiak left a comment

LGTM.

The <start_date, end_date> interval used by the integration test was increased to work around elastic/elasticsearch#100438. Once that issue is addressed, we can revert the increase to reduce test duration.

@craigtaverner craigtaverner merged commit 249332b into elastic:master Oct 9, 2023
11 checks passed
gbanasiak added a commit to gbanasiak/rally-tracks that referenced this pull request Oct 18, 2023
gbanasiak added a commit that referenced this pull request Oct 18, 2023
This reverts commit 249332b for compatibility with pre-8.11 releases.
inqueue pushed a commit to inqueue/rally-tracks that referenced this pull request Dec 6, 2023
* Added some ESQL queries to `elastic/logs`

This dataset is of interest to ESQL particularly as we're targeting observability use cases.
Currently ESQL is not mature enough to replace the workflows themselves, but can be used in the discover dashboard, and the queries chosen reflect possible usage in that dashboard, as well as investigating the impact of multiple grouping keys on similar aggregations.

* Change test parameters to actually generate data

The original parameters resulted in all indices completely empty (zero docs).
Changing `start_date` and `end_date` to `bulk_start_date` and `bulk_end_date` resulted in only two indices getting data, the redis and k8s indices.
Adding clients settings and increasing the end date and `max_generated_corpus_size` results in all indices getting data, while reducing `raw_data_volume_per_day` speeds up data generation.

These settings were chosen through trial and error to get the ESQL queries to actually run. Any smaller data sizes result in a `ValueSource mismatch` exception, likely due to some shards missing data.

* Added one more ESQL query from observability set

* Partial revert of index setup

The fact that the tests actually use a different challenge for index setup and querying allows for parameters much closer to the original.
All we really needed was to index a full minute instead of just 2s.

* Minimise changes to logging-querying.json

Some of the changes were useful only for local testing, so removing them.
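
For illustration, the interval change described under "Partial revert of index setup" (indexing a full minute instead of 2s) can be sketched as below. The `start_date` and `end_date` names come from the commit messages above; the concrete start timestamp, the timestamp format and the helper name are assumptions, not values taken from the test files.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumed timestamp format; the track may expect a different one.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def interval_params(start, seconds):
    """Return start_date/end_date track params spanning `seconds` seconds."""
    end = start + timedelta(seconds=seconds)
    return {
        "start_date": start.strftime(DATE_FORMAT),
        "end_date": end.strftime(DATE_FORMAT),
    }


# Hypothetical start of the indexed interval.
start = datetime(2021, 1, 1, tzinfo=timezone.utc)

before = interval_params(start, 2)   # original 2s interval
after = interval_params(start, 60)   # widened to a full minute

print(json.dumps(after, indent=2))
```

A dictionary like `after` could be merged into the other track parameters passed to the test; the remaining parameters are left out here because their values are not given in this conversation.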