Added some ESQL queries to `elastic/logs` #466

craigtaverner · 2023-10-06T13:13:25Z

This dataset is of interest to ESQL particularly as we're targeting observability use cases. Currently ESQL is not mature enough to replace the workflows themselves, but can be used in the discover dashboard, and the queries chosen reflect possible usage in that dashboard, as well as investigating the impact of multiple grouping keys on similar aggregations.
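To illustrate what "multiple grouping keys on similar aggregations" can look like, here is a minimal sketch that builds a family of ES|QL queries differing only in their grouping keys. The index pattern, the field names, and the helper `stats_by_query` are assumptions for illustration; they are not the queries added in this PR.

```python
def stats_by_query(index_pattern, grouping_keys):
    """Build an ES|QL query that counts rows per combination of grouping keys."""
    if not grouping_keys:
        raise ValueError("at least one grouping key is required")
    return f"FROM {index_pattern} | STATS count = COUNT(*) BY {', '.join(grouping_keys)}"


# Hypothetical field names; the same aggregation with one, two, and three keys.
keys = ["agent.name", "log.level", "host.name"]
queries = [stats_by_query("logs-*", keys[:n]) for n in range(1, len(keys) + 1)]

for query in queries:
    print(query)
```

Keeping the aggregation fixed and varying only the `BY` clause makes it easier to attribute any difference in latency to the number of grouping keys.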

Currently the intention is to include all new ESQL queries within the logging-querying challenge, since that generates and benchmarks the kind of data we're interested in. When running locally, using track.params that are based on the nightlies configuration, but scaled down by a factor of ten, we see the following:

Data generation takes about 30 minutes
Normal logging-querying takes about 60 minutes
New ESQL queries take about 4 minutes

So these queries do not impact the total benchmark run time by much at all.
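As a rough check on that claim, the timings above can be turned into a relative overhead. This is a minimal sketch using only the approximate local figures quoted in this description; the variable names are illustrative.

```python
# Approximate local timings quoted above, in minutes.
data_generation_min = 30
logging_querying_min = 60
esql_queries_min = 4

# Run time before the new ESQL queries were added.
baseline_min = data_generation_min + logging_querying_min
overhead_pct = 100 * esql_queries_min / baseline_min

print(f"ESQL queries add about {overhead_pct:.1f}% to a {baseline_min}-minute run")
```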


elastic/logs/challenges/esql.json

The original parameters left all indices completely empty (zero docs). Changing `start_date` and `end_date` to `bulk_start_date` and `bulk_end_date` resulted in only two indices getting data: the redis and k8s indices. Adding clients settings and increasing the end date and `max_generated_corpus_size` results in all indices getting data, and reducing `raw_data_volume_per_day` speeds up data generation. These settings were chosen through trial and error to get the ESQL queries to actually run. Any smaller data size results in a `ValueSource mismatch` exception, likely because some shards are missing data.
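Since the failure mode described here is indices silently ending up empty, a quick sanity check before running the queries can save a debugging round. The following is a minimal sketch, assuming the per-index document counts have already been collected into a dictionary (for example from the `_cat/indices` output); the index names, the counts, and the helper `empty_indices` are hypothetical.

```python
def empty_indices(doc_counts):
    """Return the names of indices that hold no documents, sorted by name."""
    return sorted(name for name, count in doc_counts.items() if count == 0)


# Hypothetical doc counts, mimicking the case where only two indices got data.
doc_counts = {
    "logs-redis": 1200,
    "logs-k8s": 3400,
    "logs-nginx": 0,
    "logs-apache": 0,
}

missing = empty_indices(doc_counts)
if missing:
    print("Indices without data:", ", ".join(missing))
```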

Because the tests actually use a different challenge for index setup and querying, the parameters can stay much closer to the original. All we really needed was to index a full minute instead of just 2s.

Some of the changes were useful only for local testing, so they have been removed.

gbanasiak

LGTM.

The <start_date, end_date> integration test interval was increased to work around elastic/elasticsearch#100438. Once the issue is addressed, we can revert this increase to reduce test duration.


* Added some ESQL queries to `elastic/logs`

  This dataset is of interest to ESQL particularly as we're targeting observability use cases. Currently ESQL is not mature enough to replace the workflows themselves, but can be used in the discover dashboard, and the queries chosen reflect possible usage in that dashboard, as well as investigating the impact of multiple grouping keys on similar aggregations.

* Change test parameters to actually generate data

  The original parameters left all indices completely empty (zero docs). Changing `start_date` and `end_date` to `bulk_start_date` and `bulk_end_date` resulted in only two indices getting data: the redis and k8s indices. Adding clients settings and increasing the end date and `max_generated_corpus_size` results in all indices getting data, and reducing `raw_data_volume_per_day` speeds up data generation. These settings were chosen through trial and error to get the ESQL queries to actually run. Any smaller data size results in a `ValueSource mismatch` exception, likely because some shards are missing data.

* Added one more ESQL query from observability set

* Partial revert of index setup

  Because the tests actually use a different challenge for index setup and querying, the parameters can stay much closer to the original. All we really needed was to index a full minute instead of just 2s.

* Minimise changes to logging-querying.json

  Some of the changes were useful only for local testing, so they have been removed.

craigtaverner commented Oct 6, 2023


elastic/logs/challenges/esql.json (outdated, resolved)

craigtaverner requested a review from gbanasiak October 6, 2023 13:23

craigtaverner added 4 commits October 9, 2023 11:25

Added one more ESQL query from observability set

0906d03

Partial revert of index setup

abc8682

Because the tests actually use a different challenge for index setup and querying, the parameters can stay much closer to the original. All we really needed was to index a full minute instead of just 2s.

Minimise changes to logging-querying.json

63d62cb

Some of the changes were useful only for local testing, so they have been removed.

gbanasiak approved these changes Oct 9, 2023


craigtaverner merged commit 249332b into elastic:master Oct 9, 2023
11 checks passed

gbanasiak added a commit to gbanasiak/rally-tracks that referenced this pull request Oct 18, 2023

Revert "Added some ESQL queries to elastic/logs (elastic#466)"

93c2e5a

This reverts commit 249332b.

gbanasiak added a commit that referenced this pull request Oct 18, 2023

Revert "Added some ESQL queries to elastic/logs (#466)" (#478)

75ccd47

This reverts commit 249332b for compatibility with pre-8.11 releases.
