Benchmark stream command #1584

aspacca · 2023-12-05T08:51:35Z

Similarly to the benchmark rally command, we want to generate schema-b documents for a given integration.
Instead of creating a rally track out of them, we will stream them directly to an ES cluster at a configurable rate, using bulk requests.

see #1541 for more context

usage (from a package root):

elastic-package benchmark stream -v --events-per-period 10 --period-duration 1s

or

elastic-package benchmark stream -v --events-per-period 10 --period-duration 1s --backfill -15m

or

elastic-package benchmark stream -v --benchmark container-benchmark --events-per-period 10 --period-duration 1s --backfill -15m

flags:

--benchmark: run a specific benchmark; if not present, all benchmarks for a package will be run
--backfill: negative duration to backfill event ingestion for; if not present, events will be ingested starting from now
--period-duration: time between each bulk request
--events-per-period: events in each bulk request
--timestamp-field: field from the generator config used for the event's @timestamp field (default "timestamp"; it is required for backfill and for overriding period settings)
--perform-cleanup: passing this flag deletes the documents in the streaming data streams before and after streaming, and uninstalls the integration at the end
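
To make the interplay of --period-duration and --events-per-period concrete, here is a minimal sketch of a ticker-driven loop that sends a fixed-size batch once per period. It is not the runner code from this PR: streamEvents, sendFunc and runSample are hypothetical names, and send stands in for a bulk request.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// sendFunc stands in for one bulk request (hypothetical, not an elastic-package type).
type sendFunc func(batch []string) error

// streamEvents sends eventsPerPeriod events once per period until ctx is done,
// send fails, or maxPeriods batches were sent (maxPeriods <= 0 means no limit).
// It returns the number of events handed to send.
func streamEvents(ctx context.Context, period time.Duration, eventsPerPeriod, maxPeriods int, next func() string, send sendFunc) (int, error) {
	if period <= 0 || eventsPerPeriod <= 0 {
		return 0, errors.New("period and events per period must be positive")
	}
	ticker := time.NewTicker(period)
	defer ticker.Stop()

	sent := 0
	for n := 0; maxPeriods <= 0 || n < maxPeriods; n++ {
		// The first batch goes out after one full period has elapsed.
		select {
		case <-ctx.Done():
			return sent, ctx.Err()
		case <-ticker.C:
		}
		batch := make([]string, 0, eventsPerPeriod)
		for i := 0; i < eventsPerPeriod; i++ {
			batch = append(batch, next())
		}
		if err := send(batch); err != nil {
			return sent, err
		}
		sent += len(batch)
	}
	return sent, nil
}

// runSample drives streamEvents with an in-memory sender and returns the batches.
func runSample(periodMillis, eventsPerPeriod, periods int) ([][]string, error) {
	var batches [][]string
	seq := 0
	next := func() string {
		seq++
		return fmt.Sprintf(`{"seq":%d}`, seq)
	}
	send := func(batch []string) error {
		batches = append(batches, batch)
		return nil
	}
	period := time.Duration(periodMillis) * time.Millisecond
	_, err := streamEvents(context.Background(), period, eventsPerPeriod, periods, next, send)
	return batches, err
}
```

With this shape, runSample(1000, 10, 3) would mirror "--events-per-period 10 --period-duration 1s" for three periods and collect three batches of ten events each.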

ruflin · 2023-12-05T10:12:41Z

--benchmark: run a specific benchmark; if not present, all benchmarks for a package will be run

For now, it is really nice that it runs just all tracks. I expect eventually we need to have something like "default" tracks but we can put that for later.

--ticker-duration: time between each bulk request

ticker seems to be very Golang specific. Ideas for alternative names?

I did a quick run of the code, so far all looks good 🎉

aspacca · 2023-12-05T10:14:36Z

ticker seems to be very Golang specific. Ideas for alternative names?

--bulk-request-interval: time between each bulk request
--events-per-bulk-request: events on each bulk request

?

ruflin · 2023-12-05T10:18:42Z

Is it possible to configure the stream command to ship data to a cluster not started with elastic-package? I assume the env variables could be adjusted and it would just work 🤔

aspacca · 2023-12-05T10:22:13Z

I assume the env variables could be adjusted and it would just work 🤔

indeed:

ELASTIC_PACKAGE_ELASTICSEARCH_HOST
ELASTIC_PACKAGE_ELASTICSEARCH_PASSWORD
ELASTIC_PACKAGE_ELASTICSEARCH_USERNAME
ELASTIC_PACKAGE_KIBANA_HOST
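
As an illustration of how a client could pick up these four variables, here is a small sketch. It is not how elastic-package reads them internally: clusterConfig and configFromEnv are hypothetical names, and treating a missing Elasticsearch host as an error is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"os"
)

// clusterConfig groups the connection settings named in the thread (hypothetical type).
type clusterConfig struct {
	ElasticsearchHost     string
	ElasticsearchUsername string
	ElasticsearchPassword string
	KibanaHost            string
}

// configFromEnv builds a clusterConfig from a lookup function, so that the
// process environment can be swapped for a map in tests.
func configFromEnv(getenv func(string) string) (clusterConfig, error) {
	const hostVar = "ELASTIC_PACKAGE_ELASTICSEARCH_HOST"
	cfg := clusterConfig{
		ElasticsearchHost:     getenv(hostVar),
		ElasticsearchUsername: getenv("ELASTIC_PACKAGE_ELASTICSEARCH_USERNAME"),
		ElasticsearchPassword: getenv("ELASTIC_PACKAGE_ELASTICSEARCH_PASSWORD"),
		KibanaHost:            getenv("ELASTIC_PACKAGE_KIBANA_HOST"),
	}
	// Assumption of this sketch: without a host there is nothing to stream to.
	if cfg.ElasticsearchHost == "" {
		return cfg, fmt.Errorf("%s is not set", hostVar)
	}
	return cfg, nil
}

// configFromProcessEnv reads the same settings from the real environment.
func configFromProcessEnv() (clusterConfig, error) {
	return configFromEnv(os.Getenv)
}
```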

ruflin

I have been thinking more about our conversation around the naming. One thing I realised, which might be obvious, is that if multiple benchmarks are run in parallel, all of them will have the same ticker period, while the config is per benchmark. Let's assume I have the following config and 2 benchmarks exist:

--ticker-duration: 10
--events-per-ticker: 1

I assume this sends 2 events, 1 for each benchmark, every 10 seconds. The ticker duration reminds me a lot of the period config in Metricbeat. And the second is then events-per-period? I would stay away from bulk-request as it would also be ok, if we don't use bulk requests :-) Maybe exchange period with duration?

An alternative config would be --events-per-second=0.1. This would ship one event every 10 seconds. Is this more consumable?

Also we should have defaults for all configs, so if they are skipped it just works. For example if we stick to the previous configs, have 10s period as default and 1 event per 10s.
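
The equivalence between the two-flag form and a single events-per-second rate can be sketched as below. The function names are hypothetical and nothing here is taken from the PR; it only spells out the arithmetic (1 event per 10s is a rate of 0.1 events per second).

```go
package main

import (
	"errors"
	"time"
)

// eventsPerSecond converts "N events every period" into a single rate.
func eventsPerSecond(eventsPerPeriod int, period time.Duration) (float64, error) {
	if eventsPerPeriod <= 0 || period <= 0 {
		return 0, errors.New("events per period and period must be positive")
	}
	return float64(eventsPerPeriod) / period.Seconds(), nil
}

// periodForRate returns how long to wait between single events for a given
// rate, e.g. a rate of 0.1 events per second means about one event every 10s.
func periodForRate(rate float64) (time.Duration, error) {
	if rate <= 0 {
		return 0, errors.New("rate must be positive")
	}
	return time.Duration(float64(time.Second) / rate), nil
}
```

The conversion also shows why a single rate flag gets awkward for long periods: one event every 5 minutes becomes a rate of roughly 0.0033, which is harder to read than a duration of 5m.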

cmd/benchmark.go

internal/benchrunner/runners/stream/runner.go

internal/cobraext/flags.go

aspacca · 2023-12-05T23:49:07Z

An alternative config would be --events-per-second=0.1. This would ship one event every 10 seconds. Is this more consumable?

It's better because we have only one flag, but it's rather tricky to express a period duration when you want a resolution of minutes. I would keep the two different flags and just rename them.

ruflin · 2023-12-06T15:17:37Z

I would keep the two different flags, and just rename them

Ok, let's go with this for now.

We now have quite a list of commands and flags. As soon as things settle down a bit more, there is an opportunity to look at all flags together and unify / standardise / clean up where we can.

ruflin · 2023-12-07T08:31:05Z

I did a quick test with the most recent AWS package by running: elastic-package benchmark stream -v. This uses all the defaults. Opening Kibana with Logs Explorer shows the following result, which is great!

ruflin · 2023-12-07T08:40:37Z

README.md

+You can stream data to a remote ES cluster setting the following environment variables:
+
+ELASTIC_PACKAGE_ELASTICSEARCH_HOST=https://my-deployment.es.eu-central-1.aws.foundit.no
+ELASTIC_PACKAGE_ELASTICSEARCH_USERNAME=elastic


Do API keys work here? Serverless has mostly API keys

no, it doesn't https://github.com/elastic/elastic-package/blob/main/internal/stack/clients.go#L22-L25

but it should be possible to run something like:

elastic-package stack up --provider serverless $(elastic-package stack shellinit)

@jsoriano Seems like a missing feature in elastic-package? Should we open an issue?

Yes, this is missing, please create an issue.

ruflin · 2023-12-07T08:53:02Z

internal/cobraext/flags.go

+	BenchStreamPeriodDurationFlagName        = "period-duration"
+	BenchStreamPeriodDurationFlagDescription = "duration of the period between each ingestion cycle: expressed as a positive duration"
+
+	BenchStreamPerformCleanupFlagName        = "perform-cleanup"


The part I stumbled into, this does not only cleanup at the end, but it also does cleanup on start. Is this expected?

yes, it is intended: I thought it might be useful if I changed the template and wanted to compare data before and after

while indeed, since now backfill has a default value, we will have duplicated data for the last 15 minutes.
I can change to avoid cleanup only in the end, but to do it on start

On the fence about this one. As perform-cleanup is no longer the default, I think it is less of an issue. Let's see how it is used and come back to this, but leave it for now.

ruflin · 2023-12-07T08:53:38Z

internal/cobraext/flags.go

@@ -65,6 +65,21 @@ const (
 	BenchCorpusRallyUseCorpusAtPathFlagName        = "use-corpus-at-path"
 	BenchCorpusRallyUseCorpusAtPathFlagDescription = "path of the corpus to use for the benchmark: if present no new corpus will be generated"

+	BenchStreamBackFillFlagName        = "backfill"
+	BenchStreamBackFillFlagDescription = "amount of time to ingest events for since starting from now: expressed as a negative duration"


The description is very clear, but of course I put in 1m at first.

It is indeed a bit weird to have a parameter that only accepts negative numbers. Could we revert it, so it accepts only positive numbers and we negate it in code?
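
A minimal sketch of that suggestion: accept a positive duration from the user and negate it in code when computing the start of the backfill window. backfillOffset and backfillStart are hypothetical names, not the functions added in this PR, and parsing the raw flag value with time.ParseDuration is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// backfillOffset parses the raw flag value (e.g. "15m"), rejects negative
// durations, and returns the offset to add to "now" (always <= 0).
func backfillOffset(raw string) (time.Duration, error) {
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid backfill %q: %w", raw, err)
	}
	if d < 0 {
		return 0, fmt.Errorf("backfill must be a positive duration, got %s", d)
	}
	// The user thinks "go back 15 minutes"; the negation happens here.
	return -d, nil
}

// backfillStart returns the timestamp of the oldest event to generate.
func backfillStart(now time.Time, raw string) (time.Time, error) {
	offset, err := backfillOffset(raw)
	if err != nil {
		return time.Time{}, err
	}
	return now.Add(offset), nil
}
```

With this shape, passing 1m does what a first-time user expects, and -1m fails with an explicit message instead of silently ingesting from now.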

ruflin

Code LGTM

As a follow up, I think there is some potential to refactor / cleanup but we can take this separate.

ruflin · 2023-12-07T09:59:22Z

internal/benchrunner/runners/stream/scenario.go

+	Generator *generator `config:"generator" json:"generator"`
+}
+
+type generator struct {


This seems to be almost the same as https://github.com/elastic/elastic-package/blob/main/internal/benchrunner/runners/rally/scenario.go, same for other objects. Is the duplication on purpose?

+1, we should probably try to reduce duplication, there is quite a lot between benchmark runners. This would also help to evaluate better how much logic to maintain we are adding with each runner.

yes, let's plan for another PR moving repeated code to some common package under benchmark

ruflin · 2023-12-07T10:00:57Z

@aspacca Is there a way we could have tests for these features?

jsoriano

Mostly looks good to me. Added a comment about ignoring the error in one flag, and some other questions that would not be blockers before merging.

cmd/benchmark.go

jsoriano · 2023-12-07T14:33:28Z

internal/benchrunner/runners/stream/scenario.go

+	Generator *generator `config:"generator" json:"generator"`
+}
+
+type generator struct {


+1, we should probably try to reduce duplication, there is quite a lot between benchmark runners. This would also help to evaluate better how much logic to maintain we are adding with each runner.

jsoriano · 2023-12-07T14:42:13Z

internal/cobraext/flags.go

@@ -65,6 +65,21 @@ const (
 	BenchCorpusRallyUseCorpusAtPathFlagName        = "use-corpus-at-path"
 	BenchCorpusRallyUseCorpusAtPathFlagDescription = "path of the corpus to use for the benchmark: if present no new corpus will be generated"

+	BenchStreamBackFillFlagName        = "backfill"
+	BenchStreamBackFillFlagDescription = "amount of time to ingest events for since starting from now: expressed as a negative duration"


It is indeed a bit weird to have a parameter that only accepts negative numbers. Could we revert it, so it accepts only positive numbers and we negate it in code?

internal/benchrunner/runners/stream/runner.go

jsoriano

I think we need to handle the error in the flag, as we do with all other flags. For the rest it LGTM.

cmd/benchmark.go

aspacca · 2023-12-08T10:55:45Z

@jsoriano

The error here can happen if the flag is not defined or if there is some type conflict, it should not happen if the user does not provide the flag. We should report the error if it happens.

I don't want to return an error if there is some type conflict, I want the consumer of the command to be able to see the command running with the default value without any error. :)

but we can warn the user with a log entry :)
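
The compromise discussed here (keep running with the default, but log a warning) could look roughly like the sketch below. It uses only the standard library rather than the cobra calls in cmd/benchmark.go; durationOrDefault, logWarnf and errFlagLookup are hypothetical names, and the 10s default used in the example is illustrative only.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// errFlagLookup is a stand-in for the error returned when a flag is not
// defined or is read with the wrong type (hypothetical).
var errFlagLookup = errors.New("flag accessed with the wrong type")

// durationOrDefault returns value when the flag lookup succeeded. On error it
// warns through warnf and falls back to def, so the command keeps running.
func durationOrDefault(name string, value time.Duration, err error, def time.Duration, warnf func(string, ...any)) time.Duration {
	if err != nil {
		warnf("cannot read flag %q, using default %s: %v", name, def, err)
		return def
	}
	return value
}

// logWarnf is one possible warnf implementation, backed by the standard logger.
func logWarnf(format string, args ...any) {
	log.Printf("WARN: "+format, args...)
}
```

A call such as durationOrDefault("period-duration", 0, errFlagLookup, 10*time.Second, logWarnf) would log one warning and return 10s.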

aspacca · 2023-12-08T10:57:04Z

@aspacca Is there a way we could have tests for these features?

@ruflin we can have some end2end tests: I will add in a separate PR

Co-authored-by: Jaime Soriano Pastor <[email protected]>

cmd/benchmark.go

elasticmachine · 2023-12-08T12:41:43Z

💚 Build Succeeded

Buildkite Build
Commit: 045d42b

History

💚 Build #2019 succeeded 9017170
💚 Build #2015 succeeded 6398ec6
💚 Build #2014 succeeded 537fd4b
💚 Build #2010 succeeded 6794803
💔 Build #2007 failed 75937b4

cc @aspacca

Andrea Spacca added 6 commits December 5, 2023 17:37

benchmark stream command scaffold

a9f3f27

benchmark stream command implementation

b3a776c

docs

9d12b2d

TEMPORARY: use tool generator branch in order to set field in config

f29ef5f

fix bug

75937b4

io.EOF is err != nil

9e750a3

aspacca requested review from jsoriano and ruflin December 5, 2023 08:51

aspacca self-assigned this Dec 5, 2023

aspacca mentioned this pull request Dec 5, 2023

Sample data generation #984

Closed

ping generator v0.9.0

6794803

ruflin reviewed Dec 5, 2023

cmd/benchmark.go

cmd/benchmark.go

internal/benchrunner/runners/stream/runner.go

internal/cobraext/flags.go (outdated)

Andrea Spacca added 2 commits December 6, 2023 09:13

cr fixes

6cf9f12

fix flags description

537fd4b

add remove ES cluster docs for benchmark stream

6398ec6

ruflin reviewed Dec 7, 2023

ruflin approved these changes Dec 7, 2023

jsoriano reviewed Dec 7, 2023

jsoriano reviewed Dec 8, 2023

cmd/benchmark.go (outdated)

aspacca and others added 2 commits December 8, 2023 20:10

Update cmd/benchmark.go

52e3c59

Co-authored-by: Jaime Soriano Pastor <[email protected]>

make backfill a positive duration

9017170

jsoriano reviewed Dec 8, 2023

cmd/benchmark.go (outdated)

Update cmd/benchmark.go

045d42b

jsoriano approved these changes Dec 8, 2023

jsoriano merged commit f075590 into elastic:main Dec 8, 2023
4 checks passed
