
[O11y][MySQL] Rally benchmark mysql.performance #8761

Merged: 6 commits merged into elastic:main on Jan 11, 2024

Conversation

@ali786XI (Contributor) commented Dec 20, 2023

Proposed commit message

  • This PR adds benchmarking templates to the performance data stream of MySQL

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.

How to test this PR locally

Run this command from the package root:

  • elastic-package benchmark rally --benchmark performance-benchmark -v

Related issues

Screenshots

--- Benchmark results for package: mysql - START ---
╭────────────────────────────────────────────────────────────────────────────────────╮
│ info                                                                               │
├────────────────────────┬───────────────────────────────────────────────────────────┤
│ benchmark              │                                     performance-benchmark │
│ description            │         Benchmark 20000 mysql.performance events ingested │
│ run ID                 │                      84d6912f-4b57-4f5d-8155-4b3831987cf1 │
│ package                │                                                     mysql │
│ start ts (s)           │                                                1702989997 │
│ end ts (s)             │                                                1702990043 │
│ duration               │                                                       46s │
│ generated corpora file │ /root/.elastic-package/tmp/rally_corpus/corpus-2961543811 │
╰────────────────────────┴───────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────────╮
│ parameters                                                                │
├─────────────────────────────────┬─────────────────────────────────────────┤
│ package version                 │                                  1.16.0 │
│ data_stream.name                │                             performance │
│ corpora.generator.total_events  │                                   20000 │
│ corpora.generator.template.path │ ./performance-benchmark/template.ndjson │
│ corpora.generator.template.raw  │                                         │
│ corpora.generator.template.type │                                  gotext │
│ corpora.generator.config.path   │      ./performance-benchmark/config.yml │
│ corpora.generator.config.raw    │                                   map[] │
│ corpora.generator.fields.path   │      ./performance-benchmark/fields.yml │
│ corpora.generator.fields.raw    │                                   map[] │
╰─────────────────────────────────┴─────────────────────────────────────────╯
╭───────────────────────╮
│ cluster info          │
├───────┬───────────────┤
│ name  │ elasticsearch │
│ nodes │             1 │
╰───────┴───────────────╯
╭───────────────────────────────────────╮
│ disk usage for index metrics-mysql.pe │
│ rformance-ep (for all fields)         │
├──────────────────────────────┬────────┤
│ total                        │  16 MB │
│ inverted_index.total         │ 7.0 MB │
│ inverted_index.stored_fields │ 5.9 MB │
│ inverted_index.doc_values    │ 2.8 MB │
│ inverted_index.points        │ 482 kB │
│ inverted_index.norms         │    0 B │
│ inverted_index.term_vectors  │    0 B │
│ inverted_index.knn_vectors   │    0 B │
╰──────────────────────────────┴────────╯
╭──────────────────────────────────────────────────────────────────────────────────────╮
│ pipeline metrics-mysql.performance-1.16.0 stats in node UhUgzz1BQ5OJkbI7NZbkyw       │
├─────────────────────────────────────────────┬────────────────────────────────────────┤
│ Totals                                      │ Count: 20000 | Failed: 0 | Time: 140ms │
│ gsub ()                                     │  Count: 20000 | Failed: 0 | Time: 32ms │
│ script ()                                   │   Count: 20000 | Failed: 0 | Time: 6ms │
│ fingerprint ()                              │  Count: 20000 | Failed: 0 | Time: 40ms │
│ remove ()                                   │  Count: 20000 | Failed: 0 | Time: 15ms │
│ pipeline (metrics-mysql.performance@custom) │   Count: 20000 | Failed: 0 | Time: 3ms │
╰─────────────────────────────────────────────┴────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────╮
│ rally stats                                                                                 │
├────────────────────────────────────────────────────────────────┬────────────────────────────┤
│ Cumulative indexing time of primary shards                     │                3.02165 min │
│ Min cumulative indexing time across primary shards             │                      0 min │
│ Median cumulative indexing time across primary shards          │   0.004824999999999999 min │
│ Max cumulative indexing time across primary shards             │      2.669366666666667 min │
│ Cumulative indexing throttle time of primary shards            │                      0 min │
│ Min cumulative indexing throttle time across primary shards    │                      0 min │
│ Median cumulative indexing throttle time across primary shards │                    0.0 min │
│ Max cumulative indexing throttle time across primary shards    │                      0 min │
│ Cumulative merge time of primary shards                        │     0.2939166666666667 min │
│ Cumulative merge count of primary shards                       │                        155 │
│ Min cumulative merge time across primary shards                │                      0 min │
│ Median cumulative merge time across primary shards             │                0.00345 min │
│ Max cumulative merge time across primary shards                │    0.11781666666666667 min │
│ Cumulative merge throttle time of primary shards               │                      0 min │
│ Min cumulative merge throttle time across primary shards       │                      0 min │
│ Median cumulative merge throttle time across primary shards    │                    0.0 min │
│ Max cumulative merge throttle time across primary shards       │                      0 min │
│ Cumulative refresh time of primary shards                      │     0.6970833333333334 min │
│ Cumulative refresh count of primary shards                     │                       5236 │
│ Min cumulative refresh time across primary shards              │                      0 min │
│ Median cumulative refresh time across primary shards           │               0.003975 min │
│ Max cumulative refresh time across primary shards              │                0.57325 min │
│ Cumulative flush time of primary shards                        │     2.3280666666666665 min │
│ Cumulative flush count of primary shards                       │                       4980 │
│ Min cumulative flush time across primary shards                │ 1.6666666666666667e-05 min │
│ Median cumulative flush time across primary shards             │    0.07664166666666666 min │
│ Max cumulative flush time across primary shards                │    0.18863333333333332 min │
│ Total Young Gen GC time                                        │                    0.034 s │
│ Total Young Gen GC count                                       │                          3 │
│ Total Old Gen GC time                                          │                        0 s │
│ Total Old Gen GC count                                         │                          0 │
│ Store size                                                     │     0.11265543662011623 GB │
│ Translog size                                                  │    0.037036485970020294 GB │
│ Heap used for segments                                         │                       0 MB │
│ Heap used for doc values                                       │                       0 MB │
│ Heap used for terms                                            │                       0 MB │
│ Heap used for norms                                            │                       0 MB │
│ Heap used for points                                           │                       0 MB │
│ Heap used for stored fields                                    │                       0 MB │
│ Segment count                                                  │                        304 │
│ Total Ingest Pipeline count                                    │                      20044 │
│ Total Ingest Pipeline time                                     │                    1.072 s │
│ Total Ingest Pipeline failed                                   │                          0 │
│ Min Throughput                                                 │            32944.34 docs/s │
│ Mean Throughput                                                │            32944.34 docs/s │
│ Median Throughput                                              │            32944.34 docs/s │
│ Max Throughput                                                 │            32944.34 docs/s │
│ 50th percentile latency                                        │       549.1372365504503 ms │
│ 100th percentile latency                                       │       563.3372648153454 ms │
│ 50th percentile service time                                   │       549.1372365504503 ms │
│ 100th percentile service time                                  │       563.3372648153454 ms │
│ error rate                                                     │                     0.00 % │
╰────────────────────────────────────────────────────────────────┴────────────────────────────╯

--- Benchmark results for package: mysql - END   ---
Done

@elasticmachine

elasticmachine commented Dec 20, 2023

💚 Build Succeeded



Build stats

  • Start Time: 2023-12-20T09:36:06.084+0000

  • Duration: 63 min 26 sec

Test stats 🧪

Test Results
Failed 0
Passed 41
Skipped 0
Total 41

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine

🌐 Coverage report

Name           Metrics % (covered/total)   Diff
Packages       100.0% (2/2)                💚
Files          100.0% (2/2)                💚
Classes        100.0% (2/2)                💚
Methods        92.857% (26/28)             👍 1.948
Lines          94.34% (150/159)            👍 44.68
Conditionals   100.0% (0/0)                💚

@elasticmachine

🚀 Benchmarks report

To see the full report, comment with /test benchmark fullreport

@ali786XI ali786XI marked this pull request as ready for review December 20, 2023 12:51
@ali786XI ali786XI requested a review from a team as a code owner December 20, 2023 12:51
@milan-elastic milan-elastic self-requested a review December 21, 2023 07:27
@mrodm (Contributor) commented Dec 21, 2023

Hi @aliabbas-elastic, please update your branch with the latest contents from the main branch. An important PR updating the CI pipelines was recently merged. Thanks!

@milan-elastic (Contributor) left a comment

LGTM!

min: 0
max: 200
fuzziness: 0.2
cardinality: 100

Contributor

I'm still a little confused about how the cardinality of fields "links" them. Does this mean that for a given host ID, the fetch_count will always be the same?

Contributor Author

Let's say we generate 20,000 events; for those there would be 100 unique values of fetch_count, so I would expect each value to repeat roughly 200 times (20,000 / 100), as described in the docs. I don't think this metric value is linked to the host ID field, since the generated values are completely random.

Contributor

Are we on the right track here, @aspacca?

Contributor

@aliabbas-elastic, @tommyers-elastic, what you describe above is indeed correct.

However, given that the cardinality of host.name is also 100, a single host.name value will be repeated 200 times as well; more specifically, the Nth value of fetch_count will always appear in the same events as the Nth value of host.name.

If that is intentional, or you don't care about this detail, there is no need to change anything.

If you explicitly want to avoid such a "link", you should give the two fields different cardinalities.

Contributor Author

@aspacca Thanks for clarifying. I don't think there is much point in changing how host.name is generated; keeping it as is would be better.

Contributor

I think as-is we generate unrealistic data here, since fetch_count will always be identical for each host.

If we wanted to keep each field (host.name, fetch_count) at a cardinality of approximately 100, what configuration would you suggest here, @aspacca?

Contributor

@tommyers-elastic it depends on how many events we want to generate.

Imagine we chose 101 and 99: the first 9,999 events (101 * 99) would have unlinked host.name and fetch_count values, but after that the exact same series of 9,999 value pairs would repeat.

There are no exact cardinality values to set; the rule is just that neither should be a factor of the other (so no 100 and 50, which would simply link the two fields 1:2 instead of 1:1 :)).

Once you follow that rule, the "best" values are simply those whose product gives a series long enough to be considered "realistic". That is domain knowledge that @aliabbas-elastic is better placed than I am to apply :)
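
For illustration, here is a minimal sketch of what decoupled cardinalities could look like in the corpus generator config, assuming config.yml uses a fields list shaped like the snippet under review; the field names and the 101/99 values come from this discussion and are illustrative only, not the final configuration:

  # Hypothetical sketch; the exact schema is whatever the elastic-package
  # corpus generator expects for this data stream.
  fields:
    - name: host.name
      cardinality: 101      # not a factor of 99, so values no longer pair 1:1
    - name: fetch_count     # placeholder name from the discussion; the real config uses the full field path
      min: 0
      max: 200
      fuzziness: 0.2
      cardinality: 99       # 101 * 99 = 9999 distinct (host.name, fetch_count) pairs before the series repeats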

@tommyers-elastic (Contributor) left a comment

LGTM


"{{ generate "host_ip" }}"
],
"mac": [
"02:42:c0:a8:f4:07"
Contributor

Any implication of having different host.ip values with the same MAC address (e.g. for dashboards or similar)?

If not, no need to change.

Contributor Author

Keeping a static value for these fields would be fine. Let me update it.

@ali786XI ali786XI merged commit 7ff565e into elastic:main Jan 11, 2024
3 checks passed
Labels: enhancement (New feature or request), Integration:mysql (MySQL)
6 participants