Add understanding results page #6984

Merged on May 1, 2024 (18 commits).
`_benchmark/user-guide/understanding-results.md`: 125 additions, 0 deletions
@@ -0,0 +1,125 @@
---
layout: default
title: Understanding results
nav_order: 22
parent: User guide
---


At the end of each test run, OpenSearch Benchmark produces a summary report that includes metrics such as service time, throughput, and latency. These metrics provide insight into how the selected workload performed on the benchmarked OpenSearch cluster.

This guide explains how to interpret the results in the summary report.

## OpenSearch Benchmark runs

OpenSearch Benchmark is used to run a series of nightly tests against the OpenSearch development cluster. The results of these runs are published at https://opensearch.org/benchmarks, where several metrics are compared across test runs targeting both recent and upcoming versions of OpenSearch.

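Comparing runs, as the nightly benchmarks page does, usually comes down to the relative change of a metric between a baseline run and a newer run. The following Python sketch computes that change. The helper names are hypothetical and not part of OpenSearch Benchmark, and the rule that throughput is better when higher while latency, service time, and error rate are better when lower is an assumption made for illustration.

```python
# Hypothetical helpers for comparing one metric across two test runs.
# Assumption: only the throughput rows are "higher is better".
THROUGHPUT_METRICS = {"Min Throughput", "Mean Throughput", "Median Throughput", "Max Throughput"}


def percent_change(baseline: float, contender: float) -> float:
    """Relative change of the contender value against the baseline value, in percent."""
    if baseline == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (contender - baseline) * 100.0 / baseline


def is_improvement(metric: str, baseline: float, contender: float) -> bool:
    """Treat higher as better for throughput and lower as better for every other metric."""
    if metric in THROUGHPUT_METRICS:
        return contender > baseline
    return contender < baseline
```

With hypothetical values, a drop in `100th percentile latency` from 200 ms to 150 ms gives `percent_change(200.0, 150.0) == -25.0`, and `is_improvement` reports it as an improvement because lower latency is better.
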
## Selecting metrics to compare

While an OpenSearch Benchmark summary report provides many metrics related to your cluster's performance, which metrics you compare and how you use them depends on your use case. Some users might be interested in the number of documents they can index, while others might be interested in the latency or service time of a query. For example, during the OpenSearch Benchmark nightly runs, the OpenSearch team collects metrics similar to those in the following summary report:

```bash
------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

| Metric | Task | Value | Unit |
|---------------------------------------------------------------:|-------------------------------------------:|------------:|-------:|
| Cumulative indexing time of primary shards | | 0.02655 | min |
| Min cumulative indexing time across primary shards | | 0 | min |
| Median cumulative indexing time across primary shards | | 0.00176667 | min |
| Max cumulative indexing time across primary shards | | 0.0140333 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 0.0102333 | min |
| Cumulative merge count of primary shards | | 3 | |
| Min cumulative merge time across primary shards | | 0 | min |
| Median cumulative merge time across primary shards | | 0 | min |
| Max cumulative merge time across primary shards | | 0.0102333 | min |
| Cumulative merge throttle time of primary shards | | 0 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0 | min |
| Cumulative refresh time of primary shards | | 0.0709333 | min |
| Cumulative refresh count of primary shards | | 118 | |
| Min cumulative refresh time across primary shards | | 0 | min |
| Median cumulative refresh time across primary shards | | 0.00186667 | min |
| Max cumulative refresh time across primary shards | | 0.0511667 | min |
| Cumulative flush time of primary shards | | 0.00963333 | min |
| Cumulative flush count of primary shards | | 4 | |
| Min cumulative flush time across primary shards | | 0 | min |
| Median cumulative flush time across primary shards | | 0 | min |
| Max cumulative flush time across primary shards | | 0.00398333 | min |
| Total Young Gen GC time | | 0 | s |
| Total Young Gen GC count | | 0 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 0.000485923 | GB |
| Translog size | | 2.01873e-05 | GB |
| Heap used for segments | | 0 | MB |
| Heap used for doc values | | 0 | MB |
| Heap used for terms | | 0 | MB |
| Heap used for norms | | 0 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 32 | |
| Min Throughput | index | 3008.97 | docs/s |
| Mean Throughput | index | 3008.97 | docs/s |
| Median Throughput | index | 3008.97 | docs/s |
| Max Throughput | index | 3008.97 | docs/s |
| 50th percentile latency | index | 351.059 | ms |
| 100th percentile latency | index | 365.058 | ms |
| 50th percentile service time | index | 351.059 | ms |
| 100th percentile service time | index | 365.058 | ms |
| error rate | index | 0 | % |
| Min Throughput | wait-until-merges-finish | 28.41 | ops/s |
| Mean Throughput | wait-until-merges-finish | 28.41 | ops/s |
| Median Throughput | wait-until-merges-finish | 28.41 | ops/s |
| Max Throughput | wait-until-merges-finish | 28.41 | ops/s |
| 100th percentile latency | wait-until-merges-finish | 34.7088 | ms |
| 100th percentile service time | wait-until-merges-finish | 34.7088 | ms |
| error rate | wait-until-merges-finish | 0 | % |
| Min Throughput | match_all | 36.09 | ops/s |
| Mean Throughput | match_all | 36.09 | ops/s |
| Median Throughput | match_all | 36.09 | ops/s |
| Max Throughput | match_all | 36.09 | ops/s |
| 100th percentile latency | match_all | 35.9822 | ms |
| 100th percentile service time | match_all | 7.93048 | ms |
| error rate | match_all | 0 | % |

[...]

| Min Throughput | term | 16.1 | ops/s |
| Mean Throughput | term | 16.1 | ops/s |
| Median Throughput | term | 16.1 | ops/s |
| Max Throughput | term | 16.1 | ops/s |
| 100th percentile latency | term | 131.798 | ms |
| 100th percentile service time | term | 69.5237 | ms |
| error rate | term | 0 | % |
```

Metrics for the individual workload tasks begin at the `index` task line; the rows above it report cluster-wide statistics. The following two use cases illustrate which metrics might be relevant to you:

- To assess how much load your cluster can handle, the `index` task metrics provide the number of documents ingested per second during the workload run, as well as the ingestion error rate.
- To assess the latency and service time of the queries in the workload, the `match_all` and `term` task metrics provide the number of query operations performed per second, the query latency and service time, and the error rate when running query operations.

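Because the summary report is a markdown table, you can extract the rows for either use case programmatically. The following Python sketch assumes you already have the report text as a string; `parse_summary` and `pick_metrics` are hypothetical helper names, not part of OpenSearch Benchmark.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (metric, task); task is "" for cluster-wide rows


def parse_summary(report: str) -> Dict[Key, Tuple[float, str]]:
    """Parse the markdown summary table into {(metric, task): (value, unit)}."""
    rows: Dict[Key, Tuple[float, str]] = {}
    for raw in report.splitlines():
        line = raw.strip()
        # Only table rows start and end with "|"; this skips the banner and "[...]".
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4:
            continue
        metric, task, value, unit = cells
        # Skip the header row and the "---:" alignment row.
        if metric == "Metric" or set(metric) <= set("-:"):
            continue
        rows[(metric, task)] = (float(value), unit)
    return rows


def pick_metrics(rows: Dict[Key, Tuple[float, str]], tasks: List[str],
                 metrics: List[str]) -> Dict[Key, Tuple[float, str]]:
    """Keep only the requested metrics for the requested tasks, when present."""
    return {(m, t): rows[(m, t)] for t in tasks for m in metrics if (m, t) in rows}
```

For the ingestion use case, `pick_metrics(rows, ["index"], ["Mean Throughput", "error rate"])` returns the `index` throughput and error rate. For the query use case, pass `["match_all", "term"]` together with the throughput, `100th percentile latency`, `100th percentile service time`, and `error rate` metrics.
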

## Result storage

OpenSearch Benchmark stores results in one of two ways: in memory or in an external metric store.

When stored in memory, results are found in the `/.benchmark/benchmarks/test_executions/<test_execution_id>` directory. The name of the result is based on the `test_execution_id` given to the workload test during its most recent run.

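To locate the newest result directory without knowing its ID in advance, you can pick the most recently modified subdirectory of `test_executions`. The following Python sketch is a hypothetical helper, not part of OpenSearch Benchmark; it assumes each run has its own subdirectory named after its `test_execution_id` and uses the directory modification time as a proxy for recency.

```python
from pathlib import Path
from typing import Optional


def latest_test_execution_id(executions_dir: Path) -> Optional[str]:
    """Return the name of the most recently modified run directory, or None if there is none."""
    if not executions_dir.is_dir():
        return None
    # Assumption: every subdirectory is one test run named after its test_execution_id.
    runs = [p for p in executions_dir.iterdir() if p.is_dir()]
    if not runs:
        return None
    return max(runs, key=lambda p: p.stat().st_mtime).name
```

Pass the `test_executions` directory inside your `.benchmark` directory, for example `latest_test_execution_id(Path("<your .benchmark directory>") / "benchmarks" / "test_executions")`.
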
When [running a test](https://opensearch.org/docs/latest/benchmark/reference/commands/execute-test/#general-settings), you can also customize how and where the results are stored by using any combination of the following command flags:

* `--results-file`: Writes the summary report to the specified file path.
* `--results-format`: Defines the output format of the summary report, either `markdown` or `csv`. Default is `markdown`.
* `--show-in-results`: Defines which values are shown in the published summary report: `available`, `all-percentiles`, or `all`. Default is `available`.
* `--user-tag`: Defines user-specific key-value pairs that are added to the metric record as metadata, for example, `intention:baseline-ticket-12345`. This is useful when storing metrics and results in an external metric store.

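If you combine `--results-format=csv` with `--results-file`, you can load the saved report with a standard CSV parser. The following Python sketch assumes the CSV columns mirror the markdown table (Metric, Task, Value, Unit) and tolerates an optional header row; verify the layout against a file produced by your version of OpenSearch Benchmark. `load_csv_report` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, Tuple

# Assumption: the CSV report uses the same four columns as the markdown table.
FIELDS = ["Metric", "Task", "Value", "Unit"]


def load_csv_report(text: str) -> Dict[Tuple[str, str], Tuple[float, str]]:
    """Parse a CSV summary report into {(metric, task): (value, unit)}."""
    rows: Dict[Tuple[str, str], Tuple[float, str]] = {}
    for record in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        # Skip a header row, if present, and any row without a value.
        if record["Metric"] == "Metric" or not record["Value"]:
            continue
        key = (record["Metric"].strip(), (record["Task"] or "").strip())
        rows[key] = (float(record["Value"]), (record["Unit"] or "").strip())
    return rows
```

To read a saved file, open it with `newline=""` as the `csv` module recommends and pass its contents, for example `load_csv_report(open("results.csv", newline="").read())`, where `results.csv` stands for whatever path you gave `--results-file`.
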