Add throughput and latency documentation #6910
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -2,7 +2,9 @@ | |||||
layout: default | ||||||
title: Concepts | ||||||
nav_order: 3 | ||||||
parent: User guide | ||||||
parent: User Guide | ||||||
redirect_from: | ||||||
- /benchmark/user-guide/concepts/concepts/ | ||||||
--- | ||||||
|
||||||
# Concepts | ||||||
|
@@ -11,7 +13,9 @@ Before using OpenSearch Benchmark, familiarize yourself with the following conce | |||||
|
||||||
## Core concepts and definitions | ||||||
|
||||||
- **Workload**: The description of one or more benchmarking scenarios that use a specific document corpus to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workflow runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads in the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For more information about the elements of a workload, see [Anatomy of a workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/anatomy-of-a-workload/). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/). | ||||||
- **Workload**: A collection of one or more benchmarking scenarios that use a specific document corpus to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workload runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads in the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For more information about the elements of a workload, see [Anatomy of a workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/anatomy-of-a-workload/). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/). A workload typically includes the following: | ||||||
|
||||||
- One or more data streams that are ingested into indexes. | ||||||
- A set of queries and operations that are invoked as part of the benchmark. | ||||||
|
||||||
- **Pipeline**: A series of steps occurring before and after a workload is run that determines benchmark results. OpenSearch Benchmark supports three pipelines: | ||||||
- `from-sources`: Builds and provisions OpenSearch, runs a benchmark, and then publishes the results. | ||||||
|
@@ -20,95 +24,61 @@ Before using OpenSearch Benchmark, familiarize yourself with the following conce | |||||
|
||||||
- **Test**: A single invocation of the OpenSearch Benchmark binary. | ||||||
|
||||||
A workload is a specification of one or more benchmarking scenarios. A workload typically includes the following: | ||||||
## Test concepts | ||||||
|
||||||
- One or more data streams that are ingested into indexes. | ||||||
- A set of queries and operations that are invoked as part of the benchmark. | ||||||
At the end of each test, OpenSearch Benchmark produces a table that summarizes the following: | ||||||
|
||||||
## Throughput and latency | ||||||
- [Concepts](#concepts) | ||||||
- [Core concepts and definitions](#core-concepts-and-definitions) | ||||||
- [Test concepts](#test-concepts) | ||||||
- [Differences between OpenSearch Benchmark and a traditional client-server system](#differences-between-opensearch-benchmark-and-a-traditional-client-server-system) | ||||||
- [Processing time](#processing-time) | ||||||
|
||||||
- [Took time](#took-time) | ||||||
- [Service time](#service-time) | ||||||
- [Latency](#latency) | ||||||
- [Throughput](#throughput) | ||||||
|
||||||
At the end of each test, OpenSearch Benchmark produces a table that summarizes the following: | ||||||
The following diagram illustrates how each component of the table is measured during the life cycle of a request involving the OpenSearch cluster, the OpenSearch client, and OpenSearch Benchmark: | ||||||
|
||||||
|
||||||
- [Service time](#service-time) | ||||||
- Throughput | ||||||
- [Latency](#latency) | ||||||
- The error rate for each completed task or OpenSearch operation. | ||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/concepts-diagram.png" alt=""> | ||||||
|
||||||
### Differences between OpenSearch Benchmark and a traditional client-server system | ||||||
|
||||||
While the definition for _throughput_ remains consistent with other client-server systems, the definitions for `service time` and `latency` differ from most client-server systems in the context of OpenSearch Benchmark. The following table compares the OpenSearch Benchmark definition of service time and latency versus the common definitions for a client-server system. | ||||||
|
||||||
| Metric | Common definition | **OpenSearch Benchmark definition** | | ||||||
| :--- | :--- |:--- | | ||||||
| **Throughput** | The number of operations completed in a given period of time. | The number of operations completed in a given period of time. | | ||||||
| **Service time** | The amount of time that the server takes to process a request, from the point it receives the request to the point the response is returned. </br></br> It includes the time spent waiting in server-side queues but _excludes_ network latency, load balancer overhead, and deserialization/serialization. | The amount of time that it takes for `opensearch-py` to send a request and receive a response from the OpenSearch cluster. </br> </br> It includes the amount of time that it takes for the server to process a request and also _includes_ network latency, load balancer overhead, and deserialization/serialization. | | ||||||
| **Latency** | The total amount of time, including the service time and the amount of time that the request waited before responding. | Based on the `target-throughput` set by the user, the total amount of time that the request waited before receiving the response, in addition to any other delays that occured before the request is sent. | | ||||||
| **Latency** | The total amount of time, including the service time and the amount of time that the request waited before responding. | Based on the `target-throughput` set by the user, the total amount of time that the request waited before receiving the response, in addition to any other delays that occurred before the request is sent. | | ||||||
Try to maintain present tense (active tone) when possible.
||||||
|
||||||
For more information about service time and latency in OpenSearch Benchmark, see the [Service time](#service-time) and [Latency](#latency) sections. | ||||||
|
||||||
|
||||||
### Service time | ||||||
|
||||||
OpenSearch Benchmark does not have insight into how long OpenSearch takes to process a request, apart from extracting the `took` time for the request. In OpenSearch, **service time** tracks the amount of time between when OpenSearch issues a request and receives a response. | ||||||
|
||||||
OpenSearch Benchmark makes function calls to `opensearch-py` to communicate with an OpenSearch cluster. OpenSearch Benchmark tracks the amount of time between when the `opensearch-py` client sends a request and receives a response from the OpenSearch cluster and considers this to be the service time. Unlike the traditional definition of service time, the OpenSearch Benchmark definition of service time includes overhead, such as network latency, load balancer overhead, or deserialization/serialization. The following image highlights the differences between the traditional definition of service time and the OpenSearch Benchmark definition of service time. | ||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/service-time.png" alt=""> | ||||||
|
||||||
### Latency | ||||||
|
||||||
Target throughput is key to understanding the OpenSearch Benchmark definition of **latency**. Target throughput is the rate at which OpenSearch Benchmark issues requests, assuming that responses will be returned instantaneously. `target-throughput` is one of the common workload parameters that can be set for each test and is measured in operations per second. | ||||||
|
||||||
OpenSearch Benchmark always issues one request at a time for a single client thread, specified as `search-clients` in the workload parameters. If `target-throughput` is set to `0`, OpenSearch Benchmark issues a request immediately after it receives the response from the previous request. If the `target-throughput` is not set to `0`, OpenSearch Benchmark issues the next request to match the `target-throughput`, assuming that responses are returned instantaneously. | ||||||
## Processing time | ||||||
|
||||||
|
||||||
#### Example A | ||||||
**Processing time** accounts for any extra overhead tasks that OpenSearch Benchmark performs during the life cycle of a request, such as setting up a request context manager and calling a method to pass the request off to the OpenSearch client. This is in contrast to **service time**, which only accounts for the difference between when a request is sent and when the OpenSearch client receives the response. | ||||||
|
||||||
|
||||||
The following diagrams illustrate how latency is calculated with an expected request response time of 200ms and the following settings: | ||||||
### Took time | ||||||
|
||||||
- `search-clients` is set to `1`. | ||||||
- `target-throughput` is set to `1` operation per second. | ||||||
**Took time** measures the amount of time the cluster spends processing a request on the server side. It does not include the time it takes for the request to reach the cluster from the client or for the response to travel back from the cluster to the client. | ||||||
|
||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/latency-explanation-1.png" alt=""> | ||||||
|
||||||
When a request takes longer than 200ms, such as when a request takes 1110ms instead of 400ms, OpenSearch Benchmark sends the next request that was supposed to occur at 4.00s based on the `target-throughput` at 4.10s. All subsequent requests after the 4.10s request attempt to resynchronize with the `target-throughput` setting. | ||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/latency-explanation-2.png" alt=""> | ||||||
|
||||||
When measuring the overall latency, OpenSearch Benchmark includes all performed requests. All requests have a latency of 200ms, except for the following two requests: | ||||||
|
||||||
- The request that lasted 1100ms. | ||||||
- The subsquent request that was supposed to start at 4:00s. This request was delayed by 100ms, denoted by the orange area in the following diagram, and had a response time of 200ms. When calculating the latency for this request, OpenSearch Benchmark will account for the delayed start time and combine it with the response time. Thus, the latency for this request is **300ms**. | ||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/latency-explanation-3.png" alt=""> | ||||||
|
||||||
#### Example B | ||||||
|
||||||
In this example, OpenSearch Benchmark assumes a latency of 200ms and uses the following latency settings: | ||||||
|
||||||
- `search_clients` is set to `1`. | ||||||
- `target-throughput` is set to `10` operations per second. | ||||||
|
||||||
The following diagram shows the schedule built by OpenSearch Benchmark with the expected response times. | ||||||
### Service time | ||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/b-latency-explanation-1.png" alt=""> | ||||||
OpenSearch Benchmark does not have insight into how long OpenSearch takes to process a request, apart from extracting the [took time](#took-time) for the request. In OpenSearch Benchmark, **service time** tracks the amount of time between when the OpenSearch client issues a request and when it receives a response. | ||||||
Although "measures" implies a length or amount of time, I think removing "amount of time" from the sentence makes it more confusing, since time can mean multiple types of measurements, for example, the clock time a request was issued, versus the length of time between two requests.
We already provide the information in the second sentence in the following para in slightly more detail, so I restructured.
|
||||||
|
||||||
However, if the assumption is that all responses will take 200ms, 10 operations per second won't be possible. Therefore, the highest throughput OpenSearch Benchmark can reach is 5 operations per second, as shown in the following diagram. | ||||||
OpenSearch Benchmark makes function calls to `opensearch-py` to communicate with an OpenSearch cluster. OpenSearch Benchmark tracks the amount of time between when the `opensearch-py` client sends a request and receives a response from the OpenSearch cluster and considers this to be the service time. Unlike the traditional definition of service time, the OpenSearch Benchmark definition of service time includes overhead, such as network latency, load balancer overhead, or deserialization/serialization. The following image highlights the differences between the traditional definition of service time and the OpenSearch Benchmark definition of service time. | ||||||
|
||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/b-latency-explanation-2.png" alt=""> | ||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/service-time.png" alt=""> | ||||||
|
||||||
OpenSearch Benchmark does not account for this and continues to try to achieve the `target-throughput` of 10 operations per second. Because of this, delays for each request begin to cascade, as illustrated in the following diagram. | ||||||
### Latency | ||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/b-latency-explanation-3.png" alt=""> | ||||||
**Latency** measures the total amount of time that the request waits before receiving the response, in addition to any other delays that occur before the request is sent. In most circumstances, latency is measured the same as service time. When testing in [Throughput-throttled mode]({{site.url}}{{site.baseurl}}/benchmark/user-guide/target-throughput/), however, latency is measured as service time plus the time the request spends waiting in the queue. | ||||||
|
||||||
|
||||||
Combining the service time with the delay for each operation provides the following latency measurements for each operation: | ||||||
|
||||||
- 200 ms for operation 1 | ||||||
- 300 ms for operation 2 | ||||||
- 400 ms for operation 3 | ||||||
- 500 ms for operation 4 | ||||||
- 600 ms for operation 5 | ||||||
### Throughput | ||||||
|
||||||
This latency cascade continues, increasing latency by 100ms for each subsequent request. | ||||||
**Throughput** measures the rate at which OpenSearch Benchmark issues requests, assuming that responses will be returned instantaneously. | ||||||
|
||||||
### Recommendation | ||||||
|
||||||
As shown by the preceding examples, you should be aware of the average service time of each task and provide a `target-throughput` that accounts for the service time. The OpenSearch Benchmark latency is calculated based on the `target-throughput` set by the user, that is, the latency could be redefined as "throughput-based latency." | ||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,82 @@ | ||||||
--- | ||||||
layout: default | ||||||
title: Target throughput | ||||||
nav_order: 150 | ||||||
--- | ||||||
|
||||||
# Target throughput | ||||||
|
||||||
Target throughput is key to understanding the OpenSearch Benchmark definition of **latency**. Target throughput is the rate at which OpenSearch Benchmark issues requests, assuming that responses will be returned instantaneously. `target-throughput` is one of the common workload parameters that can be set for each test and is measured in operations per second. | ||||||
|
||||||
|
||||||
There are two types of testing modes when using OpenSearch Benchmark, both of which are related to throughput, latency, and service time: | ||||||
|
||||||
|
||||||
- [Benchmarking mode](#benchmarking-mode): Latency is measured the same as service time. | ||||||
|
||||||
- [Throughput-throttled mode](#throughput-throttled-mode): Latency is measured as service time plus the time a request spends waiting in the queue. | ||||||
|
||||||
|
||||||
## Benchmarking mode | ||||||
|
||||||
When you do not specify a `target-throughput`, OpenSearch Benchmark latency tests are performed in **Benchmarking mode**. In this mode, the OpenSearch client sends requests to the OpenSearch cluster as fast as possible. After OpenSearch Benchmark receives the response to the previous request, it immediately sends the next request to the OpenSearch client without waiting. In this testing mode, latency is identical to service time. | ||||||
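
As a rough illustration of Benchmarking mode, the following Python sketch times a request loop with no throttling. The `send_request` callable is a hypothetical stand-in for a call into the OpenSearch client and is not part of OpenSearch Benchmark or `opensearch-py`. The sketch only shows that, with no schedule to wait on, each request's latency equals its service time.

```python
import time


def run_unthrottled(send_request, iterations):
    """Issue requests back to back and record timings in seconds.

    `send_request` is a hypothetical stand-in for a client call.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        send_request()  # blocks until the response arrives
        service_time = time.perf_counter() - start
        # No target throughput means no time spent waiting in a queue,
        # so latency and service time are the same measurement.
        samples.append({"service_time": service_time, "latency": service_time})
    return samples


if __name__ == "__main__":
    # Simulate a request that takes about 10 ms.
    for sample in run_unthrottled(lambda: time.sleep(0.01), iterations=3):
        print(sample)
```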
|
||||||
|
||||||
## Throughput-throttled mode | ||||||
|
||||||
**Throughput** measures the rate at which OpenSearch Benchmark issues requests, assuming that responses will be returned instantaneously. However, you can set a `target-throughput`, which is one of the common workload parameters that can be set for each test and is measured in operations per second. | ||||||
|
||||||
|
||||||
OpenSearch Benchmark always issues one request at a time for a single client thread, specified as `search-clients` in the workload parameters. If `target-throughput` is set to `0`, OpenSearch Benchmark issues a request immediately after it receives the response from the previous request. If the `target-throughput` is not set to `0`, OpenSearch Benchmark issues the next request to match the `target-throughput`, assuming that responses are returned instantaneously. | ||||||
|
||||||
|
||||||
When you want to simulate the type of traffic you might encounter when deploying a production cluster, set the `target-throughput` in your benchmark test to match the rate of requests you think the production cluster might receive. The following examples illustrate how the `target-throughput` setting affects the latency measurement. | ||||||
|
||||||
|
||||||
|
||||||
|
||||||
### Example A | ||||||
|
||||||
The following diagrams illustrate how latency is calculated with an expected request response time of 200 ms and the following settings: | ||||||
|
||||||
|
||||||
- `search-clients` is set to `1`. | ||||||
- `target-throughput` is set to `1` operation per second. | ||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/latency-explanation-1.png" alt=""> | ||||||
|
||||||
When a request takes longer than 200 ms, such as when a request takes 1100 ms instead of 200 ms, OpenSearch Benchmark delays the next request, which should have occurred at 4.00 s based on the `target-throughput`, until 4.10 s. All subsequent requests after the 4.10 s request attempt to resynchronize with the `target-throughput` setting. | ||||||
Agree
Instead of "was supposed to occur", "should have occurred"?
||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/latency-explanation-2.png" alt=""> | ||||||
Should this image be introduced separately from the previous one?
||||||
|
||||||
When measuring the overall latency, OpenSearch Benchmark includes all performed requests. All requests have a latency of 200 ms, except for the following two requests: | ||||||
|
||||||
- The request that lasted 1100 ms. | ||||||
|
||||||
- The subsequent request that was supposed to start at 4.00 s. This request was delayed by 100 ms, denoted by the orange area in the following diagram, and had a response time of 200 ms. When calculating the latency for this request, OpenSearch Benchmark accounts for the delayed start time and combines it with the response time. Thus, the latency for this request is **300 ms**. | ||||||
|
||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/latency-explanation-3.png" alt=""> | ||||||
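
The arithmetic in Example A can be reproduced with a small simulation. The following Python sketch is illustrative only and is not how OpenSearch Benchmark is implemented. The function name and the list of service times are assumptions chosen to match the example: one client, 1 operation per second, and a single 1100 ms response for the request scheduled at 3.00 s.

```python
def simulate_latencies(service_times_ms, target_throughput):
    """Return the latency in ms of each request issued by a single client.

    Request i is scheduled at i * (1000 / target_throughput) ms, but it
    cannot start until the previous response has been received.
    """
    interval_ms = 1000 / target_throughput
    client_free_at = 0.0
    latencies = []
    for i, service_ms in enumerate(service_times_ms):
        scheduled_at = i * interval_ms
        started_at = max(scheduled_at, client_free_at)
        finished_at = started_at + service_ms
        # Latency = time waiting past the scheduled start + service time.
        latencies.append(finished_at - scheduled_at)
        client_free_at = finished_at
    return latencies


if __name__ == "__main__":
    # Hypothetical service times matching Example A.
    service_times_ms = [200, 200, 200, 1100, 200, 200]
    print(simulate_latencies(service_times_ms, target_throughput=1))
    # [200.0, 200.0, 200.0, 1100.0, 300.0, 200.0]
```

Under these assumptions, the fifth request waits 100 ms past its scheduled start and reports a latency of 300 ms, while the request after it is back on schedule at 200 ms.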
|
||||||
### Example B | ||||||
|
||||||
In this example, OpenSearch Benchmark assumes a latency of 200 ms and uses the following latency settings: | ||||||
|
||||||
- `search_clients` is set to `1`. | ||||||
- `target-throughput` is set to `10` operations per second. | ||||||
|
||||||
The following diagram shows the schedule built by OpenSearch Benchmark with the expected response times. | ||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/b-latency-explanation-1.png" alt=""> | ||||||
|
||||||
However, if the assumption is that all responses will take 200 ms, 10 operations per second won't be possible. Therefore, the highest throughput OpenSearch Benchmark can reach is 5 operations per second, as shown in the following diagram. | ||||||
|
||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/b-latency-explanation-2.png" alt=""> | ||||||
|
||||||
OpenSearch Benchmark does not account for this limitation and continues to try to achieve the `target-throughput` of 10 operations per second. Because of this, delays for each request begin to cascade, as illustrated in the following diagram. | ||||||
What does "this" refer to? I suggest adding the accompanying noun in both instances.
I believe it refers to the throughput limitation described in the preceding para and shown in the diagram, so we could include "limitation" after the first instance of "this" (but not the second one), but it's otherwise fine for me as written.
|
||||||
|
||||||
<img src="{{site.url}}{{site.baseurl}}/images/benchmark/b-latency-explanation-3.png" alt=""> | ||||||
|
||||||
Combining the service time with the delay for each operation provides the following latency measurements for each operation: | ||||||
|
||||||
|
||||||
- 200 ms for operation 1 | ||||||
- 300 ms for operation 2 | ||||||
- 400 ms for operation 3 | ||||||
- 500 ms for operation 4 | ||||||
- 600 ms for operation 5 | ||||||
|
||||||
This latency cascade continues, increasing latency by 100 ms for each subsequent request. | ||||||
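
The cascade in Example B follows from two simple relationships, sketched here in Python. Both helper functions are hypothetical and exist only for this illustration. The first assumes that a single client can complete at most one request per service time. The second assumes a constant service time, so that whenever the schedule interval is shorter than the service time, each request falls behind by the difference between the two.

```python
def max_sustainable_throughput(clients, service_time_ms):
    """Upper bound on operations per second when each client sends one request at a time."""
    return clients * 1000 / service_time_ms


def cascading_latency_ms(operation, service_time_ms, target_throughput):
    """Latency of the nth operation (1-based) for a single client.

    Assumes a constant service time; if it is shorter than the schedule
    interval, no backlog builds up and latency equals service time.
    """
    interval_ms = 1000 / target_throughput
    backlog_per_request_ms = max(0.0, service_time_ms - interval_ms)
    return service_time_ms + (operation - 1) * backlog_per_request_ms


if __name__ == "__main__":
    # Values taken from Example B: 200 ms responses, 10 operations per second.
    print(max_sustainable_throughput(clients=1, service_time_ms=200))  # 5.0
    for n in range(1, 6):
        print(n, cascading_latency_ms(n, service_time_ms=200, target_throughput=10))
```

With these inputs, the sketch yields 200, 300, 400, 500, and 600 ms for operations 1 through 5, matching the preceding list.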
Check failure on line 76 in _benchmark/user-guide/target-throughput.md (GitHub Actions / vale)
|
||||||
|
||||||
### Recommendation | ||||||
|
||||||
As shown by the preceding examples, you should be aware of the average service time of each task and provide a `target-throughput` that accounts for the service time. The OpenSearch Benchmark latency is calculated based on the `target-throughput` set by the user. In other words, the latency could be redefined as "throughput-based latency." | ||||||
|
||||||
|
||||||
|
I suggest breaking up this paragraph for ease of readability.