From 1fe2ef6bdf60afa4efd2a941837ec764c35d5770 Mon Sep 17 00:00:00 2001
From: Kaushal Kumar
Date: Tue, 29 Oct 2024 16:01:46 -0700
Subject: [PATCH 01/17] add wlm feature overview

Signed-off-by: Kaushal Kumar
---
 .../wlm-feature-overview.md | 113 ++++++++++++++++++
 1 file changed, 113 insertions(+)
 create mode 100644 _tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md

diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
new file mode 100644
index 0000000000..09990cfccb
--- /dev/null
+++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
@@ -0,0 +1,113 @@
+---
+layout: default
+title: Workload Management
+nav_order: 62
+has_children: false
+parent: workload-management
+redirect_from:
+  - /opensearch/workload-management/
+---
+
+### Overview
+Workload management allows users to group search traffic and isolate system resources, preventing resource hogging by specific requests. This ensures fair resource allocation, even for short-lived, low-intensity queries.
+The feature support tenant level admission control and reactively picking and cancelling resource intensive queries on configured resource threshold breach.
+This feature provides tenant-level isolation within the cluster for search workloads, operating at a node level.
+
+Admins can dynamically manage these QueryGroups (create, update, delete) via REST APIs. User can use a query group id to make search request, currently we are supporting this value as an HTTP Header called `queryGroupId`.
+
+### Feature Operating Modes
+Query group mode determines the operating level of the feature and it has the following operating modes.
+- **Disabled mode** -- It means the feature will not work at all.
+- **Enabled mode** -- It means the feature is enabled and will cause cancellations and rejection once the query group's configured thresholds are breached.
+- **Monitor_only mode**(Default) -- It means the feature will run and monitor the tasks but it will not cancel/reject the queries .
+
+These modes can be controlled and changed using `_cluster/settings` endpoint with `wlm.query_group.mode` setting.
+
+### Workload management settings
+There are following settings which dictates the wlm feature behavior.
+
+| **Setting Name** | **Description** |
+|:------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `wlm.query_group.duress_streak` | This setting is used to determine the node duress threshold breaches consecutively to mark the node duress |
+| `wlm.query_group.enforcement_interval` | This setting defines the monitoring interval for the feature |
+| `wlm.query_group.mode` | defines the feature operating mode |
+| `wlm.query_group.node.memory_rejection_threshold` | defines the value with which query group level **memory** threshold be normalised to decide whether to reject new incoming requests or not |
+| `wlm.query_group.node.cpu_rejection_threshold` | defines the value with which query group level **cpu** threshold be normalised to decide whether to reject new incoming requests or not |
+| `wlm.query_group.node.memory_cancellation_threshold` | this value controls two things 1. Whether the node is in duress for **memory** resource type for WLM feature 2. Determine the query group level effective cancellation threshold |
+| `wlm.query_group.node.cpu_cancellation_threshold` | this value controls two things 1. Whether the node is in duress for **cpu** resource type for WLM feature 2. Determine the query group level effective cancellation threshold |
+
+All of these settings can be updated using `_cluster/settings` api. One more thing to be aware of regarding rejection/cancellation settings is that the rejection thresholds for a resource should always be less than the cancellation thresholds.
+Because we want to give some extra headroom for running requests to complete.
+
+### Workload Management Stats
+The stats API is useful to gather current wlm metrices at query group level. The stats API looks like following
+
+```json
+GET _wlm/stats
+```
+
+#### Example response body
+```json
+{
+  "_nodes": {
+    "total": 1,
+    "successful": 1,
+    "failed": 0
+  },
+  "cluster_name": "XXXXXXYYYYYYYY",
+  "A3L9EfBIQf2anrrUhh_goA": {
+    "query_groups": {
+      "16YGxFlPRdqIO7K4EACJlw": {
+        "total_completions": 33570,
+        "total_rejections": 0,
+        "total_cancellations": 0,
+        "cpu": {
+          "current_usage": 0.03319935314357281,
+          "cancellations": 0,
+          "rejections": 0
+        },
+        "memory": {
+          "current_usage": 0.002306486276211217,
+          "cancellations": 0,
+          "rejections": 0
+        }
+      },
+      "DEFAULT_QUERY_GROUP": {
+        "total_completions": 42572,
+        "total_rejections": 0,
+        "total_cancellations": 0,
+        "cpu": {
+          "current_usage": 0,
+          "cancellations": 0,
+          "rejections": 0
+        },
+        "memory": {
+          "current_usage": 0,
+          "cancellations": 0,
+          "rejections": 0
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Response body fields definitions
+| field_name | description |
+|:----------------------|:-------------------------------------------------------------------------------------------------------------------------|
+| `total_completions` | total request completions in this query_group at the given node. this includes the shard and co-ordinator level requests |
+| `total_rejections` | total rejections for the given query_group at the given node. this includes the shard and co-ordinator level requests |
+| `total_cancellations` | total cancellations for the given query_group at the given node. this includes the shard and co-ordinator level requests |
+| `cpu` | **cpu** resource type stats for the query_group |
+| `memory` | **memory** resource type stats for the query_group |
+
+#### Resource type stats
+| field_name | description |
+|:----------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `current_usage` | Resource usage for the given query group at the given node based on last run of the monitoring thread. This value is updated every `wlm.query_group.enforcement_interval` milliseconds |
+| `cancellations` | Cancellation count due to this resource cancellation threshold breach |
+| `rejections` | Rejection count due to this resource cancellation threshold breach |
+
+
+

From 889a7132736859a807793c1a04abc2ca80baf889 Mon Sep 17 00:00:00 2001
From: Kaushal Kumar
Date: Tue, 29 Oct 2024 16:55:45 -0700
Subject: [PATCH 02/17] address automated comments

Signed-off-by: Kaushal Kumar
---
 .../wlm-feature-overview.md | 49 +++++++++++++------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
index 09990cfccb..17b7763711 100644
--- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
+++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
@@ -10,12 +10,29 @@ redirect_from:

 ### Overview
 Workload management allows users to group search traffic and isolate system resources, preventing resource hogging by specific requests. This ensures fair resource allocation, even for short-lived, low-intensity queries.
-The feature support tenant level admission control and reactively picking and cancelling resource intensive queries on configured resource threshold breach.
+This feature offers tenant-level admission control and reactive query management. It can identify and cancel resource-intensive queries when configured thresholds are exceeded, ensuring fair resource allocation.
 This feature provides tenant-level isolation within the cluster for search workloads, operating at a node level.

-Admins can dynamically manage these QueryGroups (create, update, delete) via REST APIs. User can use a query group id to make search request, currently we are supporting this value as an HTTP Header called `queryGroupId`.
+Admins can dynamically manage these QueryGroups (create, update, delete) using REST APIs. User can use a query group id to make search request, currently we are supporting this value as an HTTP Header called `queryGroupId`.

-### Feature Operating Modes
+### QueryGroup
+This construct enables us to define the groups/tenants. It has the following schema
+
+```json
+{
+  "_id" : "16YGxFlPRdqIO7K4EACJlw",
+  "name" : "ping",
+  "resiliency_mode" : "soft",
+  "resource_limits" : {
+    "cpu" : 0.3,
+    "memory": 0.2
+  },
+  "updated_at" : 1729814077916
+}
+```
+Admins can dynamically manage these QueryGroups (create, update, delete) using REST APIs.
+
+### Feature operating modes
 Query group mode determines the operating level of the feature and it has the following operating modes.
 - **Disabled mode** -- It means the feature will not work at all.
 - **Enabled mode** -- It means the feature is enabled and will cause cancellations and rejection once the query group's configured thresholds are breached.
@@ -24,23 +41,23 @@ Query group mode determines the operating level of the feature and it has the fo
 These modes can be controlled and changed using `_cluster/settings` endpoint with `wlm.query_group.mode` setting.

 ### Workload management settings
-There are following settings which dictates the wlm feature behavior.
-
-| **Setting Name** | **Description** |
-|:------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `wlm.query_group.duress_streak` | This setting is used to determine the node duress threshold breaches consecutively to mark the node duress |
-| `wlm.query_group.enforcement_interval` | This setting defines the monitoring interval for the feature |
-| `wlm.query_group.mode` | defines the feature operating mode |
-| `wlm.query_group.node.memory_rejection_threshold` | defines the value with which query group level **memory** threshold be normalised to decide whether to reject new incoming requests or not |
-| `wlm.query_group.node.cpu_rejection_threshold` | defines the value with which query group level **cpu** threshold be normalised to decide whether to reject new incoming requests or not |
-| `wlm.query_group.node.memory_cancellation_threshold` | this value controls two things 1. Whether the node is in duress for **memory** resource type for WLM feature 2. Determine the query group level effective cancellation threshold |
-| `wlm.query_group.node.cpu_cancellation_threshold` | this value controls two things 1. Whether the node is in duress for **cpu** resource type for WLM feature 2. Determine the query group level effective cancellation threshold |
+There are following settings which dictates the workload management feature behavior.
+
+| **setting name** | **description** |
+|:-----------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `wlm.query_group.duress_streak` | This setting is used to determine the node duress threshold breaches consecutively to mark the node duress |
+| `wlm.query_group.enforcement_interval` | This setting defines the monitoring interval for the feature |
+| `wlm.query_group.mode` | defines the feature operating mode |
+| `wlm.query_group.node.memory_rejection_threshold` | defines the value with which query group level `memory` threshold be normalised to decide whether to reject new incoming requests or not |
+| `wlm.query_group.node.cpu_rejection_threshold` | defines the value with which query group level `cpu` threshold be normalised to decide whether to reject new incoming requests or not |
+| `wlm.query_group.node.memory_cancellation_threshold` | this value controls two things 1. Whether the node is in duress for `memory` resource type for WLM feature 2. Determine the query group level effective cancellation threshold |
+| `wlm.query_group.node.cpu_cancellation_threshold` | this value controls two things 1. Whether the node is in duress for `cpu` resource type for WLM feature 2. Determine the query group level effective cancellation threshold |

 All of these settings can be updated using `_cluster/settings` api. One more thing to be aware of regarding rejection/cancellation settings is that the rejection thresholds for a resource should always be less than the cancellation thresholds.
 Because we want to give some extra headroom for running requests to complete.

-### Workload Management Stats
-The stats API is useful to gather current wlm metrices at query group level. The stats API looks like following
+### Workload management stats
+The stats API is useful to gather current workload management metrices at query group level. The stats API looks like following

 ```json
 GET _wlm/stats

From e2fb60df1776718446ab7f1eea1b2677ed6603fb Mon Sep 17 00:00:00 2001
From: Kaushal Kumar
Date: Tue, 29 Oct 2024 17:11:00 -0700
Subject: [PATCH 03/17] address automated comments

Signed-off-by: Kaushal Kumar
---
 .../workload-management/wlm-feature-overview.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
index 17b7763711..a7fd6b4554 100644
--- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
+++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
@@ -111,16 +111,16 @@ GET _wlm/stats
 {% include copy-curl.html %}

 #### Response body fields definitions
-| field_name | description |
+| Field name | Description |
 |:----------------------|:-------------------------------------------------------------------------------------------------------------------------|
 | `total_completions` | total request completions in this query_group at the given node. this includes the shard and co-ordinator level requests |
 | `total_rejections` | total rejections for the given query_group at the given node. this includes the shard and co-ordinator level requests |
 | `total_cancellations` | total cancellations for the given query_group at the given node. this includes the shard and co-ordinator level requests |
-| `cpu` | **cpu** resource type stats for the query_group |
-| `memory` | **memory** resource type stats for the query_group |
+| `cpu` | `cpu` resource type stats for the query_group |
+| `memory` | `memory` resource type stats for the query_group |

 #### Resource type stats
-| field_name | description |
+| Field name | Description |
 |:----------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | `current_usage` | Resource usage for the given query group at the given node based on last run of the monitoring thread. This value is updated every `wlm.query_group.enforcement_interval` milliseconds |
 | `cancellations` | Cancellation count due to this resource cancellation threshold breach |

From b4ec8af01e214634b32e4608a3def13c4efa9dac Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Thu, 7 Nov 2024 16:43:21 -0600
Subject: [PATCH 04/17] Recommit intial changes

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../wlm-feature-overview.md | 195 ++++++++++--------
 1 file changed, 106 insertions(+), 89 deletions(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
index a7fd6b4554..c904817760 100644
--- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
+++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
@@ -1,107 +1,124 @@
 ---
 layout: default
-title: Workload Management
-nav_order: 62
-has_children: false
-parent: workload-management
-redirect_from:
-  - /opensearch/workload-management/
+title: Workload management
+nav_order: 70
+has_children: true
+parent: Availability and recovery
 ---

-### Overview
-Workload management allows users to group search traffic and isolate system resources, preventing resource hogging by specific requests. This ensures fair resource allocation, even for short-lived, low-intensity queries.
-This feature offers tenant-level admission control and reactive query management. It can identify and cancel resource-intensive queries when configured thresholds are exceeded, ensuring fair resource allocation.
-This feature provides tenant-level isolation within the cluster for search workloads, operating at a node level.
+# Workload management

-Admins can dynamically manage these QueryGroups (create, update, delete) using REST APIs. User can use a query group id to make search request, currently we are supporting this value as an HTTP Header called `queryGroupId`.
+Workload management allows users to group and search network traffic, isolating system resources to prevent the overuse of network resources by specific requests. It offers the following benefits:

-### QueryGroup
-This construct enables us to define the groups/tenants. It has the following schema
+- Tenant-level admission control and reactive query management. It can identify and cancel resource-intensive queries when the configured thresholds are exceeded, ensuring fair resource allocation.
+
+- Tenant-level isolation within the cluster for search workloads, operating at a node level.
+
+## Query groups
+
+System administrators can dynamically manage query groups using the Workload management APIs. These query groups can be used to make search requests with resource limits.
+
+The following example adds a query group with the named `analytics`:

 ```json
+
+```json
+
+PUT _wlm/query_group
+
 {
-  "_id" : "16YGxFlPRdqIO7K4EACJlw",
-  "name" : "ping",
-  "resiliency_mode" : "soft",
-  "resource_limits" : {
-    "cpu" : 0.3,
-    "memory": 0.2
-  },
-  "updated_at" : 1729814077916
+
+  “name”: “analytics”,
+
+  “resiliency_mode”: “enforced”,
+
+  “resource_limits”: {
+
+    “cpu”: 0.4,
+
+    “memory”: 0.2
+
+  }
+
 }
+
 ```
-Admins can dynamically manage these QueryGroups (create, update, delete) using REST APIs.

-### Feature operating modes
-Query group mode determines the operating level of the feature and it has the following operating modes.
-- **Disabled mode** -- It means the feature will not work at all.
-- **Enabled mode** -- It means the feature is enabled and will cause cancellations and rejection once the query group's configured thresholds are breached.
-- **Monitor_only mode**(Default) -- It means the feature will run and monitor the tasks but it will not cancel/reject the queries .
+## Workload management settings
+
+There are following settings can be used to customize workload management using the `_cluster/settings` API:

-These modes can be controlled and changed using `_cluster/settings` endpoint with `wlm.query_group.mode` setting.
+| **Setting name** | **Description** |

-### Workload management settings
-There are following settings which dictates the workload management feature behavior.
+| :--- | :--- |
+| `wlm.query_group.duress_streak` | Determines the node duress threshold. Once the threshold is reached, the node is marked as `in duress`. |
+| `wlm.query_group.enforcement_interval` | Defines the monitoring interval. |
+| `wlm.query_group.mode` | Defines the [operating mode](#operating-modes). |
+| `wlm.query_group.node.memory_rejection_threshold` | Defines the query group level `memory` threshold. When the threshold is reached, the request is rejected. |
+| `wlm.query_group.node.cpu_rejection_threshold` | Defines query group level `cpu` threshold. When the threshold is reached, the request is rejected. |
+| `wlm.query_group.node.memory_cancellation_threshold` | Controls whether the node is considered in duress when the `cpu` threshold is reached and the effective request cancellation threshold based on `memory` usage. |
+| `wlm.query_group.node.cpu_cancellation_threshold` | Controls whether the node is considered in duress when the `cpu` threshold is reached and the effective request cancellation threshold on `cpu` usage. |

-| **setting name** | **description** |
-|:-----------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `wlm.query_group.duress_streak` | This setting is used to determine the node duress threshold breaches consecutively to mark the node duress |
-| `wlm.query_group.enforcement_interval` | This setting defines the monitoring interval for the feature |
-| `wlm.query_group.mode` | defines the feature operating mode |
-| `wlm.query_group.node.memory_rejection_threshold` | defines the value with which query group level `memory` threshold be normalised to decide whether to reject new incoming requests or not |
-| `wlm.query_group.node.cpu_rejection_threshold` | defines the value with which query group level `cpu` threshold be normalised to decide whether to reject new incoming requests or not |
-| `wlm.query_group.node.memory_cancellation_threshold` | this value controls two things 1. Whether the node is in duress for `memory` resource type for WLM feature 2. Determine the query group level effective cancellation threshold |
-| `wlm.query_group.node.cpu_cancellation_threshold` | this value controls two things 1. Whether the node is in duress for `cpu` resource type for WLM feature 2. Determine the query group level effective cancellation threshold |
+When setting rejection and cancellation settings thresholds, remember that the rejection threshold for a resource should always be less than the cancellation threshold.

-All of these settings can be updated using `_cluster/settings` api. One more thing to be aware of regarding rejection/cancellation settings is that the rejection thresholds for a resource should always be less than the cancellation thresholds.
-Because we want to give some extra headroom for running requests to complete.
+### Operating modes

-### Workload management stats
-The stats API is useful to gather current workload management metrices at query group level. The stats API looks like following
+The following operating modes determine the operating-level for the query group:
+
+- **Disabled mode**: Workload management is disabled.
+
+- **Enabled mode**: Workload management is enabled and will cause cancellations and rejection once the query group’s configured thresholds are reached.
+
+- **Monitor_only mode** (Default): Workload management will monitor tasks but it will not cancel/reject any queries.
+
+## Workload management stats API
+
+The Workload management stats API returns workload management metrics for a query group, using the following method:

 ```json
 GET _wlm/stats
 ```

-#### Example response body
+### Example response
+
 ```json
 {
-  "_nodes": {
-    "total": 1,
-    "successful": 1,
-    "failed": 0
+  “_nodes”: {
+    “total”: 1,
+    “successful”: 1,
+    “failed”: 0
   },
-  "cluster_name": "XXXXXXYYYYYYYY",
-  "A3L9EfBIQf2anrrUhh_goA": {
-    "query_groups": {
-      "16YGxFlPRdqIO7K4EACJlw": {
-        "total_completions": 33570,
-        "total_rejections": 0,
-        "total_cancellations": 0,
-        "cpu": {
-          "current_usage": 0.03319935314357281,
-          "cancellations": 0,
-          "rejections": 0
+  “cluster_name”: “XXXXXXYYYYYYYY”,
+  “A3L9EfBIQf2anrrUhh_goA”: {
+    “query_groups”: {
+      “16YGxFlPRdqIO7K4EACJlw”: {
+        “total_completions”: 33570,
+        “total_rejections”: 0,
+        “total_cancellations”: 0,
+        “cpu”: {
+          “current_usage”: 0.03319935314357281,
+          “cancellations”: 0,
+          “rejections”: 0
         },
-        "memory": {
-          "current_usage": 0.002306486276211217,
-          "cancellations": 0,
-          "rejections": 0
+        “memory”: {
+          “current_usage”: 0.002306486276211217,
+          “cancellations”: 0,
+          “rejections”: 0
         }
       },
-      "DEFAULT_QUERY_GROUP": {
-        "total_completions": 42572,
-        "total_rejections": 0,
-        "total_cancellations": 0,
-        "cpu": {
-          "current_usage": 0,
-          "cancellations": 0,
-          "rejections": 0
+      “DEFAULT_QUERY_GROUP”: {
+        “total_completions”: 42572,
+        “total_rejections”: 0,
+        “total_cancellations”: 0,
+        “cpu”: {
+          “current_usage”: 0,
+          “cancellations”: 0,
+          “rejections”: 0
         },
-        "memory": {
-          "current_usage": 0,
-          "cancellations": 0,
-          "rejections": 0
+        “memory”: {
+          “current_usage”: 0,
+          “cancellations”: 0,
+          “rejections”: 0
         }
       }
     }
@@ -110,21 +127,21 @@ GET _wlm/stats
 ```
 {% include copy-curl.html %}

-#### Response body fields definitions
-| Field name | Description |
-|:----------------------|:-------------------------------------------------------------------------------------------------------------------------|
-| `total_completions` | total request completions in this query_group at the given node. this includes the shard and co-ordinator level requests |
-| `total_rejections` | total rejections for the given query_group at the given node. this includes the shard and co-ordinator level requests |
-| `total_cancellations` | total cancellations for the given query_group at the given node. this includes the shard and co-ordinator level requests |
-| `cpu` | `cpu` resource type stats for the query_group |
-| `memory` | `memory` resource type stats for the query_group |
+### Response body fields

-#### Resource type stats
-| Field name | Description |
-|:----------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `current_usage` | Resource usage for the given query group at the given node based on last run of the monitoring thread. This value is updated every `wlm.query_group.enforcement_interval` milliseconds |
-| `cancellations` | Cancellation count due to this resource cancellation threshold breach |
-| `rejections` | Rejection count due to this resource cancellation threshold breach |
+| Field name | Description |
+|:----|:--- |
+| `total_completions` | The total number of request completions in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. |
+| `total_rejections` | The total number request rejections in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. |
+| `total_cancellations` | The total number of cancellations in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. |
+| `cpu` | The `cpu` resource type stats for the `query_group` |
+| `memory` | The `memory` resource type stats for the `query_group` |

+### Resource type stats
+| Field name | Description |
+| :--- | :---- |
+| `current_usage` |The resource usage for `query_group` at the given node based on the last run of the monitoring thread. This value is updated based on the `wlm.query_group.enforcement_interval`. |
+| `cancellations` | The cancellation count as a result of the cancellation threshold being reached. |
+| `rejections` | The rejection count as a result of the cancellation threshold being reached. |



From 94e858b5a67be6ff6b53b1d34124054204754934 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Fri, 8 Nov 2024 13:56:05 -0600
Subject: [PATCH 05/17] Update wlm-feature-overview.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../workload-management/wlm-feature-overview.md | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
index c904817760..dfd95427f9 100644
--- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
+++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
@@ -21,27 +21,15 @@ System administrators can dynamically manage query groups using the Workload man
 The following example adds a query group with the named `analytics`:

 ```json
-
-```json
-
 PUT _wlm/query_group
-
 {
-
   “name”: “analytics”,
-
   “resiliency_mode”: “enforced”,
-
   “resource_limits”: {
-
     “cpu”: 0.4,
-
     “memory”: 0.2
-
   }
-
 }
-
 ```

 ## Workload management settings

From e81cb629d29f8423c12e3c1f579d8043fcba56e3 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Tue, 12 Nov 2024 13:47:53 -0600
Subject: [PATCH 06/17] Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../workload-management/wlm-feature-overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
index dfd95427f9..ae1ad5c686 100644
--- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
+++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
@@ -43,7 +43,7 @@ There are following settings can be used to customize workload management using
 | `wlm.query_group.enforcement_interval` | Defines the monitoring interval. |
 | `wlm.query_group.mode` | Defines the [operating mode](#operating-modes). |
 | `wlm.query_group.node.memory_rejection_threshold` | Defines the query group level `memory` threshold. When the threshold is reached, the request is rejected. |
-| `wlm.query_group.node.cpu_rejection_threshold` | Defines query group level `cpu` threshold. When the threshold is reached, the request is rejected. |
+| `wlm.query_group.node.cpu_rejection_threshold` | Defines query group level `cpu` threshold. When the threshold is reached, the request is rejected. |
 | `wlm.query_group.node.memory_cancellation_threshold` | Controls whether the node is considered in duress when the `cpu` threshold is reached and the effective request cancellation threshold based on `memory` usage. |
 | `wlm.query_group.node.cpu_cancellation_threshold` | Controls whether the node is considered in duress when the `cpu` threshold is reached and the effective request cancellation threshold on `cpu` usage. |


From 00a9fe7229aaf1680aae750136345c9566e6a8eb Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Tue, 12 Nov 2024 14:04:09 -0600
Subject: [PATCH 07/17] Update wlm-feature-overview.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../wlm-feature-overview.md | 74 ++++++++++++++++++-
 1 file changed, 71 insertions(+), 3 deletions(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
index ae1ad5c686..2fc8cc95d9 100644
--- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
+++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md
@@ -6,6 +6,9 @@ has_children: true
 parent: Availability and recovery
 ---

+Introduced 2.18
+{: .label .label-purple }
+
 # Workload management

 Workload management allows users to group and search network traffic, isolating system resources to prevent the overuse of network resources by specific requests. It offers the following benefits:
@@ -14,9 +17,34 @@ Workload management allows users to group and search network traffic, isolating

 - Tenant-level isolation within the cluster for search workloads, operating at a node level.

+## Installing workload management
+
+To install workload management, use the following command:
+
+```json
+./bin/opensearch-plugin install workload-management
+```
+{% include copy-curl.html %}
+
+## Permissions
+
+Only users with administator-lelve permissions can use workload management.
+
 ## Query groups

-System administrators can dynamically manage query groups using the Workload management APIs. These query groups can be used to make search requests with resource limits.
+A _query group_ is a logical groups of tasks with defineded resource limits. System administrators can dynamically manage query groups using the Workload management APIs. These query groups can be used to make search requests with resource limits.
+
+### Operating modes
+
+The following operating modes determine the operating-level for the query group:
+
+- **Disabled mode**: Workload management is disabled.
+
+- **Enabled mode**: Workload management is enabled and will cause cancellations and rejection once the query group’s configured thresholds are reached.
+
+- **Monitor_only mode** (Default): Workload management will monitor tasks but it will not cancel/reject any queries.
+
+### Example request

 The following example adds a query group with the named `analytics`:

@@ -31,13 +59,52 @@ PUT _wlm/query_group
   }
 }
 ```
+{% include copy-curl.html %}
+
+When creating a query group, make sure that the sum of the resource limits for a single reousrce, such as `cpu` or `memory`, does not exceed `1`.
+
+### Example response
+
+OpenSearch responds with the set resource limits and the `_id` for the query group:
+
+```json
+{
+  "_id":"preXpc67RbKKeCyka72_Gw",
+  "name":"analytics",
+  "resiliency_mode":"enforced",
+  "resource_limits":{
+    "cpu":0.4,
+    "memory":0.2
+  },
+  "updated_at":1726270184642
+}
+```
+
+## Using `queryGroupID`
+
+To ensure that resources when querying are properly managed and allocated to the limits defined by the query group, you can accociate the request the a `queryGroupID`. This ID helps route and track requests under the context of the query group, so that resource quoras and task limits are enforced.
+ +The following example query uses the `queryGroupId` to ensure that the query stays under that resource groups limits: + +```json +{ + "_id":"preXpc67RbKKeCyka72_Gw", + "name":"analytics", + "resiliency_mode":"enforced", + "resource_limits":{ + "cpu":0.4, + "memory":0.2 + }, + "updated_at":1726270184642 +} +``` +{% include copy-curl.html %} ## Workload management settings -There are following settings can be used to customize workload management using the `_cluster/settings` API: +The are following settings can be used to customize workload management using the `_cluster/settings` API: | **Setting name** | **Description** | - | :--- | :--- | | `wlm.query_group.duress_streak` | Determines the node duress threshold. Once the threshold is reached, the node is marked as `in duress`. | | `wlm.query_group.enforcement_interval` | Defines the monitoring interval. | @@ -66,6 +133,7 @@ The Workload management stats API returns workload management metrics for a quer ```json GET _wlm/stats ``` +{% include copy-curl.html %} ### Example response From c13ef82f0e45b1b901ee38e3ae07da19be776ec8 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 12 Nov 2024 14:07:47 -0600 Subject: [PATCH 08/17] Grammar and typo fixes Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../workload-management/wlm-feature-overview.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md index 2fc8cc95d9..0a08abc83e 100644 --- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md +++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md @@ -13,7 +13,7 @@ Introduced 2.18 Workload management allows users to group and search network traffic, 
isolating system resources to prevent the overuse of network resources by specific requests. It offers the following benefits: -- Tenant-level admission control and reactive query management. It can identify and cancel resource-intensive queries when the configured thresholds are exceeded, ensuring fair resource allocation. +- Tenant-level admission control and reactive query management. When resource usage exceeds configured limits, it automatically identifies and cancels demanding queries, ensuring fair resource distribution. - Tenant-level isolation within the cluster for search workloads, operating at a node level. @@ -28,11 +28,11 @@ To install workload management, use the following command: ## Permissions -Only users with administator-lelve permissions can use workload management. +Only users with administator-level permissions can use workload management. ## Query groups -A _query group_ is a logical groups of tasks with defineded resource limits. System administrators can dynamically manage query groups using the Workload management APIs. These query groups can be used to make search requests with resource limits. +A _query group_ is a logical group of tasks with defined resource limits. System administrators can dynamically manage query groups using the Workload management APIs. These query groups can be used to make search requests with resource limits. ### Operating modes @@ -61,7 +61,7 @@ PUT _wlm/query_group ``` {% include copy-curl.html %} -When creating a query group, make sure that the sum of the resource limits for a single reousrce, such as `cpu` or `memory`, does not exceed `1`. +When creating a query group, make sure that the sum of the resource limits for a single resource, such as `cpu` or `memory`, does not exceed `1`. 
### Example response @@ -82,9 +82,9 @@ OpenSearch responds with the set resource limits and the `_id` for the query gro ## Using `queryGroupID` -To ensure that resources when querying are properly managed and allocated to the limits defined by the query group, you can accociate the request the a `queryGroupID`. This ID helps route and track requests under the context of the query group, so that resource quoras and task limits are enforced. +You can associate a query request with a `queryGroupID` to manage and allocate resources within the limits defined by the query group. By utilizing this ID, requests are routed and tracked under the query group, ensuring resource quotas and task limits are maintained. -The following example query uses the `queryGroupId` to ensure that the query stays under that resource groups limits: +The following example query uses the `queryGroupId` to ensure that the query stays under that query group's resource limits: ```json { From 739905c604f466db7bf4d571a28399a061defc1b Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 12 Nov 2024 14:11:02 -0600 Subject: [PATCH 09/17] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../workload-management/wlm-feature-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md index 0a08abc83e..b5b550d57f 100644 --- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md +++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md @@ -187,7 +187,7 @@ GET _wlm/stats | Field name | Description | |:----|:--- | -| `total_completions` | The total number of request completions in this `query_group` at the given node. 
This includes all shard-level and coordinator-level requests. | +| `total_completions` | The total number of request completions in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. | | `total_rejections` | The total number request rejections in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. | | `total_cancellations` | The total number of cancellations in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. | | `cpu` | The `cpu` resource type stats for the `query_group` | From de4206b44c59b70a2765bcaa85c72223ad8d5cd4 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 12 Nov 2024 14:17:47 -0600 Subject: [PATCH 10/17] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../workload-management/wlm-feature-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md index b5b550d57f..94b4838d2e 100644 --- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md +++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md @@ -186,7 +186,7 @@ GET _wlm/stats ### Response body fields | Field name | Description | -|:----|:--- | +| :--- | :--- | | `total_completions` | The total number of request completions in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. | | `total_rejections` | The total number request rejections in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. 
| | `total_cancellations` | The total number of cancellations in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. | From 6636ce8a205b6d4a3132a442fb471faa54887fd1 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 12 Nov 2024 14:25:48 -0600 Subject: [PATCH 11/17] Update _tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../workload-management/wlm-feature-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md index 94b4838d2e..58a2fed9f9 100644 --- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md +++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md @@ -11,7 +11,7 @@ Introduced 2.18 # Workload management -Workload management allows users to group and search network traffic, isolating system resources to prevent the overuse of network resources by specific requests. It offers the following benefits: +Workload management allows users to group search traffic and isolate network resources, preventing the overuse of network resources by specific requests. It offers the following benefits: - Tenant-level admission control and reactive query management. When resource usage exceeds configured limits, it automatically identifies and cancels demanding queries, ensuring fair resource distribution. 
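The query group section in this series says that the resource limits for a single resource, such as `cpu` or `memory`, should not sum to more than `1`. This sketch reads that as a sum across all query groups. A client-side check can catch a bad `PUT _wlm/query_group` body before it is sent. The following Python sketch assumes that existing groups are available as dictionaries shaped like the example response. `total_limits` and `query_group_body` are hypothetical helpers, not part of any OpenSearch client. The `enforced` default is taken from the example response only.

```python
import json


def total_limits(groups):
    """Sum each resource's limit across a list of query groups."""
    totals = {}
    for group in groups:
        for resource, limit in group["resource_limits"].items():
            totals[resource] = totals.get(resource, 0.0) + limit
    return totals


def query_group_body(name, resource_limits, existing_groups, mode="enforced"):
    """Return a JSON body for PUT _wlm/query_group.

    Raises ValueError if adding this group would push the summed limit
    of any resource above 1.
    """
    candidate = {
        "name": name,
        "resiliency_mode": mode,
        "resource_limits": resource_limits,
    }
    totals = total_limits(existing_groups + [candidate])
    for resource, total in totals.items():
        # Small tolerance so that sums which are mathematically 1 are not
        # rejected because of floating-point rounding.
        if total > 1 + 1e-9:
            raise ValueError(f"{resource} limits would sum to {total:.2f}, above 1")
    return json.dumps(candidate)


# Limits of the `analytics` group from the example response.
existing = [
    {"name": "analytics", "resiliency_mode": "enforced",
     "resource_limits": {"cpu": 0.4, "memory": 0.2}},
]
body = query_group_body("reporting", {"cpu": 0.5, "memory": 0.3}, existing)
```

With the `analytics` group holding `cpu: 0.4`, a second group requesting `cpu: 0.7` raises `ValueError`, while `cpu: 0.5` is accepted. The server remains the authority. The sketch only surfaces the documented rule earlier.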
From 1fc9fabab1fef47b1ecc0cccd57adbd3f6320396 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 12 Nov 2024 14:33:30 -0600 Subject: [PATCH 12/17] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../workload-management/wlm-feature-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md index 58a2fed9f9..4075cf1fbc 100644 --- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md +++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md @@ -28,7 +28,7 @@ To install workload management, use the following command: ## Permissions -Only users with administator-level permissions can use workload management. +Only users with administrator-level permissions can create and update query groups using the Workload management APIs. 
## Query groups From aec9f755b90e8c429f91a62d15ee438a7fbd7d14 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 12 Nov 2024 14:34:02 -0600 Subject: [PATCH 13/17] move permissions section Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../workload-management/wlm-feature-overview.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md index 4075cf1fbc..aa0bde3cf5 100644 --- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md +++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md @@ -26,14 +26,14 @@ To install workload management, use the following command: ``` {% include copy-curl.html %} -## Permissions - -Only users with administrator-level permissions can create and update query groups using the Workload management APIs. - ## Query groups A _query group_ is a logical group of tasks with defined resource limits. System administrators can dynamically manage query groups using the Workload management APIs. These query groups can be used to make search requests with resource limits. +### Permissions + +Only users with administrator-level permissions can create and update query groups using the Workload management APIs. 
+ ### Operating modes The following operating modes determine the operating-level for the query group: From 3424fbd6137b7ee8664e1c64c08eef9188cdb980 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 12 Nov 2024 17:51:17 -0600 Subject: [PATCH 14/17] Update wlm-feature-overview.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../workload-management/wlm-feature-overview.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md index aa0bde3cf5..8de326b653 100644 --- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md +++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md @@ -87,15 +87,16 @@ You can associate a query request with a `queryGroupID` to manage and allocate r The following example query uses the `queryGroupId` to ensure that the query stays under that query group's resource limits: ```json +GET testindex/_search +Host: localhost:9200 +Content-Type: application/json +queryGroupId: preXpc67RbKKeCyka72_Gw { - "_id":"preXpc67RbKKeCyka72_Gw", - "name":"analytics", - "resiliency_mode":"enforced", - "resource_limits":{ - "cpu":0.4, - "memory":0.2 - }, - "updated_at":1726270184642 + "query": { + "match": { + "field_name": "value" + } + } } ``` {% include copy-curl.html %} From cbf9da376b789873dcac3c771b030febf16990c5 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 13 Nov 2024 11:23:47 -0600 Subject: [PATCH 15/17] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../wlm-feature-overview.md | 51 +++++++++---------- 1 file changed, 23 
insertions(+), 28 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md index 8de326b653..fa90b9c298 100644 --- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md +++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md @@ -11,11 +11,11 @@ Introduced 2.18 # Workload management -Workload management allows users to group search traffic and isolate network resources, preventing the overuse of network resources by specific requests. It offers the following benefits: +Workload management allows you to group search traffic and isolate system resources, preventing the overuse of those resources by specific requests. It offers the following benefits: - Tenant-level admission control and reactive query management. When resource usage exceeds configured limits, it automatically identifies and cancels demanding queries, ensuring fair resource distribution. -- Tenant-level isolation within the cluster for search workloads, operating at a node level. +- Tenant-level isolation within the cluster for search workloads, operating at the node level. ## Installing workload management @@ -28,25 +28,25 @@ To install workload management, use the following command: ## Query groups -A _query group_ is a logical group of tasks with defined resource limits. System administrators can dynamically manage query groups using the Workload management APIs. These query groups can be used to make search requests with resource limits. +A _query group_ is a logical grouping of tasks with defined resource limits. System administrators can dynamically manage query groups using the Workload Management APIs. These query groups can be used to create search requests with resource limits.
### Permissions -Only users with administrator-level permissions can create and update query groups using the Workload management APIs. +Only users with administrator-level permissions can create and update query groups using the Workload Management APIs. ### Operating modes -The following operating modes determine the operating-level for the query group: +The following operating modes determine the operating level for a query group: - **Disabled mode**: Workload management is disabled. -- **Enabled mode**: Workload management is enabled and will cause cancellations and rejection once the query group’s configured thresholds are reached. +- **Enabled mode**: Workload management is enabled and will cancel and reject queries once the query group's configured thresholds are reached. -- **Monitor_only mode** (Default): Workload management will monitor tasks but it will not cancel/reject any queries. +- **Monitor_only mode** (Default): Workload management will monitor tasks but will not cancel or reject any queries. ### Example request -The following example adds a query group with the named `analytics`: +The following example request adds a query group named `analytics`: ```json PUT _wlm/query_group @@ -84,7 +84,7 @@ OpenSearch responds with the set resource limits and the `_id` for the query gro You can associate a query request with a `queryGroupID` to manage and allocate resources within the limits defined by the query group. By utilizing this ID, requests are routed and tracked under the query group, ensuring resource quotas and task limits are maintained. 
-The following example query uses the `queryGroupId` to ensure that the query stays under that query group's resource limits: +The following example query uses the `queryGroupId` to ensure that the query does not exceed that query group's resource limits: ```json GET testindex/_search @@ -103,7 +103,7 @@ queryGroupId: preXpc67RbKKeCyka72_Gw ## Workload management settings -The are following settings can be used to customize workload management using the `_cluster/settings` API: +The following settings can be used to customize workload management using the `_cluster/settings` API. | **Setting name** | **Description** | | :--- | :--- | @@ -111,25 +111,20 @@ The are following settings can be used to customize workload management using th | `wlm.query_group.enforcement_interval` | Defines the monitoring interval. | | `wlm.query_group.mode` | Defines the [operating mode](#operating-modes). | | `wlm.query_group.node.memory_rejection_threshold` | Defines the query group level `memory` threshold. When the threshold is reached, the request is rejected. | -| `wlm.query_group.node.cpu_rejection_threshold` | Defines query group level `cpu` threshold. When the threshold is reached, the request is rejected. | +| `wlm.query_group.node.cpu_rejection_threshold` | Defines the query group level `cpu` threshold. When the threshold is reached, the request is rejected. | | `wlm.query_group.node.memory_cancellation_threshold` | Controls whether the node is considered in duress when the `cpu` threshold is reached and the effective request cancellation threshold based on `memory` usage. | | `wlm.query_group.node.cpu_cancellation_threshold` | Controls whether the node is considered in duress when the `cpu` threshold is reached and the effective request cancellation threshold on `cpu` usage. | -When setting rejection and cancellation settings thresholds, remember that the rejection threshold for a resource should always be less than the cancellation threshold. 
+When setting rejection and cancellation thresholds, remember that the rejection threshold for a resource should always be lower than the cancellation threshold. -### Operating modes -The following operating modes determine the operating-level for the query group: -- **Disabled mode**: Workload management is disabled. -- **Enabled mode**: Workload management is enabled and will cause cancellations and rejection once the query group’s configured thresholds are reached. -- **Monitor_only mode** (Default): Workload management will monitor tasks but it will not cancel/reject any queries. -## Workload management stats API +## Workload Management Stats API -The Workload management stats API returns workload management metrics for a query group, using the following method: +The Workload Management Stats API returns workload management metrics for a query group, using the following method: ```json GET _wlm/stats @@ -188,17 +183,17 @@ GET _wlm/stats | Field name | Description | | :--- | :--- | -| `total_completions` | The total number of request completions in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. | -| `total_rejections` | The total number request rejections in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. | -| `total_cancellations` | The total number of cancellations in this `query_group` at the given node. This includes all shard-level and coordinator-level requests. | -| `cpu` | The `cpu` resource type stats for the `query_group` | -| `memory` | The `memory` resource type stats for the `query_group` | +| `total_completions` | The total number of request completions in the `query_group` at the given node. This includes all shard-level and coordinator-level requests. | +| `total_rejections` | The total number of request rejections in the `query_group` at the given node. This includes all shard-level and coordinator-level requests.
| +| `total_cancellations` | The total number of cancellations in the `query_group` at the given node. This includes all shard-level and coordinator-level requests. | +| `cpu` | The `cpu` resource type statistics for the `query_group`. | +| `memory` | The `memory` resource type statistics for the `query_group`. | -### Resource type stats +### Resource type statistics | Field name | Description | | :--- | :---- | -| `current_usage` |The resource usage for `query_group` at the given node based on the last run of the monitoring thread. This value is updated based on the `wlm.query_group.enforcement_interval`. | -| `cancellations` | The cancellation count as a result of the cancellation threshold being reached. | -| `rejections` | The rejection count as a result of the cancellation threshold being reached. | +| `current_usage` | The resource usage for the `query_group` at the given node based on the last run of the monitoring thread. This value is updated based on the `wlm.query_group.enforcement_interval`. | +| `cancellations` | The number of cancellations resulting from the cancellation threshold being reached. | +| `rejections` | The number of rejections resulting from the rejection threshold being reached.
| From f4db7787bf23b709c9dd7b698beedafa9f0dc881 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 13 Nov 2024 14:20:37 -0600 Subject: [PATCH 16/17] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../workload-management/wlm-feature-overview.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md index fa90b9c298..1e6feab3ae 100644 --- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md +++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md @@ -82,7 +82,7 @@ OpenSearch responds with the set resource limits and the `_id` for the query gro ## Using `queryGroupID` -You can associate a query request with a `queryGroupID` to manage and allocate resources within the limits defined by the query group. By utilizing this ID, requests are routed and tracked under the query group, ensuring resource quotas and task limits are maintained. +You can associate a query request with a `queryGroupID` to manage and allocate resources within the limits defined by the query group. By using this ID, request routing and tracking are associated with the query group, ensuring resource quotas and task limits are maintained. The following example query uses the `queryGroupId` to ensure that the query does not exceed that query group's resource limits: @@ -112,8 +112,8 @@ The following settings can be used to customize workload management using the `_ | `wlm.query_group.mode` | Defines the [operating mode](#operating-modes). | | `wlm.query_group.node.memory_rejection_threshold` | Defines the query group level `memory` threshold. 
When the threshold is reached, the request is rejected. | | `wlm.query_group.node.cpu_rejection_threshold` | Defines the query group level `cpu` threshold. When the threshold is reached, the request is rejected. | -| `wlm.query_group.node.memory_cancellation_threshold` | Controls whether the node is considered in duress when the `cpu` threshold is reached and the effective request cancellation threshold based on `memory` usage. | -| `wlm.query_group.node.cpu_cancellation_threshold` | Controls whether the node is considered in duress when the `cpu` threshold is reached and the effective request cancellation threshold on `cpu` usage. | +| `wlm.query_group.node.memory_cancellation_threshold` | Controls whether the node is considered to be in duress when the `memory` threshold is reached. Requests routed to nodes in duress are canceled. | +| `wlm.query_group.node.cpu_cancellation_threshold` | Controls whether the node is considered to be in duress when the `cpu` threshold is reached. Requests routed to nodes in duress are canceled. | When setting rejection and cancellation thresholds, remember that the rejection threshold for a resource should always be lower than the cancellation threshold. 
From 1ac0de9c9efabc9a804a496e937d449d565b20c6 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 13 Nov 2024 14:21:09 -0600 Subject: [PATCH 17/17] Update wlm-feature-overview.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../workload-management/wlm-feature-overview.md | 5 ----- 1 file changed, 5 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md index 1e6feab3ae..956a01a774 100644 --- a/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md +++ b/_tuning-your-cluster/availability-and-recovery/workload-management/wlm-feature-overview.md @@ -117,11 +117,6 @@ The following settings can be used to customize workload management using the `_ When setting rejection and cancellation thresholds, remember that the rejection threshold for a resource should always be lower than the cancellation threshold. - - - - - ## Workload Management Stats API The Workload Management Stats API returns workload management metrics for a query group, using the following method:
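The search example in the `Using queryGroupID` section shows the request as raw HTTP with a `queryGroupId` header. The following Python sketch sends the same request with the standard library. `http.client` is used because it sends header names exactly as written. `build_search` and `send` are hypothetical helpers. The host, port, index, and query group ID are placeholders taken from the examples in this series.

```python
import http.client
import json


def build_search(index, query_group_id, query):
    """Return (method, path, body, headers) for a query-group-tagged search."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Associates the request with the query group, per the docs above.
        "queryGroupId": query_group_id,
    }
    return "GET", f"/{index}/_search", body, headers


def send(host, port, request):
    """Send the request with http.client and return the decoded JSON reply."""
    method, path, body, headers = request
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request(method, path, body=body, headers=headers)
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()


request = build_search(
    "testindex", "preXpc67RbKKeCyka72_Gw", {"match": {"field_name": "value"}}
)
# send("localhost", 9200, request) needs a reachable cluster with the
# workload management plugin installed.
```

Building the request separately from sending it keeps the header logic testable without a cluster. The sketch uses plain HTTP with no authentication. A secured cluster would need HTTPS and credentials.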