Backport of Docs/rate limiting 1.15 into release/1.15.x (#16384)
* backport of commit 5042d8d

* backport of commit c6b83c4

* backport of commit b12a569

* backport of commit 16d81dd

* backport of commit 48ff8f7

* backport of commit e677bc7

* backport of commit 74924a2

* backport of commit fe9bca7

---------

Co-authored-by: trujillo-adam <[email protected]>
1 parent 7cd0eff commit 926d480
Showing 8 changed files with 212 additions and 23 deletions.
16 changes: 8 additions & 8 deletions website/content/docs/agent/config/config-files.mdx
@@ -534,17 +534,17 @@ Valid time units are 'ns', 'us' (or 'µs'), 'ms', 's', 'm', 'h'."

- `license_path` <EnterpriseAlert inline /> This specifies the path to a file that contains the Consul Enterprise license. Alternatively the license may also be specified in either the `CONSUL_LICENSE` or `CONSUL_LICENSE_PATH` environment variables. See the [licensing documentation](/consul/docs/enterprise/license/overview) for more information about Consul Enterprise license management. Added in versions 1.10.0, 1.9.7 and 1.8.13. Prior to version 1.10.0 the value may be set for all agents to facilitate forwards compatibility with 1.10 but will only actually be used by client agents.

- `limits` Available in Consul 0.9.3 and later, this is a nested
object that configures limits that are enforced by the agent. Prior to Consul 1.5.2,
this only applied to agents in client mode, not Consul servers. The following parameters
are available:
- `limits`: This block specifies various types of limits that the Consul server agent enforces.

- `http_max_conns_per_client` - Configures a limit of how many concurrent TCP connections a single client IP address is allowed to open to the agent's HTTP(S) server. This affects the HTTP(S) servers in both client and server agents. Default value is `200`.
- `https_handshake_timeout` - Configures the limit for how long the HTTPS server in both client and server agents will wait for a client to complete a TLS handshake. This should be kept conservative as it limits how many connections an unauthenticated attacker can open if `verify_incoming` is being used to authenticate clients (strongly recommended in production). Default value is `5s`.
- `request_limits` - This object povides configuration for rate limiting RPC and gRPC requests on the consul server. As a result of rate limiting gRPC and RPC request, HTTP requests to the Consul server are rate limited.
- `mode` - Configures whether rate limiting is enabled or not as well as how it behaves through the use of 3 possible modes. The default value of "disabled" will prevent any rate limiting from occuring. A value of "permissive" will cause the system to track requests against the `read_rate` and `write_rate` but will only log violations and will not block and will allow the request to continue processing. A value of "enforcing" also tracks requests against the `read_rate` and `write_rate` but in addition to logging violations, the system will block the request from processings by returning an error.
- `read_rate` - Configures how frequently RPC, gRPC, and HTTP queries are allowed to happen. The rate limiter limits the rate to tokens per second equal to this value. See https://en.wikipedia.org/wiki/Token_bucket for more about token buckets.
- `write_rate` - Configures how frequently RPC, gRPC, and HTTP write are allowed to happen. The rate limiter limits the rate to tokens per second equal to this value. See https://en.wikipedia.org/wiki/Token_bucket for more about token buckets.
- `request_limits` - This object specifies configurations that limit the rate of RPC and gRPC requests on the Consul server. Limiting the rate of gRPC and RPC requests also limits HTTP requests to the Consul server.
- `mode` - String value that specifies an action to take if the rate of requests exceeds the limit. You can specify the following values:
- `permissive`: The server continues to allow requests and records an error in the logs.
- `enforcing`: The server denies requests that exceed the limit and records an error in the logs.
- `disabled`: Limits are not enforced or tracked. This is the default value for `mode`.
- `read_rate` - Integer value that specifies the maximum number of read requests allowed per second. Default is `100`.
- `write_rate` - Integer value that specifies the maximum number of write requests allowed per second. Default is `100`.
- `rpc_handshake_timeout` - Configures the limit for how long servers will wait after a client TCP connection is established before they complete the connection handshake. When TLS is used, the same timeout applies to the TLS handshake separately from the initial protocol negotiation. All Consul clients should perform this immediately on establishing a new connection. This should be kept conservative as it limits how many connections an unauthenticated attacker can open if `verify_incoming` is being used to authenticate clients (strongly recommended in production). When `verify_incoming` is true on servers, this limits how long the connection socket and associated goroutines will be held open before the client successfully authenticates. Default value is `5s`.
- `rpc_client_timeout` - Configures the limit for how long a client is allowed to read from an RPC connection. This is used to set an upper bound for calls to eventually terminate so that RPC connections are not held indefinitely. Blocking queries can override this timeout. Default is `60s`.
- `rpc_max_conns_per_client` - Configures a limit of how many concurrent TCP connections a single source IP address is allowed to open to a single server. It affects both client connections and other server connections. In general Consul clients multiplex many RPC calls over a single TCP connection so this can typically be kept low. It needs to be more than one though since servers open at least one additional connection for Raft RPC, possibly more for WAN federation when using network areas, and snapshot requests from clients run over a separate TCP connection. A reasonably low limit significantly reduces the ability of an unauthenticated attacker to consume unbounded resources by holding open many connections. You may need to increase this if WAN federated servers connect via proxies or NAT gateways or similar causing many legitimate connections from a single source IP. Default value is `100` which is designed to be extremely conservative to limit issues with certain deployment patterns. Most deployments can probably reduce this safely. 100 connections on modern server hardware should not cause a significant impact on resource usage from an unauthenticated attacker though.
2 changes: 1 addition & 1 deletion website/content/docs/agent/config/index.mdx
@@ -72,7 +72,7 @@ The following agent configuration options are reloadable at runtime:
- These can be important in certain outage situations so being able to control
them without a restart provides a recovery path that doesn't involve
downtime. They generally shouldn't be changed otherwise.
- [RPC rate limiting](/consul/docs/agent/config/config-files#limits)
- [RPC rate limits](/consul/docs/agent/config/config-files#limits)
- [HTTP Maximum Connections per Client](/consul/docs/agent/config/config-files#http_max_conns_per_client)
- Services
- TLS Configuration
27 changes: 15 additions & 12 deletions website/content/docs/agent/index.mdx
@@ -33,7 +33,7 @@ The following process describes the agent lifecycle within the context of an exi
As a result, all nodes will eventually become aware of each other.
1. **Existing servers will begin replicating to the new node** if the agent is a server.

### Failures and Crashes
### Failures and crashes

In the event of a network failure, some nodes may be unable to reach other nodes.
Unreachable nodes will be marked as _failed_.
@@ -48,7 +48,7 @@ catalog.
Once the network recovers or a crashed agent restarts, the cluster will repair itself and unmark a node as failed.
The health check in the catalog will also be updated to reflect the current state.

### Exiting Nodes
### Exiting nodes

When a node leaves a cluster, it communicates its intent and the cluster marks the node as having _left_.
In contrast to changes related to failures, all of the services provided by a node are immediately deregistered.
@@ -61,6 +61,9 @@ interval of 72 hours (changing the reap interval is _not_ recommended due to
its consequences during outage situations). Reaping is similar to leaving,
causing all associated services to be deregistered.

## Limit traffic rates
You can define a set of rate limiting configurations that help operators protect Consul servers from excessive or peak usage. The configurations enable you to gracefully degrade Consul servers to avoid a global interruption of service. You can allocate a set of resources to different Consul users and eliminate the risks that some users consuming too many resources pose to others. Consul supports global server rate limiting, which lets you configure Consul servers to deny requests that exceed the read or write limits. Refer to [Limit Traffic Rates Overview](/consul/docs/agent/limits/limit-traffic-rates).

## Requirements

You should run one Consul agent per server or host.
@@ -73,7 +76,7 @@ Refer to the following sections for information about host, port, memory, and ot

The [Datacenter Deploy tutorial](/consul/tutorials/production-deploy/reference-architecture#deployment-system-requirements) contains additional information, including licensing configuration, environment variables, and other details.

### Maximum Latency Network requirements
### Maximum latency network requirements

Consul uses the gossip protocol to share information across agents. To function properly, you cannot exceed the protocol's maximum latency threshold. The latency threshold is calculated according to the total round trip time (RTT) for communication between all agents. Other network usages outside of Gossip are not bound by these latency requirements (i.e. client to server RPCs, HTTP API requests, xDS proxy configuration, DNS).

@@ -82,7 +85,7 @@ For data sent between all Consul agents the following latency requirements must
- Average RTT for all traffic cannot exceed 50ms.
- RTT for 99 percent of traffic cannot exceed 100ms.

## Starting the Consul Agent
## Starting the Consul agent

Start a Consul agent with the `consul` command and `agent` subcommand using the following syntax:

@@ -111,7 +114,7 @@ $ consul agent -data-dir=tmp/consul -dev

Agents are highly configurable, which enables you to deploy Consul to any infrastructure. Many of the default options for the `agent` command are suitable for becoming familiar with a local instance of Consul. In practice, however, several additional configuration options must be specified for Consul to function as expected. Refer to [Agent Configuration](/consul/docs/agent/config) topic for a complete list of configuration options.

### Understanding the Agent Startup Output
### Understanding the agent startup output

Consul prints several important messages on startup.
The following example shows output from the [`consul agent`](/consul/commands/agent) command:
@@ -162,7 +165,7 @@ When running under `systemd` on Linux, Consul notifies systemd by sending
this either the `join` or `retry_join` option has to be set and the
service definition file has to have `Type=notify` set.

## Configuring Consul Agents
## Configuring Consul agents

You can specify many options to configure how Consul operates when issuing the `consul agent` command.
You can also create one or more configuration files and provide them to Consul at startup using either the `-config-file` or `-config-dir` option.
@@ -180,7 +183,7 @@ $ consul agent -config-file=server.json
The configuration options necessary to successfully use Consul depend on several factors, including the type of agent you are configuring (client or server), the type of environment you are deploying to (e.g., on-premise, multi-cloud, etc.), and the security options you want to implement (ACLs, gRPC encryption).
The following examples are intended to help you understand some of the combinations you can implement to configure Consul.

### Common Configuration Settings
### Common configuration settings

The following settings are commonly used in the configuration file (also called a service definition file when registering services with Consul) to configure Consul agents:

@@ -195,7 +198,7 @@ The following settings are commonly used in the configuration file (also called
| `addresses` | Block of nested objects that define addresses bound to the agent for internal cluster communication. | `"http": "0.0.0.0"` See the Agent Configuration page for [default address values](/consul/docs/agent/config/config-files#addresses) |
| `ports` | Block of nested objects that define ports bound to agent addresses. <br/>See (link to addresses option) for details. | See the Agent Configuration page for [default port values](/consul/docs/agent/config/config-files#ports) |

### Server Node in a Service Mesh
### Server node in a service mesh

The following example configuration is for a server agent named "`consul-server`". The server is [bootstrapped](/consul/docs/agent/config/cli-flags#_bootstrap) and the Consul GUI is enabled.
The reason this server agent is configured for a service mesh is that the `connect` configuration is enabled. Connect is Consul's service mesh component that provides service-to-service connection authorization and encryption using mutual Transport Layer Security (TLS). Applications can use sidecar proxies in a service mesh configuration to establish TLS connections for inbound and outbound connections without being aware of Connect at all. See [Connect](/consul/docs/connect) for details.
@@ -243,7 +246,7 @@ connect {

</CodeTabs>

### Server Node with Encryption Enabled
### Server node with encryption enabled

The following example shows a server node configured with encryption enabled.
Refer to the [Security](/consul/docs/security) chapter for additional information about how to configure security options for Consul.
@@ -313,7 +316,7 @@ tls {

</CodeTabs>

### Client Node Registering a Service
### Client node registering a service

Using Consul as a central service registry is a common use case.
The following example configuration includes common settings to register a service with a Consul agent and enable health checks (see [Checks](/consul/docs/discovery/checks) to learn more about health checks):
@@ -371,7 +374,7 @@ service {

</CodeTabs>

## Client Node with Multiple Interfaces or IP addresses
## Client node with multiple interfaces or IP addresses

The following example shows how to configure Consul to listen on multiple interfaces or IP addresses using a [go-sockaddr template].

@@ -422,7 +425,7 @@ advertise_addr = "{{ GetInterfaceIP \"en0\" }}"

</CodeTabs>

## Stopping an Agent
## Stopping an agent

An agent can be stopped in two ways: gracefully or forcefully. Servers and
Clients both behave differently depending on the leave that is performed. There
30 changes: 30 additions & 0 deletions website/content/docs/agent/limits/index.mdx
@@ -0,0 +1,30 @@
---
layout: docs
page_title: Limit Traffic Rates Overview
description: Rate limiting is a set of Consul server agent configurations that you can use to mitigate the risks to Consul servers when clients send excessive requests to Consul resources.

---

# Limit Traffic Rates Overview
This topic provides overview information about the traffic rate limits you can configure for Consul servers.

## Introduction
You can configure global RPC rate limits to mitigate the risks to Consul servers when clients send excessive read or write requests to Consul resources. A read request is defined as any request that does not modify Consul internal state. A write request is defined as any request that modifies Consul internal state. Read and write requests are limited separately.

## Rate limit modes
You can set one of the following modes, which determine how Consul servers react when the request limits are exceeded.

- **Enforcing mode**: In this mode, the rate limiter denies requests to a server beyond a configurable rate. Consul generates metrics and logs to help operators understand their Consul load and configure limits accordingly.
- **Permissive mode**: The rate limiter continues to allow requests even when the limits are reached, and produces metrics and logs to help operators understand their Consul load and configure limits accordingly. This mode is intended to help you configure limits and debug specific issues.
- **Disabled mode**: Disables the rate limiter. All requests are allowed and no logs or metrics are produced. This is the default mode.

Refer to [`request_limits`](/consul/docs/agent/config/config-files#request_limits) for additional configuration information.
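
The difference between the three modes can be sketched with a small token-bucket limiter. This is an illustration only, not Consul's implementation: the `ModeLimiter` class, its `exceeded` counter, and the choice of a token bucket whose capacity equals its rate are all assumptions made for this example.

```python
import time


class ModeLimiter:
    """Hypothetical limiter used only to illustrate the three modes."""

    MODES = ("disabled", "permissive", "enforcing")

    def __init__(self, mode, rate, clock=time.monotonic):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.rate = float(rate)    # tokens refilled per second (assumed to also be the capacity)
        self.tokens = float(rate)  # start with a full bucket
        self.clock = clock
        self.last = clock()
        self.exceeded = 0          # stands in for the logs and metrics Consul would emit

    def allow(self):
        """Return True if the request may proceed."""
        if self.mode == "disabled":
            return True  # nothing is tracked or recorded
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.exceeded += 1
        # Permissive mode records the violation but still lets the request through.
        return self.mode == "permissive"
```

In this sketch, a limiter created with `ModeLimiter("enforcing", 100)` starts rejecting calls once more than 100 arrive within a second, while the same limiter in `permissive` mode only increments `exceeded`.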

## Request denials
When an HTTP request is denied for rate limiting reasons, Consul returns one of the following errors:

- **429 Resource Exhausted**: Indicates that a server is not able to perform the request but that another server could potentially fulfill it. This error is most common on stale reads because any server may fulfill stale read requests. To resolve this type of error, we recommend immediately retrying the request to another server. If the request came from a Consul client agent, the agent automatically retries the request up to the limit set in the [`rpc_hold_timeout`](/consul/docs/agent/config/config-files#rpc_hold_timeout) configuration.

- **503 Service Unavailable**: Indicates that the server is unable to perform the request and that no other server can fulfill the request, either. This usually occurs on consistent reads or for writes. In this case we recommend retrying according to an exponential backoff schedule. If the request came from a Consul client agent, the agent automatically retries the request according to the [`rpc_hold_timeout`](/consul/docs/agent/config/config-files#rpc_hold_timeout) configuration.

Refer to [Rate limit reached on the server](/consul/docs/troubleshoot/common-errors#rate-limit-reached-on-the-server) for additional information.
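
For callers that talk to servers directly, the two recommendations above can be sketched as a small retry helper. This is a minimal example under stated assumptions: `request_with_retry`, the caller-supplied `send` callable, and the `max_attempts` and `base_delay` defaults are hypothetical and are not part of any Consul API.

```python
import time


def request_with_retry(send, servers, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry a request according to the status code it was denied with.

    send(server) is a hypothetical caller-supplied function that performs
    the request against one server and returns the HTTP status code.
    """
    server_index = 0
    backoffs = 0
    status = None
    for attempt in range(max_attempts):
        status = send(servers[server_index % len(servers)])
        if status not in (429, 503):
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts; do not wait again
        if status == 429:
            # Another server may be able to serve the request: switch immediately.
            server_index += 1
        else:
            # 503: no other server can help, so wait and double the delay each time.
            sleep(base_delay * (2 ** backoffs))
            backoffs += 1
    return status
```

The `sleep` parameter is injectable so that the backoff schedule can be exercised without waiting. A production client would probably also add jitter and an upper bound on the delay.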
32 changes: 32 additions & 0 deletions website/content/docs/agent/limits/init-rate-limits.mdx
@@ -0,0 +1,32 @@
---
layout: docs
page_title: Initialize Rate Limit Settings
description: Learn how to determine regular and peak loads in your network so that you can set the initial global rate limit configurations.
---

# Initialize Rate Limit Settings

In order to set limits for traffic, you must first understand regular and peak loads in your network. We recommend completing the following steps to benchmark request rates in your environment so that you can implement limits appropriate for your applications.

1. Specify a global rate limit in the agent configuration file, choosing initial values based on the following conditions:

- Environment where Consul servers are running
- Number of servers and the projected load
- Existing metrics expressing requests per second

1. Set the `mode` to `permissive`. In the following example, Consul agents are allowed up to 1000 reads and 500 writes per second:

```hcl
request_limits {
  mode = "permissive"
  read_rate = 1000.0
  write_rate = 500.0
}
```

1. Observe the logs and metrics for your application's typical cycle, such as a 24-hour period. Refer to [`log_file`](/consul/docs/agent/config/config-files#log_file) for information about where to retrieve logs. Call the [`/agent/metrics`](/consul/api-docs/agent#view-metrics) HTTP API endpoint and check the data for the following metrics:

- `rpc.rate_limit.exceeded.read`
- `rpc.rate_limit.exceeded.write`

1. If the limits are not reached, set the `mode` configuration to `enforcing`. Otherwise, adjust the limits and repeat the observation step until they are no longer reached.
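
The check in the last two steps can be sketched as a short script that inspects a response body from the `/agent/metrics` endpoint. The payload shape is an assumption: this example expects a JSON object with a `Counters` list whose entries carry `Name` and `Count` fields, and it matches metric names by suffix in case the agent adds a prefix. The helper names `exceeded_counts` and `ready_to_enforce` are hypothetical.

```python
import json

# Metric names listed in the observation step above.
WATCHED = ("rpc.rate_limit.exceeded.read", "rpc.rate_limit.exceeded.write")


def exceeded_counts(metrics_json):
    """Sum the watched counters found in a metrics response body."""
    data = json.loads(metrics_json)
    totals = {name: 0 for name in WATCHED}
    for counter in data.get("Counters", []):  # assumed payload shape
        for name in WATCHED:
            if counter.get("Name", "").endswith(name):
                totals[name] += counter.get("Count", 0)
    return totals


def ready_to_enforce(metrics_json):
    """True when neither limit was reached in the sampled metrics."""
    return all(count == 0 for count in exceeded_counts(metrics_json).values())
```

Because a single metrics response covers only a short window, a check like this would need to run repeatedly over the whole observation period before the result justifies switching `mode` to `enforcing`.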