receive: use async remote writing
Instead of spawning new goroutines for each peer that we want to remote
write to, spawn a fixed number of worker goroutines and then schedule
work on top of them.

In our case this reduced the number of goroutines by about 10x-20x, and
the 99th-percentile forwarding latency dropped from ~30s to just a few
hundred milliseconds.

Signed-off-by: Giedrius Statkevičius <[email protected]>
GiedriusS committed Jan 10, 2024
1 parent 7794d78 commit 69a3429
Showing 4 changed files with 232 additions and 73 deletions.
3 changes: 3 additions & 0 deletions cmd/thanos/receive.go
@@ -261,6 +261,7 @@ func runReceive(
MaxBackoff: time.Duration(*conf.maxBackoff),
TSDBStats: dbs,
Limiter: limiter,
AsyncWorkerCount: conf.asyncWorkerCount,
})

grpcProbe := prober.NewGRPC()
@@ -837,6 +838,7 @@ type receiveConfig struct {
writeLimitsConfig *extflag.PathOrContent
storeRateLimits store.SeriesSelectLimits
limitsConfigReloadTimer time.Duration
asyncWorkerCount uint
}

func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
@@ -894,6 +896,7 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {

cmd.Flag("receive.replica-header", "HTTP header specifying the replica number of a write request.").Default(receive.DefaultReplicaHeader).StringVar(&rc.replicaHeader)

cmd.Flag("receive.async-workers", "Number of concurrent workers processing incoming remote-write requests.").Default("5").UintVar(&rc.asyncWorkerCount)
compressionOptions := strings.Join([]string{snappy.Name, compressionNone}, ", ")
cmd.Flag("receive.grpc-compression", "Compression algorithm to use for gRPC requests to other receivers. Must be one of: "+compressionOptions).Default(snappy.Name).EnumVar(&rc.compression, snappy.Name, compressionNone)

10 changes: 10 additions & 0 deletions docs/components/receive.md
@@ -248,6 +248,14 @@ NOTE:
- Thanos Receive performs best-effort limiting. In case meta-monitoring is down/unreachable, Thanos Receive will not impose limits and only log errors for meta-monitoring being unreachable. Similarly to when one receiver cannot be scraped.
- Support for different limit configuration for different tenants is planned for the future.

## Asynchronous workers

Instead of spawning a new goroutine each time the Receiver forwards a request to another node, it spawns a fixed number of goroutines (workers) that perform the work. This avoids spawning potentially tens or even hundreds of thousands of goroutines when someone starts sending a lot of small requests.

The number of workers is controlled by `--receive.async-workers=`.

Check the metric `thanos_receive_forward_delay_seconds` to see whether you need to increase the number of workers.
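
The fixed-worker idea described above can be illustrated with a minimal, self-contained sketch. This is not the code from this commit: `task`, `workerPool`, `submit`, and `runDemo` are hypothetical names, and the "max queue delay" tracked here is only a simplified stand-in for what a delay metric such as `thanos_receive_forward_delay_seconds` presumably reports.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// task is a hypothetical unit of forwarding work. enqueued records when the
// task was handed to the pool so that the queueing delay can be measured.
type task struct {
	enqueued time.Time
	run      func()
}

// workerPool is an illustrative fixed-size pool, not the type used in Thanos.
type workerPool struct {
	tasks chan task
	wg    sync.WaitGroup

	mu       sync.Mutex
	maxDelay time.Duration // longest time any task waited for a free worker
}

// newWorkerPool starts exactly `workers` goroutines, regardless of how many
// tasks are submitted later.
func newWorkerPool(workers int) *workerPool {
	p := &workerPool{tasks: make(chan task)}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for t := range p.tasks {
				p.observe(time.Since(t.enqueued))
				t.run()
			}
		}()
	}
	return p
}

// observe keeps the largest delay seen so far.
func (p *workerPool) observe(d time.Duration) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if d > p.maxDelay {
		p.maxDelay = d
	}
}

// submit blocks until one of the workers is free to take the task.
func (p *workerPool) submit(fn func()) {
	p.tasks <- task{enqueued: time.Now(), run: fn}
}

// close stops accepting work, waits for the workers to finish, and returns
// the longest observed queueing delay.
func (p *workerPool) close() time.Duration {
	close(p.tasks)
	p.wg.Wait()
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.maxDelay
}

// runDemo pushes `requests` trivial tasks through `workers` goroutines.
func runDemo(workers, requests int) (int64, time.Duration) {
	pool := newWorkerPool(workers)
	var forwarded int64
	for i := 0; i < requests; i++ {
		pool.submit(func() { atomic.AddInt64(&forwarded, 1) })
	}
	maxDelay := pool.close()
	return atomic.LoadInt64(&forwarded), maxDelay
}

func main() {
	forwarded, maxDelay := runDemo(5, 1000)
	fmt.Println("forwarded:", forwarded)
	fmt.Println("max queue delay:", maxDelay)
}
```

In this sketch, a queue delay that keeps growing under load means that all workers are busy and tasks are waiting to be picked up, which is the situation in which raising the worker count would be expected to help.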

## Flags

```$ mdox-exec="thanos receive --help"
@@ -305,6 +313,8 @@ Flags:
Path to YAML file that contains object
store configuration. See format details:
https://thanos.io/tip/thanos/storage.md/#configuration
--receive.async-workers=5 Number of concurrent workers processing
incoming remote-write requests.
--receive.default-tenant-id="default-tenant"
Default tenant ID to use when none is provided
via a header.