[documentation] Add documentation for disk read bytes per second limit #2677

Merged on Sep 29, 2020 (7 commits)
43 changes: 33 additions & 10 deletions docs/operational_guide/resource_limits.md
@@ -11,27 +11,50 @@ performance of M3 in a production environment.
The best way to get started protecting M3DB nodes is to set a few limits on the
top level `limits` config stanza for M3DB.

The primary concern is to limit the number of historical bytes read from disk,
since large historical queries, one of the most common abusive patterns,
directly cause memory pressure. Reading time series data that is already in
memory (either because it is cached or because it is actively being written)
costs much less than reading historical time series data from disk. To set this
limit, use the `maxRecentlyQueriedSeriesDiskBytesRead` stanza to define how much
historical time series data can be read over a given lookback time window.

The secondary concern is the total amount of time series data read, since even
unbounded queries over data that is already in memory can overwhelm a database
node. When using M3DB for metrics workloads, queries arrive as a set of matchers
that select time series based on certain dimensions. To protect against these
matchers selecting huge amounts of data in an unbounded way, set a maximum limit
on the number of time series blocks that can be matched, and consequently read,
in a given time window. Use `maxRecentlyQueriedSeriesBlocks` to set a maximum
value, along with a lookback time window that determines the duration over which
the limit is enforced.

You can use the Prometheus query `rate(query_stats_total_docs_per_block[1m])` to
determine how many time series blocks your cluster queries per second today,
which tells you what a sane value for this limit is. Multiply that rate by the
`lookback` period in seconds to get your desired max value. For instance, if the
query shows that your deployment frequently and safely queries 10,000 time
series blocks per second and you want to use the default lookback of `15s`,
multiply 10,000 by 15 to get a max value of 150,000 with a 15s lookback.

### Annotated configuration

```
limits:
  # If set, will enforce a maximum cap on disk read bytes for time series data
  # that resides historically on disk (and is not already in memory).
  maxRecentlyQueriedSeriesDiskBytesRead:
    # Value sets the maximum disk read bytes for historical data.
    value: 0
    # Lookback sets the time window over which this limit is enforced. Every
    # lookback period the global count is reset to zero and when the limit
    # is reached it will reject any further time series blocks being matched
    # and read until the lookback period resets.
    lookback: 15s

  # If set, will enforce a maximum cap on time series blocks matched for
  # queries searching time series by dimensions.
  maxRecentlyQueriedSeriesBlocks:
@@ -44,7 +67,7 @@ limits:
    # lookback period the global count is reset to zero and when the limit
    # is reached it will reject any further time series blocks being matched
    # and read until the lookback period resets.
    lookback: 15s

  # If set, limits the number of parallel write batch requests to the
  # database and returns errors if the limit is hit.
```
10 changes: 5 additions & 5 deletions src/cmd/services/m3dbnode/config/limits.go
@@ -24,16 +24,16 @@ import "time"

// LimitsConfiguration contains configuration for configurable limits that can be applied to M3DB.
type LimitsConfiguration struct {
// MaxRecentlyQueriedSeriesDiskBytesRead sets the upper limit on time series bytes
// read from disk within a given lookback period. Queries which are issued while this
// max is surpassed encounter an error.
MaxRecentlyQueriedSeriesDiskBytesRead *MaxRecentQueryResourceLimitConfiguration `yaml:"maxRecentlyQueriedSeriesDiskBytesRead"`

// MaxRecentlyQueriedSeriesBlocks sets the upper limit on time series blocks
// count within a given lookback period. Queries which are issued while this
// max is surpassed encounter an error.
MaxRecentlyQueriedSeriesBlocks *MaxRecentQueryResourceLimitConfiguration `yaml:"maxRecentlyQueriedSeriesBlocks"`

// MaxOutstandingWriteRequests controls the maximum number of outstanding write requests
// that the server will allow before it begins rejecting requests. Note that this value
// is independent of the number of values that are being written (due to variable batch