Merge remote-tracking branch 'origin/main' into yuri/native-hist-queries-dashboard
duricanikolic committed Jul 23, 2024
2 parents c1595cd + f15cdba commit b020cdf
Showing 77 changed files with 574 additions and 1,076 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,8 @@
* [CHANGE] Query-frontend: Remove deprecated `frontend.align_queries_with_step` YAML configuration. The configuration option has been moved to per-tenant and default `limits` since Mimir 2.12. #8733 #8735
* [CHANGE] Store-gateway: Change default of `-blocks-storage.bucket-store.max-concurrent` to 200. #8768
* [CHANGE] Added new metric `cortex_compactor_disk_out_of_space_errors_total`, which counts how many times a compaction failed because the compactor ran out of disk space. Alert if there is a single increase. #8237 #8278
* [CHANGE] Store-gateway: Remove experimental parameter `-blocks-storage.bucket-store.series-selection-strategy`. The default strategy is now `worst-case`. #8702
* [CHANGE] Store-gateway: Rename `-blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference` to `-blocks-storage.bucket-store.series-fetch-preference` and promote to stable. #8702
* [FEATURE] Querier: add experimental streaming PromQL engine, enabled with `-querier.query-engine=mimir`. #8422 #8430 #8454 #8455 #8360 #8490 #8508 #8577 #8671
* [FEATURE] Experimental Kafka-based ingest storage. #6888 #6894 #6929 #6940 #6951 #6974 #6982 #7029 #7030 #7091 #7142 #7147 #7148 #7153 #7160 #7193 #7349 #7376 #7388 #7391 #7393 #7394 #7402 #7404 #7423 #7424 #7437 #7486 #7503 #7508 #7540 #7621 #7682 #7685 #7694 #7695 #7696 #7697 #7701 #7733 #7734 #7741 #7752 #7838 #7851 #7871 #7877 #7880 #7882 #7887 #7891 #7925 #7955 #7967 #8031 #8063 #8077 #8088 #8135 #8176 #8184 #8194 #8216 #8217 #8222 #8233 #8503 #8542 #8579 #8657 #8686 #8688 #8703 #8706 #8708 #8738 #8750
* What it is:
@@ -68,13 +70,17 @@
* Writes dashboard: `cortex_request_duration_seconds` metric. #8757 #8791
* Reads dashboard: `cortex_request_duration_seconds` metric. #8752
* Rollout progress dashboard. #8779
* Alertmanager dashboard. #8792
* Ruler dashboard: `cortex_request_duration_seconds` metric. #8795
* Queries dashboard: `cortex_request_duration_seconds` metric. #8800
* [ENHANCEMENT] Alerts: `MimirRunningIngesterReceiveDelayTooHigh` alert has been tuned to be more reactive to high receive delay. #8538
* [ENHANCEMENT] Dashboards: improve end-to-end latency and strong read consistency panels when experimental ingest storage is enabled. #8543
* [ENHANCEMENT] Dashboards: Add panels for monitoring ingester autoscaling when not using ingest-storage. These panels are disabled by default, but can be enabled using the `autoscaling.ingester.enabled: true` config option. #8484
* [ENHANCEMENT] Dashboards: add panels to show writes to experimental ingest storage backend in the "Mimir / Ruler" dashboard, when `_config.show_ingest_storage_panels` is enabled. #8732
* [ENHANCEMENT] Dashboards: show all series in tooltips on time series dashboard panels. #8748
* [ENHANCEMENT] Dashboards: add compactor autoscaling panels to "Mimir / Compactor" dashboard. The panels are disabled by default, but can be enabled by setting `_config.autoscaling.compactor.enabled` to `true`. #8777
* [ENHANCEMENT] Alerts: added `MimirKafkaClientBufferedProduceBytesTooHigh` alert. #8763
* [ENHANCEMENT] Dashboards: added "Kafka produced records / sec" panel to "Mimir / Writes" dashboard. #8763
* [BUGFIX] Dashboards: fix "current replicas" in autoscaling panels when HPA is not active. #8566
* [BUGFIX] Alerts: do not fire `MimirRingMembersMismatch` during the migration to experimental ingest storage. #8727

@@ -94,6 +100,12 @@
### Mimirtool

* [CHANGE] Analyze Rules: Count recording rules used in rules group as used. #6133
* [CHANGE] Remove deprecated `--rule-files` flag in favor of CLI arguments for the following commands: #8701
* `mimirtool rules load`
* `mimirtool rules sync`
* `mimirtool rules diff`
* `mimirtool rules check`
* `mimirtool rules prepare`

### Mimir Continuous Test

33 changes: 6 additions & 27 deletions cmd/mimir/config-descriptor.json
@@ -9492,35 +9492,14 @@
},
{
"kind": "field",
"name": "series_selection_strategy",
"name": "series_fetch_preference",
"required": false,
"desc": "This option controls the strategy to selection of series and deferring application of matchers. A more aggressive strategy will fetch less posting lists at the cost of more series. This is useful when querying large blocks in which many series share the same label name and value. Supported values (most aggressive to least aggressive): speculative, worst-case, worst-case-small-posting-lists, all.",
"desc": "This parameter controls the trade-off in fetching series versus fetching postings to fulfill a series request. Increasing the series preference results in fetching more series and reducing the volume of postings fetched. Reducing the series preference results in the opposite. Increase this parameter to reduce the rate of fetched series bytes (see \"Mimir / Queries\" dashboard) or API calls to the object store. Must be a positive floating point number.",
"fieldValue": null,
"fieldDefaultValue": "worst-case",
"fieldFlag": "blocks-storage.bucket-store.series-selection-strategy",
"fieldType": "string",
"fieldCategory": "experimental"
},
{
"kind": "block",
"name": "series_selection_strategies",
"required": false,
"desc": "",
"blockEntries": [
{
"kind": "field",
"name": "worst_case_series_preference",
"required": false,
"desc": "This option is only used when blocks-storage.bucket-store.series-selection-strategy=worst-case. Increasing the series preference results in fetching more series than postings. Must be a positive floating point number.",
"fieldValue": null,
"fieldDefaultValue": 0.75,
"fieldFlag": "blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference",
"fieldType": "float",
"fieldCategory": "experimental"
}
],
"fieldValue": null,
"fieldDefaultValue": null
"fieldDefaultValue": 0.75,
"fieldFlag": "blocks-storage.bucket-store.series-fetch-preference",
"fieldType": "float",
"fieldCategory": "advanced"
}
],
"fieldValue": null,
6 changes: 2 additions & 4 deletions cmd/mimir/help-all.txt.tmpl
@@ -671,12 +671,10 @@ Usage of ./cmd/mimir/mimir:
Max size - in bytes - of a gap for which the partitioner aggregates together two bucket GET object requests. (default 524288)
-blocks-storage.bucket-store.posting-offsets-in-mem-sampling int
Controls what is the ratio of postings offsets that the store will hold in memory. (default 32)
-blocks-storage.bucket-store.series-fetch-preference float
This parameter controls the trade-off in fetching series versus fetching postings to fulfill a series request. Increasing the series preference results in fetching more series and reducing the volume of postings fetched. Reducing the series preference results in the opposite. Increase this parameter to reduce the rate of fetched series bytes (see "Mimir / Queries" dashboard) or API calls to the object store. Must be a positive floating point number. (default 0.75)
-blocks-storage.bucket-store.series-hash-cache-max-size-bytes uint
Max size - in bytes - of the in-memory series hash cache. The cache is shared across all tenants and it's used only when query sharding is enabled. (default 1073741824)
-blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference float
[experimental] This option is only used when blocks-storage.bucket-store.series-selection-strategy=worst-case. Increasing the series preference results in fetching more series than postings. Must be a positive floating point number. (default 0.75)
-blocks-storage.bucket-store.series-selection-strategy string
[experimental] This option controls the strategy to selection of series and deferring application of matchers. A more aggressive strategy will fetch less posting lists at the cost of more series. This is useful when querying large blocks in which many series share the same label name and value. Supported values (most aggressive to least aggressive): speculative, worst-case, worst-case-small-posting-lists, all. (default "worst-case")
-blocks-storage.bucket-store.sync-dir string
Directory to store synchronized TSDB index headers. This directory is not required to be persisted between restarts, but it's highly recommended in order to improve the store-gateway startup time. (default "./tsdb-sync/")
-blocks-storage.bucket-store.sync-interval duration
3 changes: 0 additions & 3 deletions docs/sources/mimir/configure/about-versioning.md
@@ -167,7 +167,6 @@ The following features are currently experimental:
- `-query-scheduler.querier-forget-delay`
- Store-gateway
- Use of Redis cache backend (`-blocks-storage.bucket-store.chunks-cache.backend=redis`, `-blocks-storage.bucket-store.index-cache.backend=redis`, `-blocks-storage.bucket-store.metadata-cache.backend=redis`)
- `-blocks-storage.bucket-store.series-selection-strategy`
- Eagerly loading some blocks on startup even when lazy loading is enabled `-blocks-storage.bucket-store.index-header.eager-loading-startup-enabled`
- Read-write deployment mode
- API endpoints:
@@ -216,8 +215,6 @@ The following features or configuration parameters are currently deprecated and
- `-ingester.return-only-grpc-errors`
- Ingester client
- `-ingester.client.report-grpc-codes-in-instrumentation-label-enabled`
- Mimirtool
- the flag `--rule-files`
- Querier
- the flag `-querier.prefer-streaming-chunks-from-store-gateways`

25 changes: 9 additions & 16 deletions docs/sources/mimir/configure/configuration-parameters/index.md
@@ -4151,22 +4151,15 @@ bucket_store:
# CLI flag: -blocks-storage.bucket-store.batch-series-size
[streaming_series_batch_size: <int> | default = 5000]
# (experimental) This option controls the strategy to selection of series and
# deferring application of matchers. A more aggressive strategy will fetch
# less posting lists at the cost of more series. This is useful when querying
# large blocks in which many series share the same label name and value.
# Supported values (most aggressive to least aggressive): speculative,
# worst-case, worst-case-small-posting-lists, all.
# CLI flag: -blocks-storage.bucket-store.series-selection-strategy
[series_selection_strategy: <string> | default = "worst-case"]
series_selection_strategies:
# (experimental) This option is only used when
# blocks-storage.bucket-store.series-selection-strategy=worst-case.
# Increasing the series preference results in fetching more series than
# postings. Must be a positive floating point number.
# CLI flag: -blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference
[worst_case_series_preference: <float> | default = 0.75]
# (advanced) This parameter controls the trade-off in fetching series versus
# fetching postings to fulfill a series request. Increasing the series
# preference results in fetching more series and reducing the volume of
# postings fetched. Reducing the series preference results in the opposite.
# Increase this parameter to reduce the rate of fetched series bytes (see
# "Mimir / Queries" dashboard) or API calls to the object store. Must be a
# positive floating point number.
# CLI flag: -blocks-storage.bucket-store.series-fetch-preference
[series_fetch_preference: <float> | default = 0.75]
tsdb:
# Directory to store TSDBs (including WAL) in the ingesters. This directory is
18 changes: 18 additions & 0 deletions docs/sources/mimir/manage/mimir-runbooks/_index.md
@@ -1485,6 +1485,24 @@ How to **investigate**:
- Check if ingesters are processing too many records, and they need to be scaled up (vertically or horizontally).
- Check actual error in logs to see whether the `-ingest-storage.kafka.wait-strong-read-consistency-timeout` or the request timeout has been hit first.
### MimirKafkaClientBufferedProduceBytesTooHigh
This alert fires when the Kafka client buffer, used to write incoming write requests to Kafka, is getting full.
How it **works**:
- Distributor and ruler encapsulate write requests into Kafka records and send them to Kafka.
- The Kafka client has a limit on the total byte size of buffered records, that is, records either not yet sent to Kafka or sent to Kafka but not acknowledged yet.
- When the limit is reached, the Kafka client stops producing more records and fast fails.
- The limit is configured via `-ingest-storage.kafka.producer-max-buffered-bytes`.
- The default limit is configured intentionally high, so that when the buffer utilization gets close to the limit, this indicates that there's probably an issue.
How to **investigate**:
- Query `cortex_ingest_storage_writer_buffered_produce_bytes{quantile="1.0"}` metrics to see the actual buffer utilization peaks.
- If the high buffer utilization is isolated to a small set of pods, then there might be an issue in the client pods.
- If the high buffer utilization is spread across all or most pods, then there might be an issue in Kafka.
### Ingester is overloaded when consuming from Kafka
This runbook covers the case in which an ingester is overloaded when ingesting metrics data (consuming) from Kafka.
7 changes: 3 additions & 4 deletions go.mod
@@ -20,11 +20,11 @@ require (
github.com/golang/snappy v0.0.4
github.com/google/gopacket v1.1.19
github.com/gorilla/mux v1.8.1
github.com/grafana/dskit v0.0.0-20240718080635-f5bd38371e1c
github.com/grafana/dskit v0.0.0-20240719153732-6e8a03e781de
github.com/grafana/e2e v0.1.2-0.20240118170847-db90b84177fc
github.com/hashicorp/golang-lru v1.0.2 // indirect
github.com/json-iterator/go v1.1.12
github.com/minio/minio-go/v7 v7.0.72
github.com/minio/minio-go/v7 v7.0.74
github.com/mitchellh/go-wordwrap v1.0.1
github.com/oklog/ulid v1.3.1
github.com/opentracing-contrib/go-grpc v0.0.0-20210225150812-73cb765af46e
@@ -102,6 +102,7 @@ require (
github.com/Masterminds/sprig/v3 v3.2.1 // indirect
github.com/bboreham/go-loser v0.0.0-20230920113527-fcc2c21820a3 // indirect
github.com/cenkalti/backoff/v3 v3.2.2 // indirect
github.com/go-ini/ini v1.67.0 // indirect
github.com/go-ole/go-ole v1.2.6 // indirect
github.com/go-test/deep v1.1.0 // indirect
github.com/goccy/go-json v0.10.3 // indirect
@@ -160,7 +161,6 @@ require (
github.com/beorn7/perks v1.0.1 // indirect
github.com/bits-and-blooms/bitset v1.13.0 // indirect
github.com/cenkalti/backoff/v4 v4.3.0 // indirect
github.com/cespare/xxhash v1.1.0 // indirect
github.com/cespare/xxhash/v2 v2.3.0
github.com/coreos/go-semver v0.3.0 // indirect
github.com/coreos/go-systemd/v22 v22.5.0 // indirect
@@ -270,7 +270,6 @@ require (
google.golang.org/genproto v0.0.0-20240528184218-531527333157 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20240528184218-531527333157 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240711142825-46eb208f015d
gopkg.in/ini.v1 v1.67.0 // indirect
k8s.io/kube-openapi v0.0.0-20240228011516-70dd3763d340 // indirect
k8s.io/utils v0.0.0-20230726121419-3b25d923346b // indirect
sigs.k8s.io/yaml v1.4.0 // indirect