prometheus-meta-operator

The prometheus-meta-operator watches Cluster CRs and creates prometheus-operator CRs. It is implemented using operatorkit.

Getting Project

Clone the git repository: https://github.com/giantswarm/prometheus-meta-operator.git

How to build

Build it using the standard go build command.

go build github.com/giantswarm/prometheus-meta-operator

You may want to regenerate the unit test files with:

go test -v ./... -update

How to update upstream code

We store modified upstream code for our own usage in the following directories:

pkg/alertmanager/config
pkg/prometheus/common/config

Initial upstream setup

Add the upstream git repository:

$ git remote add alertmanager https://github.com/prometheus/alertmanager.git

On the first run, the commands are the same as for Upgrade, except that git subtree merge has to be replaced with:

$ git subtree add --squash -P pkg/alertmanager/config alertmanager-config

Upgrade

# add upstream tags
$ git tag -d $(git tag -l)
$ git fetch alertmanager

$ git checkout v0.22.2
$ git subtree split -P config/ -b alertmanager-config
$ git checkout -b alertmanager-0.22.2 origin/master
$ git subtree merge --message "Upgrade alertmanager/config to v0.22.2" --squash -P pkg/alertmanager/config alertmanager-config
# fix conflicts (the usual way) if any

# restore local tags
$ git tag -d $(git tag -l)
$ git fetch

# push for review
$ git push -u origin HEAD

/!\ Do not merge with squash. Once approved, merge to master manually.
/!\ We need to preserve the commit history; otherwise, subsequent git subtree commands won't work.
$ git checkout master
$ git merge --ff-only alertmanager-0.22.2
$ git push

remoteWrite CRs

Prometheus-meta-operator also manages remoteWrite custom resources.

remoteWrite CRDs

Code for remoteWrite CRDs is in the api/v1alpha1/ directory.

The actual CRDs are in config/crd/monitoring.giantswarm.io_remotewrites.yaml.

To generate the CRDs from code, run make generate.

Deployment

CRD deployment is managed within the Helm chart. The remoteWrite CRD is located under the chart's templates directory as a symbolic link to the generated YAML file.

Custom Prometheus volume size

Prometheus-meta-operator provides a way of setting a custom Prometheus volume size.

The Prometheus volume size can be set on the Cluster CR using the dedicated annotation monitoring.giantswarm.io/prometheus-volume-size.

Three values are possible:

small = 30 Gi
medium = 100 Gi (default)
large = 200 Gi

The retention size of Prometheus is set according to the volume size, applying a ratio of 90%:

small (30 Gi) => retentionSize = 27Gi
medium (100 Gi) => retentionSize = 90Gi
large (200 Gi) => retentionSize = 180Gi

Check Prometheus Volume Sizing for more details.
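
The mapping from annotation value to retention size described above can be sketched in Go as follows. This is a minimal illustration and not the operator's actual code: the names volumeSizesGi, retentionRatioPercent and retentionSize are hypothetical, and falling back to medium for unrecognized values is an assumption (the README only states that medium is the default).

```go
package main

import "fmt"

// volumeSizesGi maps the annotation values listed above to volume sizes in Gi.
// Hypothetical name, used for illustration only.
var volumeSizesGi = map[string]int{
	"small":  30,
	"medium": 100,
	"large":  200,
}

// retentionRatioPercent is the 90% ratio applied to the volume size.
const retentionRatioPercent = 90

// retentionSize returns the retention size for a given annotation value.
// Assumption: empty or unrecognized values fall back to "medium".
func retentionSize(annotation string) string {
	size, ok := volumeSizesGi[annotation]
	if !ok {
		size = volumeSizesGi["medium"]
	}
	return fmt.Sprintf("%dGi", size*retentionRatioPercent/100)
}

func main() {
	for _, v := range []string{"small", "medium", "large", ""} {
		fmt.Printf("%q => retentionSize = %s\n", v, retentionSize(v))
	}
}
```

Integer arithmetic is enough here because 90% of each supported size is a whole number of Gi.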

Prometheus Agent Sharding

Prometheus Meta Operator (PMO) configures the Prometheus Agent instances running in workload clusters (pre-Mimir setup; cf. observability-operator).

To ingest metrics without disrupting the workloads running in the clusters, Prometheus Meta Operator can shard the running Prometheus Agents.

The default configuration is defined in PMO itself: PMO adds a new shard for every 1M time series present in the workload cluster (WC) Prometheus running on the management cluster. To avoid scaling down too abruptly, we defined a scale-down threshold of 20%.

As these default values were not enough to avoid workload disruption, we added two ways to override the scale-up series count target and the scale-down percentage.

Those values can be configured at the installation level by overriding the following values:

prometheusAgent:
  shardScaleUpSeriesCount: 1000000
  shardScaleDownPercentage: 0.20

Those values can also be set per cluster using the following cluster annotations:

monitoring.giantswarm.io/prometheus-agent-scale-up-series-count: 1000000
monitoring.giantswarm.io/prometheus-agent-scale-down-percentage: 0.20
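
To make the interplay of the two settings concrete, here is a minimal Go sketch of one plausible reading of the rule above. It is not PMO's actual implementation: the type shardingStrategy, the method computeShards, and the exact scale-down condition are assumptions. The sketch scales up as soon as the series count requires more shards, and removes a shard only once the series count has dropped at least scaleDownPercentage × scaleUpSeriesCount below that shard's boundary.

```go
package main

import (
	"fmt"
	"math"
)

// shardingStrategy holds the two settings described above.
// Hypothetical type, used for illustration only.
type shardingStrategy struct {
	scaleUpSeriesCount  float64 // e.g. 1000000
	scaleDownPercentage float64 // e.g. 0.20
}

// computeShards returns the shard count to run, given the current shard
// count and the number of time series. The scale-down rule is an assumption.
func (s shardingStrategy) computeShards(currentShards int, timeSeries float64) int {
	// One shard per scaleUpSeriesCount series, and never fewer than one.
	desired := int(math.Ceil(timeSeries / s.scaleUpSeriesCount))
	if desired < 1 {
		desired = 1
	}
	// Scale up (or stay put) immediately.
	if desired >= currentShards {
		return desired
	}
	// Scaling down: keep one extra shard while the series count is still
	// within the margin below the boundary of the desired shard count.
	margin := s.scaleDownPercentage * s.scaleUpSeriesCount
	if timeSeries > float64(desired)*s.scaleUpSeriesCount-margin {
		return desired + 1
	}
	return desired
}

func main() {
	s := shardingStrategy{scaleUpSeriesCount: 1000000, scaleDownPercentage: 0.20}
	fmt.Println(s.computeShards(1, 1500000)) // scale-up case
	fmt.Println(s.computeShards(2, 900000))  // within the scale-down margin
	fmt.Println(s.computeShards(2, 700000))  // below the scale-down margin
}
```

With the default values, a cluster running two shards keeps both at 900,000 series and drops to one shard only at or below 800,000 series. This hysteresis is what keeps the shard count from flapping around the 1M boundary.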
