-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prow monitoring through kube-prometheus #928
base: main
Are you sure you want to change the base?
Conversation
Add initial kube-prometheus setup based on example.jsonnet. Signed-off-by: Lennart Jern <[email protected]>
We have no use for it at the moment so lets simplify and keep things minimal. Signed-off-by: Lennart Jern <[email protected]>
Make all monitoring components run on infra nodes where possible. Allow all in-cluster traffic so that prometheus can scrape node-exporter on other nodes than where prometheus itself is running. Signed-off-by: Lennart Jern <[email protected]>
Configure prometheus to retain data for 30 days and give it persistent volumes to do so. Signed-off-by: Lennart Jern <[email protected]>
This adds prometheusrules for scraping (and labeling) prowjob metrics. It also adds the necessary labels to the allow list for kube-state-metrics and adds a dashboard for the prowjob metrics. Signed-off-by: Lennart Jern <[email protected]>
Make it easier to build the manifests by running the build script in a container and by automatically installing the needed go packages. Signed-off-by: Lennart Jern <[email protected]>
Make everyone viewer, disable signup and login forms. Also disable gravatar. Signed-off-by: Lennart Jern <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Add short readme for the monitoring stack and link to it from the main prow readme. Signed-off-by: Lennart Jern <[email protected]>
1685fae
to
204a910
Compare
/cc @tuminoid |
Thanks @lentzi90, I'll take a look. 99 files changed, might take a while :)
This is certainly a case where we don't want squashed commits. It is indeed very helpful to have them in the way you committed them.
I agree, we need the historic portion at least. Initially, I dislike completely detaching, but I'll comment further on this after reading the actual code. I can see the point of going with the rendered tho,
I think we can add some NetworkPolicies to enforce this as a simple hardening?
This is good approach. We're open source, our Prow is generally accessible, what would hiding it gain really? We have the ability to configure it on the backend side anyways.
Maybe we can have that in separate PR after this merges, to make it separate decision and self-documentary if it in future needs to be removed? |
Yes we can definitely do that!
Someone could DDoS it I guess, but they could already do that with Prow so 🤷 |
Indeed, our tiny Prow can be DoS'd, no need to DDoS even. Grafana and Prometheus are not abandonware, so if we keep them up to date, and do some basic hardening on the config, we should be fine. |
This adds the kube-prometheus stack to our prow cluster and configures it to monitor prowjobs on top of all that it does out of the box.
I have purposefully not squashed the commits to make it easier to follow the process but I can of course squash if that is preferred. The most interesting part is probably to compare the initial commit, that has the example.jsonnet unmodified, with the changes that comes later.
Another thing to note when reviewing: everything under
manifests
is rendered. I include it in the repository because it is what we apply in the cluster and because it is far easier to understand than the jsonnet code. It also shows clearly what the changes to the jsonnet code does when comparing commits.We should probably discuss if we want to keep the jsonnet code at all or if we just want to continue from the rendered manifests. We would then no longer be able to consume the upstream kubernetes-mixins and kube-prometheus, but we get one less tool and language to keep track of. It looks like k8s.io has chosen this route.
In any case, I think we can merge it with the jsonnet first, before we decide how to handle it. It will be a good historic record at least that shows where it came from.
A couple of "controversial" or notable points:
Fixes: #896