add monitoring #264
Quick answer is no, there is no intent to make the operator "monitor" anything. Ideally the operator focuses on "operation", and more specifically on the provisioning and modifying part. The "ops" part we largely leave to Patroni, which is very well suited to taking care of the cluster itself. The operator does, however, contain a very slim API to allow monitoring it from the outside. At Zalando we use ZMON (zmon.io) for all monitoring, but there are other options here, like Prometheus. We are running Postgres with the bg_mon extension, which exposes a lot of Postgres data via a REST API on port 8080, so I think this helps a lot. |
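As a rough illustration of that bg_mon endpoint (a sketch only; the pod address is a placeholder and the exact JSON layout depends on the bg_mon version shipped in the Spilo image):

```bash
# Probe bg_mon's embedded web server (port 8080 in the Spilo setup mentioned
# above) from a machine inside the cluster network; it returns system and
# per-backend statistics as JSON (depending on the bg_mon version, an HTML UI
# may be served on a separate path). <postgres-pod-ip> is a placeholder.
curl -s http://<postgres-pod-ip>:8080/ | jq .
```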
thanks for the quick reply! to be clear, I'm not proposing monitoring the operator itself but rather the database it is operating on. if there is something in the operator that you monitor and feel others should monitor, please do let me know! otherwise our system will probably just be monitoring that the pod is up and running. what I'd like to add to this operator to facilitate that is a flag that would add a simple named monitoring port on the |
for that kind of web dashboard thing we've been running https://github.com/ankane/pghero, which has definitely helped us a couple of times, but it doesn't hook into our alerting systems, which is what I'm really trying to achieve here. |
Operator monitoring: we have not figured this out completely; one part here is definitely user experience, making sure the operator is quick to provision new clusters and apply changes triggered by the user, but other than that we more or less monitor that the pod is running, which is not that helpful or informative. Database monitoring: we don't consider this a task of the operator, and our operator is not required once the database is "deployed", as Patroni does all the magic for high availability and failover, which makes the operator itself much smaller in scope and much less important. To monitor clusters, as said above, both Postgres and Patroni have REST APIs that are easy to monitor. |
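To make the "easy to monitor" part a bit more concrete, here is a hedged sketch of probing the Patroni REST API (default port 8008); the pod address is a placeholder, and the exact endpoints should be checked against the Patroni version in use:

```bash
# /health  returns 200 when PostgreSQL is up on this node,
# /master  returns 200 only on the current leader,
# /replica returns 200 only on a healthy replica,
# /patroni returns the full node state as JSON.
curl -s -o /dev/null -w '%{http_code}\n' http://<postgres-pod-ip>:8008/health
curl -s -o /dev/null -w '%{http_code}\n' http://<postgres-pod-ip>:8008/master
curl -s http://<postgres-pod-ip>:8008/patroni | jq .
```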
I adapted the operator to deploy the postgres exporter as a sidecar container (instead of running it inside the Spilo container). With this we can get metrics into Prometheus. So the operator is not monitoring anything, it just helps with the deployment. What do you think? |
We had the discussion once about arbitrary sidecar definition support, but scratched this until the need arises. Feel free to PR this or frame it in an issue, as this could become anything from simple to very generic. Maybe we can also go for a "prometheus" sidecar, similarly static to the Scalyr sidecar. Can you dump your sidecar definition here so we can have a look? |
I am closing this. The sidecar feature, which we currently use only for Scalyr in a hard-coded way, may see some improvements and become more generic, and could then also serve the purpose of adding e.g. the postgres exporter as a sidecar via the operator. |
how about we keep this open and I send you a PR? I'll try to get you one this week which will add a monitoring sidecar option, if you are okay with that. |
Sure, PRs or idea sketches are very welcome. Maybe you can outline your idea briefly, as we have some ongoing discussions internally on how sidecars should look: from toggled, hard-coded examples like Scalyr now to a very generic approach. |
@Jan-M would be great to see that discussion here in the Open Source project, so others can comment/join. |
sure! so if I were to bring up the most important things for adding monitoring to this project:
I think we should start by focusing on 2 common use cases, documenting them, and changing the project's current language of
a bit more technical detail on what I am proposing for monitoring sidecars specifically: |
going to sketch some code and share it shortly to get a bit more specific and hopefully keep the discussion going. thoughts here though? |
Just a very quick remark: imho monitoring is still not in scope of the operator, even though sidecars should be supported and are a good idea. For me the essence is that the operator itself should not start to "monitor" metrics or become e.g. a metrics gateway/proxy. |
Hi @theRealWardo, I had some similar thoughts along the line of supporting any sidecar, not necessarily monitoring (for instance, ours does log exporting, and others may do something like regular manual vacuuming, index rebuilds or backups, or even run 3rd-party applications that do something, i.e. export the data somewhere else). Most of them, in general, need access to PGDATA/logs, and many also need access to the database itself. The set of parameters you came up with looks good to me. We could also pass the name of a role that should be defined inside the infrastructure roles, and the operator would perform the job of passing the role name and the password from there to the cluster. However, in some cases it might be necessary to connect as a superuser, whose password is per-cluster. Another idea is to expose the unix socket inside the volume mount of github.com/zalando/spilo, so that other containers running in the same pod can connect over a unix socket as user postgres without a password. In order to fully support this, we would also need something along the line of … I am not sure about the labels. It is not possible to apply labels to individual containers within the pod; what we could do is apply a sidecar label with the name of the sidecar. However, it looks redundant to me, since one can always instruct monitoring to look for pods with the set of … I'll look into your PR and will also do the global secrets when I have time. |
so I modified my PR to add generic sidecar support. it allows users to add as many sidecars as they like to each of the pods running their clusters. this is sufficient to meet our use cases, and could be used by your team in place of the current Scalyr specific stuff. we are going to try and run 2 sidecar containers actually. we'll be running one that does log shipping via Filebeat and another that does monitoring via Postgres Exporter. hopefully this PR will enable other interesting uses too. |
@theRealWardo how are you passing in the env vars to Postgres Exporter like DATA_SOURCE_NAME, since the ones available from the postgres operator are different (i.e. POSTGRES_*)? Or do you create another container, based on the available postgres exporter one, for inclusion as a sidecar? |
right @pitabwire - we use a sidecar, 2 of them actually. one that ships logs and one that does monitoring. |
@theRealWardo could you guide on this? I tried to pass in the environment variables, but for some reason they are not being picked up in the postgres exporter container, and I get the error below
my docker file is shown below:

```dockerfile
ENV PG_EXPORTER_VERSION=v0.4.7

FROM scratch
ENV PG_EXPORTER_VERSION=v0.4.7
COPY --from=builder /postgres_exporter_${PG_EXPORTER_VERSION}_linux-amd64/postgres_exporter /postgres_exporter
EXPOSE 9187
ENTRYPOINT [ "/postgres_exporter" ]
```
I'm using a sidecar to run postgres_exporter. The config looks like this
Unfortunately, the endpoints don't expose the sidecar's port (9187 in this case) |
@tritruong the challenge with doing it this way is that you have to do it for every cluster definition. I would like to do it globally and in an automated way, so that any new cluster definitions are automatically picked up by the prometheus monitoring and alerting system. |
And don't put the password into env vars like this. I am in general in favor of having global generic sidecar definitions for whatever you need. For monitoring though, or other tooling, the K8s API gives you a nice way to discover the services and clusters you want to monitor, and one exporter or tool per cluster may not be the best idea anymore. But this depends, arguably. |
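Picking up the point about passwords in env vars: one hedged option (assuming a postgres_exporter version that supports the DATA_SOURCE_*_FILE variables and an existing Kubernetes secret holding the credentials; all names below are illustrative) is to mount the secret and point the exporter at a file instead:

```yaml
# Illustrative sidecar fragment: the password is read from a mounted secret
# file rather than placed directly in an environment variable. Secret name,
# mount path and URI are placeholders; verify DATA_SOURCE_PASS_FILE support
# against the postgres_exporter version you run.
env:
  - name: DATA_SOURCE_URI
    value: "localhost:5432/postgres?sslmode=disable"
  - name: DATA_SOURCE_USER
    value: postgres
  - name: DATA_SOURCE_PASS_FILE
    value: /etc/exporter-secret/password
volumeMounts:
  - name: exporter-credentials      # volume backed by the credentials secret
    mountPath: /etc/exporter-secret
    readOnly: true
```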
@Jan-M Yes, I could use mount secret file. Is there any way I could do to disable the default environment variables that always passed to sidecars (POSTGRES_USER and POSTGRES_PASSWORD)? |
@tritruong maybe using a trust configuration with role mapping in pg_hba.conf could grant the exporter sidecar just the required read-only access, potentially even without password-based authentication? And yes @Jan-M, I believe @tritruong has a point: giving every little sidecar containing just a piece of monitoring software full-on admin rights to the database might not be desired :-) |
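A minimal sketch of that pg_hba.conf idea (the role and the ordering are illustrative, not something the operator generates): let a dedicated monitoring role in over the local socket without a password while keeping password authentication for remote connections:

```
# Illustrative pg_hba.conf excerpt: the pg_exporter role may connect over the
# local Unix socket without a password; other local connections use peer auth
# and remote connections still require a password.
# TYPE   DATABASE   USER          ADDRESS        METHOD
local    all        pg_exporter                  trust
local    all        all                          peer
host     all        all           0.0.0.0/0      md5
```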
@tritruong I created a separate service for the exporter to work around that fact. |
If anyone is interested in monitoring Patroni itself, I've written a patroni-exporter for prometheus that scrapes the Patroni API. Someone might find it useful :) |
Here is a complete example we use internally to enable the prometheus exporter: |
I opted for baking postgres_exporter into a custom-built Spilo image and having supervisord in the Spilo image start it up automatically. Then I tweaked the Prometheus job rules to add a custom scrape target that scrapes the postgres_exporter metrics on all |
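For reference, a sketch of what such a Prometheus job could look like; this is an assumption rather than the commenter's actual rules, and the label names follow the operator's default application=spilo / cluster-name / spilo-role pod labels with the usual 9187 exporter port:

```yaml
# Sketch of a scrape job that discovers Spilo pods via the Kubernetes API and
# scrapes postgres_exporter on port 9187 on each of them.
scrape_configs:
  - job_name: postgres-exporter
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods created by the operator / Spilo
      - source_labels: [__meta_kubernetes_pod_label_application]
        regex: spilo
        action: keep
      # point the scrape address at the exporter port
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: "$1:9187"
      # carry cluster name and role over as metric labels
      - source_labels: [__meta_kubernetes_pod_label_cluster_name]
        target_label: cluster
      - source_labels: [__meta_kubernetes_pod_label_spilo_role]
        target_label: role
```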
When we upgraded our Kubernetes cluster to 1.16 the postgres-operator (1.2.0, #674) was not able to find the existing StatefulSets anymore (because of the API changes between 1.15 and 1.16).
I think it would be very helpful if the operator exposed a /metrics endpoint for Prometheus, which would make it possible to alert on such things. This is not an issue of the database cluster but of the operator, so monitoring the database does not expose this kind of issue. |
@theRealWardo there are two PRs open that, combined, should allow most monitoring / log-shipping use cases to be configured: |
awesome thanks @frittentheke! |
@Yannig Hi! Can you suggest a Grafana dashboard that works with your config? Thanks! |
Has someone tried that with the OperatorConfig?
Is a Service for port 9187 required? Any disadvantage to using a PodMonitor? Recently, I used a PodMonitor for our Kafka operator monitoring setup, too. |
@jkroepke that works - but not via configmaps in the later versions of the operator. But yes, you will need to add *Monitor resources to activate scraping |
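A hedged sketch of such a PodMonitor (it assumes the prometheus-operator CRDs, the operator's default application=spilo pod label, and an exporter sidecar whose container port is named pg-exporter, as in the example further down; adjust the names to your setup):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: postgres-exporter
spec:
  selector:
    matchLabels:
      application: spilo      # default label on pods created by the operator
  namespaceSelector:
    any: true                 # watch all namespaces; narrow this as needed
  podMetricsEndpoints:
    - port: pg-exporter       # container port name of the exporter sidecar
      path: /metrics
      interval: 30s
```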
For anyone looking for a Grafana dashboard to get started with Yannig's config, try this: https://grafana.com/grafana/dashboards/9628 Simply set up a Prometheus target to scrape /metrics from the pg-exporter service, import the Grafana dashboard, and voilà! |
I add a config like @jkroepke, but use sidecars:

```yaml
- name: exporter
  image: postgres_exporter:v0.10.1
  ports:
    - name: pg-exporter
      containerPort: 9187
      protocol: TCP
  resources:
    requests:
      cpu: 50m
      memory: 200M
  env:
    - name: CLUSTER_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels['cluster-name']
    - name: DATA_SOURCE_NAME
      value: >-
        host=/var/run/postgresql user=postgres
        application_name=postgres_exporter
    - name: PG_EXPORTER_CONSTANT_LABELS
      value: 'release=$(CLUSTER_NAME),namespace=$(POD_NAMESPACE)'
```

You can add a volume in CRD like this:

```yaml
additionalVolumes:
  - name: socket-directory
    mountPath: /var/run/postgresql
    targetContainers:
      - all
    volumeSource:
      emptyDir: {}
```
I'm using this helm chart: Connected to pooler-replica service. Works fine |
How does it work for you? If you create a new database, how will the new exporter be deployed? Running helm install after applying the CR is not the idea of an operator |
@jkroepke you can configure a sidecar in the operator configuration that gets applied to all postgres pods the operator starts. |
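As a rough sketch of that approach, assuming an operator version whose OperatorConfiguration CRD supports a global sidecars list (field names and the image are illustrative and should be checked against the documentation for the version in use):

```yaml
apiVersion: acid.zalan.do/v1
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  # sidecars listed here are added to every postgres pod the operator creates
  sidecars:
    - name: exporter
      image: quay.io/prometheuscommunity/postgres-exporter:v0.10.1
      ports:
        - name: pg-exporter
          containerPort: 9187
```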
@vitargelo to properly monitor servers you need to connect directly to each postgres server, not just to a random one, because you can run into issues on one of the replicas but not on another; as a result, metrics MUST be taken from each server, and a sidecar fits best there. |
what does zalando do for postgres monitoring of the databases run via this operator?
I was thinking of building https://github.com/wrouesnel/postgres_exporter into the database container and having that be monitored via our prometheus operator.
are there any existing plans to add monitoring directly into this project in some way? if not, is there a need for a more detailed discussion/approach prior to contribution, or shall I do as the contribution guidelines say and just hack away and send a PR?