This repository has been archived by the owner on Sep 7, 2018. It is now read-only.

can't start Grafana 5.1.3 on Kubernetes 1.9.6 #167

Closed
asubmani opened this issue May 23, 2018 · 37 comments

Comments

@asubmani

I am hitting an issue similar to closed issue #140

Using AKS, Kubernetes version 1.9.6
Using an AzureDisk PVC
Deploying via the helm chart: the Azure LB service and PVC get deployed, but the pod deployment fails.

kubectl logs pod/grafanademo-5c4ff67949-pvcrs

GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied

I can't get into the container to do a chown.
I tried pulling grafana:master, but I hit the same issue.

I am not a container expert, so I would appreciate it if someone could point me to a workaround: run the official image in Docker, patch it, and then a YAML to pull from a local folder (if possible).

@ghost

ghost commented May 24, 2018

I have a similar issue on GKE (1.10.2). It seems it should be fixable with fsGroup or something similar. Hope the devs can help us.

@cyrilbkr

same here on k8s v1.10.1 on AWS and EBS disk

@ghost

ghost commented May 24, 2018

The problem appeared in version 5.1.0, so I deployed 5.0.0 and it worked.

@asubmani
Author

asubmani commented May 24, 2018

It works with 5.1.3 when I use a community image, monitoringartist/grafana-xxl:latest. Unfortunately I don't know enough Docker to understand what I need to change here.
However, I am unable to see the Azure Monitor plugin even after adding it using:
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} plugins install grafana-azure-monitor-datasource
It seems I have to build my own image and push it to a private repo, or use 5.0.0.

@goshlanguage

Experiencing the same issue on K8s 1.10.2 on bare metal (kubespray) with a rook-block PV.

Do any maintainers have suggested steps for further troubleshooting? The log suggests we're migrating, but I have a feeling everyone here is using a fresh install.

@goshlanguage

This problem only seems to occur when persistence is enabled. What's the story around fixing permissions on the PVC in that case? It seems the image assumes it's fine not to run chown because it sets the permissions in its build script, but once you volume-mount a fresh PVC that doesn't have that ownership, it becomes a problem.

@xlson
Contributor

xlson commented May 24, 2018

In Grafana 5.1 we switched to a new Docker container where all files are owned by id/gid 472 (the grafana user/group). The container is also started with this id/gid. In previous versions the container started as root, changed ownership of the necessary files to the id/gid of the grafana user, and then switched to the grafana user to run the binary.

My guess would be that the problems you are seeing are somehow related to the fact that we no longer start the container as root. If possible, I would suggest trying to configure the volumes/disks to be owned by id 472. Unfortunately, I know very little about Kubernetes, but I will try to dig into this on my end.

@DanCech
Contributor

DanCech commented May 24, 2018

You might want to try the approach outlined here: https://serverfault.com/questions/906083/how-to-mount-volume-with-specific-uid-in-kubernetes-pod to set the filesystem permissions on the PVC before the main container starts. Or, if needed, you can use a securityContext to specify which uid/gid grafana should run under: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
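The first suggestion can be sketched as a pod template fragment with an init container that chowns the data volume before grafana starts (a sketch only; the helper image, volume name, and claim name are assumptions, not taken from the chart):

```yaml
# Hypothetical pod template fragment: an init container fixes
# ownership of the PVC so the non-root grafana process (uid/gid 472)
# can write to /var/lib/grafana.
spec:
  initContainers:
    - name: init-chown-data
      image: busybox            # assumed helper image; anything with chown works
      command: ["chown", "-R", "472:472", "/var/lib/grafana"]
      volumeMounts:
        - name: storage
          mountPath: /var/lib/grafana
  containers:
    - name: grafana
      image: grafana/grafana:5.1.3
      volumeMounts:
        - name: storage
          mountPath: /var/lib/grafana
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: grafana      # assumed PVC name
```

Init containers run as root by default, so the chown succeeds even though the main grafana container cannot perform it.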

@asubmani
Author

Thanks @xlson and @DanCech. FYI, I am not a developer and am actually new to Docker and Kubernetes :). I was using helm charts to deploy to Kubernetes because it is easy.
Since I don't have a YAML to deploy my grafana container on Kubernetes, I am using the helm chart. I have to figure out a way to modify the official grafana helm chart to include the chown commands you mentioned.
@RyanHartje Would you have a sample YAML I can reference, preferably one that uses PVCs?

@goshlanguage

@DanCech I feel like the suggestion of manual outside steps in order to preserve data really diminishes the value of the grafana chart.

If I figure out some way to work that out with helm, I'll open a PR for the chart though. Just wanted to point out that these recent changes make a worse story for preserving Grafana's data.

Perhaps that's not an issue though for chart users, since they can define anything they'd need to persist into their chart config.

@asubmani, DanCech's suggestion above can be done while the grafana container is in its CrashLoopBackOff or whichever failure mode it was in, but I'm just going to use an older version until I or someone else addresses this issue within the chart itself.

@asubmani
Author

@RyanHartje My container is in CrashLoopBackOff. When I try to get inside it, I get the error below.

kubectl exec -it grafanademo-5c4ff67949-2jwgj -c grafana -- sh

error: unable to upgrade connection: container not found ("grafana")

I am trying to chown -R 472:472 in the container, but can't get in because the container doesn't start.

I also added pv.beta.kubernetes.io/gid: "472" under annotations: in the persistence section of values.yaml for the helm chart. My storage/PVC gets deployed successfully, but the pod is unable to attach it due to access issues.
Will use 5.0.0 for now.

@ghost

ghost commented May 25, 2018

As I've already mentioned above, setting securityContext seems to help.
After the containers: section, try adding the next section like this:

securityContext:
    fsGroup: 472

Works for me.

@smeeklai

@unb9rn I'm using helm, so I edited deployment.yaml inside templates/. I put securityContext after the containers: section but still get the error :(

For ex:

containers:
    ....
    securityContext:
        fsGroup: 472

@ghost

ghost commented Jun 11, 2018

You can use the official image with the 5.0.0 tag and it will work. I think there is a bug with persistent data in the newer version.

@goshlanguage

I can no longer reproduce this from the most recent chart.

@cmorent

cmorent commented Jun 12, 2018

@RyanHartje I've just reproduced it with the most recent chart (image version is grafana/grafana:5.1.3).

@goshlanguage

@cmorent any chance you could try installing with my patch here:
https://github.com/ryanhartje/charts/tree/grafana-docker-167 ?

I think this should solve the issue, but I'm not able to confirm since I can't replicate it.

What is your storage solution, if you don't mind me asking?

@JohnnyQQQQ
Member

@RyanHartje I have the same problem with the latest chart and 5.1.3. My storage solution is rook with ceph.

@arianitu

I'm running into the same issue here. Default stable/grafana chart using a PVC on Azure.

@brondum

brondum commented Jul 2, 2018

Same issue with PVC using Azure files

@aaronjpitty

aaronjpitty commented Jul 2, 2018

@smeeklai I'm using a helm chart. If you add the following:

 securityContext:
    runAsUser: 472
    fsGroup: 472

to the line beneath the pod spec (the first spec after metadata) in your template, it should work.

I was previously placing that in the container spec.
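The placement matters because fsGroup is only valid in the pod-level securityContext, not in a per-container one. A sketch of where it goes in a Deployment template (field names per the Kubernetes API; the container details are assumptions):

```yaml
# Pod-level securityContext: a sibling of `containers`, not nested
# inside a container entry. fsGroup makes Kubernetes set group
# ownership on mounted volumes so gid 472 can write.
spec:
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        runAsUser: 472
        fsGroup: 472
      containers:
        - name: grafana
          image: grafana/grafana:5.1.3
```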

@goshlanguage

I opened a PR to resolve this in the chart for helm users:
helm/charts#6428

@brondum

brondum commented Jul 3, 2018

@ajmulhollan1 It does not seem to make any difference, at least when using "Azure Files". Do you mount your shares with specific parameters, like gid or uid?

@goshlanguage

@brondum the 472 above is the grafana user uid:

$ docker run --entrypoint "id" grafana/grafana
uid=472(grafana) gid=472(grafana) groups=472(grafana)

@goshlanguage

@brondum The default permissions for Azure Files have been reported to be too restrictive in the past; maybe setting them to 755/644 for folders/files is possible?

@brondum

brondum commented Jul 5, 2018

@RyanHartje Thanks for the tip. I have tried with the mount options, but will investigate further :)

@goshlanguage

Feel free to reach out in the Kubernetes Slack if I can help.

@ku-s-h

ku-s-h commented Jul 31, 2018

@RyanHartje which volume are you making persistent? My persistent volume is mounted at /var, but each time the grafana pod gets re-created I lose all my data.

@goshlanguage

@mightwork I was using the grafana chart, which uses /var/lib/grafana
https://github.com/helm/charts/blob/master/stable/grafana/templates/deployment.yaml#L50

@gabrielmcf

Having the same issue using version 5.2.2 in Azure with PVC. Rolling back to 5.0.4 until someone finds a solution.

@santiagopoli

santiagopoli commented Aug 10, 2018

Try adding this to your deployment:

securityContext:
  runAsUser: 0

It worked for me!

@goshlanguage

@santiagopoli that makes your pod run as a privileged user (root). The whole reason this "issue" comes up is because grafana updated their image to follow better security practices, such as running as a non-privileged user. While your suggestion functionally works, you're making your grafana instance much more vulnerable in the event of compromise by running it as root instead of as the grafana user.

@spali

spali commented Aug 11, 2018

This seems to be a general problem when mounting the volume.
Mounting an empty cifs share with the cifs driver results in:

GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied

But the plugin dir gets created anyway by grafana... just empty.
No suitable workaround found so far.
Every other container I have with the same volume settings for mounting cifs shares works, but they probably haven't hardened their containers yet.

Reproducible with the following docker stack compose file:

version: '3'

services:
  grafana:
    # Full tag list: https://hub.docker.com/r/grafana/grafana/tags/
    image: grafana/grafana:5.2.2
    environment:
      #GF_INSTALL_PLUGINS: natel-influx-admin-panel,vonage-status-panel,grafana-clock-panel,grafana-simple-json-datasource
      GF_SECURITY_ADMIN_PASSWORD: mypw
      GF_USERS_ALLOW_SIGN_UP: 'false'
      GF_AUTH_DISABLE_LOGIN_FORM: 'true'
      GF_AUTH_DISABLE_SIGNOUT_MENU: 'true'
      GF_AUTH_ANONYMOUS_ENABLED: 'true'
      GF_AUTH_ANONYMOUS_ORG_NAME: 'Main Org.'
      GF_AUTH_ANONYMOUS_ORG_ROLE: 'Admin'
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: on-failure
    volumes:
      - grafana_data:/var/lib/grafana
      - grafana_conf:/etc/grafana
    ports:
      - "3000:3000"

volumes:
  grafana_data:
    driver: cifs
    driver_opts:
      share: myserver/grafana_data
      username: myuser
      password: mypw
      domain: mydomain
  grafana_conf:
    driver: cifs
    driver_opts:
      share: myserver/grafana_conf
      username: myuser
      password: mypw
      domain: mydomain

The plugins line is commented out, but if I enable it, the logs just complain about permission denied during plugin installation.

What I don't understand: even though the filesystem permissions are initially wrong, the plugin folder can be created by grafana, but nothing else can.

@santiagopoli

santiagopoli commented Aug 11, 2018

@RyanHartje yes, I know it's more insecure and not a very good idea, but it seems to be the only “solution” right now when using Persistent Volumes on AWS. I put it here because none of the other solutions described in this thread worked for me, and I think it could help other people.

Having said that, thanks for your comment, as I forgot to state the security considerations of the workaround in my original comment.

@spali

spali commented Aug 15, 2018

Got grafana to at least start successfully with:

volumes:
  grafana_data:
    driver: cifs
    driver_opts:
      share: myserver/grafana_data
      username: myuser
      password: mypw
      domain: mydomain
      cifsopts: "uid=472,gid=472,nobrl"

uid and gid make the files owned by the grafana user and group in the container (id 472), which resolves the general permission problems.
The second option, nobrl, resolves a sqlite file-locking problem on cifs shares.

You can probably adapt this somehow to your problems in the cloud.

@lnikell

lnikell commented Sep 5, 2018

I was able to solve the issue by getting inside the previous container and changing permissions on the grafana folder:
chown -R 472:472 /var/lib/grafana
After that I was able to run the new version.

@xlson
Contributor

xlson commented Sep 7, 2018

Closing this issue as the Grafana docker image has moved to the main Grafana repository. Now tracked: grafana/grafana#13187
