can't start Grafana 5.1.3 on Kubernetes 1.9.6 #167
I have a similar issue on GKE (1.10.2). It seems it could be fixed with fsGroup or something similar. Hope the devs can help us. |
same here on k8s v1.10.1 on AWS and EBS disk |
The problem appeared with version 5.1.0, so I deployed 5.0.0 and it worked. |
It works with 5.1.3 when I use a community image |
Experiencing the same issue on K8S 1.10.2 on bare metal (kubespray) with a rook-block PV. Do any maintainers have suggested steps for further troubleshooting? The log suggests that we're migrating, but I have a feeling everyone here is using a fresh install. |
This problem only seems to occur when persistence is enabled. What's the story around fixing permissions on the PVC in the case this is enabled? It seems the image expects that it's fine not to run chown since it sets the perms in its build script, but once you volume-mount a fresh PVC that doesn't have such ownership, it becomes a problem. |
In Grafana 5.1 we switched to a new Docker container where all files are owned by id/gid 472 (the grafana user/group). The container is also started with this id/gid. In previous versions the container started as root, changed ownership of the necessary files to the grafana user's id/gid, and then switched to the grafana user to run the binary. My guess would be that the problems you are seeing are somehow related to the fact that we no longer start the container as root. If possible, I would suggest trying to configure the volumes/disks to be owned by id 472. Unfortunately, I know very little about Kubernetes, but I will try to dig into this on my end. |
You might want to try the approach outlined here https://serverfault.com/questions/906083/how-to-mount-volume-with-specific-uid-in-kubernetes-pod to set the filesystem permissions on the PVC before the main container starts. Or, if needed, you can use a securityContext to specify which uid/gid grafana should run under: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod |
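The first approach suggested there can be sketched as an initContainer that chowns the volume before Grafana starts. This is a minimal sketch, not from the thread itself; the pod, volume, and claim names are hypothetical and will differ in your deployment:

```yaml
# Hypothetical pod spec: an initContainer runs as root and hands
# ownership of the data volume to the grafana user (uid/gid 472)
# before the main container starts.
apiVersion: v1
kind: Pod
metadata:
  name: grafana
spec:
  initContainers:
    - name: init-chown-data
      image: busybox:latest
      command: ["chown", "-R", "472:472", "/var/lib/grafana"]
      volumeMounts:
        - name: grafana-data
          mountPath: /var/lib/grafana
  containers:
    - name: grafana
      image: grafana/grafana:5.1.3
      volumeMounts:
        - name: grafana-data
          mountPath: /var/lib/grafana
  volumes:
    - name: grafana-data
      persistentVolumeClaim:
        claimName: grafana-pvc  # hypothetical claim name
```

The trade-off versus a securityContext is that the initContainer still needs root, but only briefly and in a throwaway container, while the Grafana container itself stays unprivileged.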
Thanks @xlson and @DanCech. FYI I am not a developer and actually new to Docker and Kubernetes :). I was using Helm charts to deploy on Kubernetes as it is easy. |
@DanCech I feel like the suggestion of manual outside steps in order to preserve data really diminishes the value of the grafana chart. If I figure out some way to work that out with helm, I'll open a PR for the chart. Just wanted to point out that these recent changes make a worse story for preserving Grafana's data. Perhaps that's not an issue for chart users, since they can define anything they'd need to persist in their chart config. @asubmani, DanCech's suggestion above can be done while the grafana container is in its CrashLoopBackOff or whichever failure mode it was in, but I'm just going to use an older version until I or someone else addresses this issue within the chart itself. |
@RyanHartje My container is in CrashLoopBackOff. When I try to get inside it, I get the error below:
error: unable to upgrade connection: container not found ("grafana") |
As I've already mentioned above, setting securityContext seems to help.
Works for me. |
@unb9rn I'm using helm so I edit For ex:
|
You can use the official image with the 5.0.0 tag and it will work. I think there is a bug with persistent data in the newer version. |
I can no longer reproduce this from the most recent chart. |
@RyanHartje I've just reproduced it with the most recent chart (image version is grafana/grafana:5.1.3). |
@cmorent any chance you could try installing with my patch here: I think this should solve the issue, but I'm not able to confirm since I can't replicate it. What is your storage solution, if you don't mind me asking? |
@RyanHartje I have the same problem with the latest chart and 5.1.3. My storage solution is rook with ceph. |
I'm running into the same issue here. Default stable/grafana chart using a PVC on Azure. |
Same issue with PVC using Azure files |
@smeeklai I'm using a helm chart. If you add the following:
securityContext:
  runAsUser: 472
  fsGroup: 472
to the line beneath the pod spec (the first spec after metadata) in your template, it should work. I was mistakenly placing that in the containerSpec. |
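To make the placement concrete: a Deployment manifest has two nested spec keys, and the securityContext above belongs under the pod template's spec, not inside the container entry. A minimal sketch (the deployment name and labels are illustrative, not from the chart):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:                   # pod spec: securityContext goes here
      securityContext:
        runAsUser: 472      # grafana user in the 5.1+ images
        fsGroup: 472        # supported volume types are made writable for gid 472
      containers:
        - name: grafana
          image: grafana/grafana:5.1.3
```

Note that fsGroup only has an effect for volume plugins that support ownership management, which is likely why results vary across storage backends in this thread.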
I opened a PR to resolve this in the chart for helm users: |
@ajmulhollan1 Does not seem to make any difference, at least when using "Azure Files"... Do you mount your shares with specific parameters, like gid or uid? |
@brondum the 472 above is the grafana user uid:
|
@brondum The defaults for Azure Files have been reported to be too restrictive in the past, maybe setting them to 755/644 for folders/files is possible? |
@RyanHartje Thanks for the tip, I have tried with the mount options, but will investigate further :) |
Feel free to reach out in the Kubernetes Slack if I can help. |
@RyanHartje which volume are you making persistent? My persistent volume is mounted at |
@mightwork I was using the grafana chart, which uses |
Having the same issue using version 5.2.2 in Azure with PVC. Rolling back to 5.0.4 until someone finds a solution. |
Try adding this to your deployment:
It worked for me! |
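The snippet itself did not survive in this thread; judging only from the next comment (which says it makes the pod run as root), it was presumably a pod securityContext along these lines. This is a guessed reconstruction, not the original snippet, and it carries the security caveat discussed in the reply:

```yaml
# Presumed reconstruction (not the original snippet): running the pod
# as root sidesteps the permission problem but weakens security.
securityContext:
  runAsUser: 0
```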
@santiagopoli that makes you run your pod as a privileged user (root). The whole reason this "issue" comes up is because grafana updated their image to follow better security practices, such as running as a non-privileged user. While your suggestion functionally works, you're making your grafana instance much more vulnerable in the event of compromise by running as root instead of as the grafana user. |
Seems to be a general problem when mounting the volume.
But the plugin dir gets created anyway by grafana... but empty. Reproducible with the following docker stack compose file:

services:
grafana:
# Full tag list: https://hub.docker.com/r/grafana/grafana/tags/
image: grafana/grafana:5.2.2
environment:
#GF_INSTALL_PLUGINS: natel-influx-admin-panel,vonage-status-panel,grafana-clock-panel,grafana-simple-json-datasource
GF_SECURITY_ADMIN_PASSWORD: mypw
GF_USERS_ALLOW_SIGN_UP: 'false'
GF_AUTH_DISABLE_LOGIN_FORM: 'true'
GF_AUTH_DISABLE_SIGNOUT_MENU: 'true'
GF_AUTH_ANONYMOUS_ENABLED: 'true'
GF_AUTH_ANONYMOUS_ORG_NAME: 'Main Org.'
GF_AUTH_ANONYMOUS_ORG_ROLE: 'Admin'
deploy:
replicas: 1
placement:
constraints:
- node.role == manager
restart_policy:
condition: on-failure
volumes:
- grafana_data:/var/lib/grafana
- grafana_conf:/etc/grafana
ports:
- "3000:3000"
volumes:
grafana_data:
driver: cifs
driver_opts:
share: myserver/grafana_data
username: myuser
password: mypw
domain: mydomain
grafana_conf:
driver: cifs
driver_opts:
share: myserver/grafana_conf
username: myuser
password: mypw
      domain: mydomain

Plugins are commented out, but if I enable them, the logs just complain about permission denied during plugin installation. What I don't understand: even though the file system permissions are wrong initially, the plugin folder can be created by grafana but nothing else. |
@RyanHartje yes, I know it's more insecure and not a very good idea, but it seems it's the only "solution" right now when using Persistent Volumes on AWS. I put this solution here because none of the other solutions described in this thread worked for me, and I think it could help other people. Having said that, thanks for your comment, as I forgot to state the security considerations of the workaround in my original comment. |
Got grafana at least to start successfully with:

volumes:
grafana_data:
driver: cifs
driver_opts:
share: myserver/grafana_data
username: myuser
password: mypw
domain: mydomain
cifsopts: "uid=472,gid=472,nobrl"
Probably you guys can adapt this somehow to your problems in the cloud. |
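For the cloud equivalents of those cifs options: Kubernetes lets you set mount options on a StorageClass (or PersistentVolume), so the Azure Files case can likely be handled the same way. A sketch assuming the azure-file provisioner; the class name is illustrative:

```yaml
# Hypothetical StorageClass that mounts Azure Files shares owned by
# the grafana user, mirroring the cifs uid/gid options above.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-grafana   # illustrative name
provisioner: kubernetes.io/azure-file
mountOptions:
  - uid=472        # grafana user
  - gid=472
  - dir_mode=0755
  - file_mode=0644
```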
Was able to solve the issue by getting inside the previous container and changing permissions on the grafana folder. |
Closing this issue as the Grafana docker image has moved to the main Grafana repository. Now tracked: grafana/grafana#13187 |
I am hitting an issue similar to closed issue #140
Using AKS, K8s version 1.9.6
Using AzureDisk as PVC
Deploying using the helm chart: the Azure LB svc & PVC get deployed, but pod deployment fails.
kubectl logs pod/grafanademo-5c4ff67949-pvcrs

GF_PATHS_DATA='/var/lib/grafana' is not writable. You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied
I can't get into the container to do a chown. I tried pulling grafana:master but same issue. I am not a container expert, so I would appreciate it if someone can point me to a workaround: run the official image in docker, patch it, and then a yaml to pull from a local folder (if possible).
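One workaround in the spirit of what's being asked, without patching the image: run a one-off root Job against the same PVC to fix ownership, then let the normal Grafana deployment start. A sketch with hypothetical names; the claimName must match the PVC the chart created:

```yaml
# Hypothetical one-off Job: chown the PVC contents to uid/gid 472 so
# the unprivileged grafana container can start afterwards.
apiVersion: batch/v1
kind: Job
metadata:
  name: grafana-data-chown   # illustrative name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: chown
          image: busybox:latest
          command: ["chown", "-R", "472:472", "/var/lib/grafana"]
          volumeMounts:
            - name: grafana-data
              mountPath: /var/lib/grafana
      volumes:
        - name: grafana-data
          persistentVolumeClaim:
            claimName: grafanademo   # hypothetical claim name
```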