
using gcs for storage with default helm templates results in "caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"" #932

Closed
shepely opened this issue Aug 22, 2019 · 9 comments
Labels: type/bug (Something is not working as expected)


shepely commented Aug 22, 2019

Description

Hello!

I'm trying to use Helm to set up Loki with GCS as the object storage, while for index storage we're planning to eventually use Cassandra. So there is no Bigtable in the setup, which I assume is not needed for GCS usage, as there is no documentation contradicting this assumption. For the sake of simplicity, I'll keep the default boltdb configuration for index storage below.

I've followed this modest instruction https://github.com/grafana/loki/blob/master/docs/operations.md#google-cloud-storage and this production setup

and got some ideas from #256 as well.

As a result, Loki returns an error on an attempt to flush data to GCS:
level=error ts=2019-08-22T09:29:31.858305985Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"

To Reproduce
Steps to reproduce the behavior:

  1. Create GCS bucket.
  2. Create GCP service account and private JSON key for it.
  3. In the bucket permissions, grant access to the SA by assigning the role Storage Object Admin (also tried Storage Legacy Bucket Owner); a CLI sketch of steps 1-3 follows below.
  4. Clone https://github.com/grafana/loki/tree/master/production/helm/loki to some local folder.
  5. Add a secrets.yaml file and place the created JSON key in it:
loki_access_gcs: |+
    {
      "type": "service_account",
      "project_id": "my-project",
      "private_key_id": "123456789",
      "private_key": "-----BEGIN PRIVATE KEY-----\nmykey\n-----END PRIVATE KEY-----\n",
      "client_email": "[email protected]",
      "client_id": "123456789",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/loki-access-gcs%40my-project.iam.gserviceaccount.com"
    }

Using the helm secrets plugin for encryption: https://github.com/futuresimple/helm-secrets
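
For reference, a hedged CLI sketch of steps 1-3 (all names are placeholders matching the samples above; exact flags may vary by gcloud version):

# Sketch of steps 1-3; my-project, my-bucket-name and loki-access-gcs are placeholders
gsutil mb -p my-project gs://my-bucket-name
gcloud iam service-accounts create loki-access-gcs --project my-project
gcloud iam service-accounts keys create key.json \
  --iam-account [email protected]
gsutil iam ch \
  serviceAccount:[email protected]:roles/storage.objectAdmin \
  gs://my-bucket-name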

  6. In the existing templates/secret.yaml, add a new secret (a quick decode check follows the manifest):
---
apiVersion: v1
kind: Secret
metadata:
  name: loki-access-gcs
type: Opaque
data:
  key.json: {{ .Values.loki_access_gcs | b64enc }}
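
To sanity-check that the secret renders and decodes back to valid JSON after installing, something like this should work (a sketch; note the escaped dot in the jsonpath key):

kubectl get secret loki-access-gcs -o jsonpath="{.data['key\.json']}" | base64 -d | head -3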
  7. Slightly modify the existing templates/statefulset.yaml to include the new GOOGLE_APPLICATION_CREDENTIALS env var (the functional additions are recapped after the full template):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ template "loki.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    app: {{ template "loki.name" . }}
    chart: {{ template "loki.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
  annotations:
    {{- toYaml .Values.annotations | nindent 4 }}
spec:
  podManagementPolicy: {{ .Values.podManagementPolicy }}
  replicas: {{ .Values.replicas }}
  selector:
    matchLabels:
      app: {{ template "loki.name" . }}
      release: {{ .Release.Name }}
  serviceName: {{ template "loki.fullname" . }}-headless
  updateStrategy:
    {{- toYaml .Values.updateStrategy | nindent 4 }}
  template:
    metadata:
      labels:
        app: {{ template "loki.name" . }}
        name: {{ template "loki.name" . }}
        release: {{ .Release.Name }}
        {{- with .Values.podLabels }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
    spec:
      serviceAccountName: {{ template "loki.serviceAccountName" . }}
    {{- if .Values.priorityClassName }}
      priorityClassName: {{ .Values.priorityClassName }}
    {{- end }}
      securityContext:
        {{- toYaml .Values.securityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          args:
            - "-config.file=/etc/loki/loki.yaml"
          {{- range $key, $value := .Values.extraArgs }}
            - "-{{ $key }}={{ $value }}"
          {{- end }}
          volumeMounts:
            - name: config
              mountPath: /etc/loki
            - name: storage
              mountPath: "/data"
              subPath: {{ .Values.persistence.subPath }}
            - name: loki-access-gcs
              mountPath: /etc/secrets
          ports:
            - name: http-metrics
              containerPort: {{ .Values.config.server.http_listen_port }}
              protocol: TCP
          livenessProbe:
            {{- toYaml .Values.livenessProbe | nindent 12 }}
          readinessProbe:
            {{- toYaml .Values.readinessProbe | nindent 12 }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          securityContext:
            readOnlyRootFilesystem: true
          env:
            {{- if .Values.tracing.jaegerAgentHost }}
            - name: JAEGER_AGENT_HOST
              value: "{{ .Values.tracing.jaegerAgentHost }}"
            {{- end }}
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secrets/key.json
      nodeSelector:
        {{- toYaml .Values.nodeSelector | nindent 8 }}
      affinity:
        {{- toYaml .Values.affinity | nindent 8 }}
      tolerations:
        {{- toYaml .Values.tolerations | nindent 8 }}
      terminationGracePeriodSeconds: {{ .Values.terminationGracePeriodSeconds }}
      volumes:
        - name: config
          secret:
            secretName: {{ template "loki.fullname" . }}
        - name: loki-access-gcs
          secret:
            secretName: loki-access-gcs
  {{- if not .Values.persistence.enabled }}
        - name: storage
          emptyDir: {}
  {{- else if .Values.persistence.existingClaim }}
        - name: storage
          persistentVolumeClaim:
            claimName: {{ .Values.persistence.existingClaim }}
  {{- else }}
  volumeClaimTemplates:
  - metadata:
      name: storage
      annotations:
        {{- toYaml .Values.persistence.annotations | nindent 8 }}
    spec:
      accessModes:
        {{- toYaml .Values.persistence.accessModes | nindent 8 }}
      resources:
        requests:
          storage: {{ .Values.persistence.size | quote }}
      storageClassName: {{ .Values.persistence.storageClassName }}
  {{- end }}
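
For readers skimming the template above: the only functional additions relative to the stock chart are the secret volume, its mount, and the env var, i.e.:

          volumeMounts:
            - name: loki-access-gcs
              mountPath: /etc/secrets
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secrets/key.json
      volumes:
        - name: loki-access-gcs
          secret:
            secretName: loki-access-gcs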
  8. Modify loki/values.yaml to use gcs as the object_store and add your bucket name:
## Affinity for pod assignment
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
affinity: {}
# podAntiAffinity:
#   requiredDuringSchedulingIgnoredDuringExecution:
#   - labelSelector:
#       matchExpressions:
#       - key: app
#         operator: In
#         values:
#         - loki
#     topologyKey: "kubernetes.io/hostname"

## StatefulSet annotations
annotations: {}

# enable tracing for debugging; requires installing Jaeger and setting the right jaeger_agent_host
tracing:
  jaegerAgentHost:

config:
  auth_enabled: false
  ingester:
    chunk_idle_period: 15m
    chunk_block_size: 262144
    lifecycler:
      ring:
        kvstore:
          store: inmemory
        replication_factor: 1

      ## Different ring configs can be used. E.g. Consul
      # ring:
      #   store: consul
      #   replication_factor: 1
      #   consul:
      #     host: "consul:8500"
      #     prefix: ""
      #     httpclienttimeout: "20s"
      #     consistentreads: true
  limits_config:
    enforce_metric_name: false
    reject_old_samples: true
    reject_old_samples_max_age: 168h
  schema_config:
    configs:
    - from: 2018-04-15
      store: boltdb
      object_store: gcs
      schema: v9
      index:
        prefix: index_
        period: 168h
  server:
    http_listen_port: 3100
  storage_config:
    boltdb:
      directory: /data/loki/index
    gcs:
      bucket_name: my-bucket-name
  chunk_store_config:
    max_look_back_period: 0
  table_manager:
    retention_deletes_enabled: false
    retention_period: 0

image:
  repository: grafana/loki
  tag: v0.3.0
  pullPolicy: IfNotPresent

## Additional Loki container arguments, e.g. log level (debug, info, warn, error)
extraArgs: {}
  # log.level: debug

livenessProbe:
  httpGet:
    path: /ready
    port: http-metrics
  initialDelaySeconds: 45

networkPolicy:
  enabled: false

## ref: https://kubernetes.io/docs/user-guide/node-selection/
nodeSelector:
  fixed: "true"

## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
## If you set enabled to "true", you need to:
## - create a PV of at least 10Gi in the same namespace as Loki
## - keep storageClassName the same as the setting below
persistence:
  enabled: false
  accessModes:
  - ReadWriteOnce
  size: 10Gi
  storageClassName: default
  annotations: {}
  # subPath: ""
  # existingClaim:

## Pod Labels
podLabels: {}

## Pod Annotations
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "http-metrics"

podManagementPolicy: OrderedReady

## Assign a PriorityClassName to pods if set
# priorityClassName:

rbac:
  create: true
  pspEnabled: true

readinessProbe:
  httpGet:
    path: /ready
    port: http-metrics
  initialDelaySeconds: 45

replicas: 1

resources: {}
# limits:
#   cpu: 200m
#   memory: 256Mi
# requests:
#   cpu: 100m
#   memory: 128Mi

securityContext:
  fsGroup: 10001
  runAsGroup: 10001
  runAsNonRoot: true
  runAsUser: 10001

service:
  type: ClusterIP
  nodePort:
  port: 3100
  annotations: {}
  labels: {}

serviceAccount:
  create: true
  name:

terminationGracePeriodSeconds: 30

## Tolerations for pod assignment
## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
tolerations:
- key: "fixed"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"

# The values to set in the PodDisruptionBudget spec
# If not set then a PodDisruptionBudget will not be created
podDisruptionBudget: {}
# minAvailable: 1
# maxUnavailable: 1

updateStrategy:
  type: RollingUpdate

serviceMonitor:
  enabled: false
  interval: ""
  9. Assuming that Promtail is already running on your nodes, update Loki:
 helm secrets upgrade --install loki loki/ -f loki/values.yaml -f loki/secrets.yaml
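
As an aside, if you'd rather not keep the key in a values file at all, plain Helm can inject the same top-level loki_access_gcs value from the key file directly via --set-file (a sketch, without the helm secrets plugin):

helm upgrade --install loki loki/ -f loki/values.yaml --set-file loki_access_gcs=key.json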
  10. Lastly, check the Loki logs to see whether you get errors similar to:
level=error ts=2019-08-22T11:57:30.752325389Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.75423081Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.761445231Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.765350267Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.772100702Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"
level=error ts=2019-08-22T11:57:30.772169302Z caller=flush.go:156 org_id=fake msg="failed to flush user" err="googleapi: Error 401: Invalid Credentials, authError"

Expected behavior
Data is flushed to GCS.

Environment:

  • Infrastructure: GKE 1.13
  • Deployment tool: Helm

Additional information
To validate that the JSON key for the service account itself is valid, I've exec'ed into a devbox container within the same GKE cluster as Loki and performed the following:

root@devbox-68bd5ccc68-lxbfv:/# vi key.json
root@devbox-68bd5ccc68-lxbfv:/# cat key.json
{
    "type": "service_account",
    "project_id": "my-project",
    "private_key_id": "123456789",
    "private_key": "-----BEGIN PRIVATEKEY-----\nmykey\n-----END PRIVATE KEY-----\n",
    "client_email":"[email protected]",
    "client_id": "123456789",
    "auth_uri": "https://accounts.google.com/o/oauth2auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https:/www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.comrobot/v1/metadata/x509loki-access-gcs%40my-project.iam.gserviceaccountcom"
}
root@devbox-68bd5ccc68-lxbfv:/# gcloud auth activate-service-account --key-file key.json
Activated service account credentials for: [[email protected]]

root@devbox-68bd5ccc68-lxbfv:/# touch test.txt
root@devbox-68bd5ccc68-lxbfv:/# vi test.txt
root@devbox-68bd5ccc68-lxbfv:/# gsutil cp test.txt gs://my-bucket-name/
Copying file://test.txt [Content-Type=text/plain]...
/ [1 files][   10.0 B/   10.0 B]
Operation completed over 1 objects/10.0 B.

Also, I've exec'ed into the Loki container to ensure that key.json is properly mounted; see below:

k exec -it -n loki loki-0 sh
/ $ cat etc/secrets/key.json
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "123456789",
  "private_key": "-----BEGIN PRIVATE KEY-----\nmykey\n-----END PRIVATE KEY-----\n",
  "client_email": "[email protected]",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/loki-access-gcs%40my-project.iam.gserviceaccount.com"
}
/ $ echo $GOOGLE_APPLICATION_CREDENTIALS
/etc/secrets/key.json

P.S.: Obviously, all sensitive data has been replaced with sample values (e.g. project name, bucket name, etc.).

Please advise on how to approach the issue, or confirm whether this is a bug, as I can't be certain that the setup above is correct. Thanks!

@rfratto rfratto self-assigned this Aug 22, 2019
rfratto (Member) commented Aug 22, 2019

Hi @shepely, I've been able to reproduce this and verified that it is a bug. The issue is that tokens created from the GOOGLE_APPLICATION_CREDENTIALS key aren't being generated with a scope, so they're not authenticating properly.

I'm working on a fix; updating the vendors should do it since the issue originates upstream and was fixed here: cortexproject/cortex#1511
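
For anyone who wants to see what a token's scopes look like, one way is Google's tokeninfo endpoint. This is only an illustration: gcloud requests the broad cloud-platform scope itself, so a healthy-looking token here does not rule out the Loki-side bug.

# Illustration only: print the scopes attached to an access token
gcloud auth activate-service-account --key-file key.json
curl "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=$(gcloud auth print-access-token)"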

rfratto (Member) commented Aug 22, 2019

Apparently updating the vendors is easier said than done 😞. @sandlis is working on a PR that'll include vendor updates and make this issue go away.

shepely (Author) commented Aug 23, 2019

Thanks @rfratto and @sandlis 🙏

rfratto (Member) commented Aug 28, 2019

@shepely This should be fixed by #938; can you try this again using the Docker image grafana/loki:master-b687ec6?
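
For anyone else testing, overriding the image in the chart's values.yaml from step 8 above should be enough, e.g.:

image:
  repository: grafana/loki
  tag: master-b687ec6
  pullPolicy: IfNotPresent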

ctrox (Contributor) commented Sep 6, 2019

I got GCS working without a hitch with grafana/loki:master-b687ec6, so IMO this can be closed.

rfratto (Member) commented Sep 6, 2019

Great! Closing as fixed by #938. If anyone is waiting for a new release that includes the fix, that should be coming soon.

@rfratto rfratto closed this as completed Sep 6, 2019
tahsinrahman commented

How do you add the GCP service account keys from the Helm charts?

jacobfederer commented

I was able to grant Loki access to the GCS bucket with the following configuration (the current version of the chart doesn't make it too hard):

  1. Create GCS bucket
  2. Create GCP service account with the role 'Storage Object Admin'.
  3. Use the JSON key file to create a k8s secret: kubectl create secret generic -n monitoring loki-access-gcs --from-file=key.json=key.json
  4. Create a file with the following Loki config and deploy it with helm (the env, extraVolumes and extraVolumeMounts make the difference; a hedged deploy command follows the steps):
loki:
  env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /etc/secrets/key.json

  extraVolumes:
    - name: loki-access-gcs
      secret:
        secretName: loki-access-gcs

  extraVolumeMounts:
    - name: loki-access-gcs
      mountPath: /etc/secrets

  config:
    auth_enabled: false

    server:
      http_listen_port: 3100

    schema_config:
      configs:
        - object_store: gcs
          store: boltdb-shipper
          schema: v11
          index:
            prefix: index_loki_
            period: 24h
          chunks:
            prefix: chunk_loki_
            period: 24h

    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/index
        shared_store: gcs
        cache_location: /data/loki/index_cache
        resync_interval: 5s
      gcs:
        bucket_name: grafana_loki_data

    table_manager:
      retention_deletes_enabled: true
      retention_period: 720h

    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 24h
  5. Enjoy.
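
A hedged example of deploying that file; the chart reference and values file name are assumptions on my part (the top-level loki: key matches the loki-stack-style layout):

helm repo add grafana https://grafana.github.io/helm-charts
helm upgrade --install loki grafana/loki-stack -n monitoring -f loki-gcs-values.yaml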


titarch commented Sep 28, 2022

(quoting @jacobfederer's configuration above in full)

Thanks a lot, I had been stuck at this for too long.
For those using loki-simple-scalable, I had to put the env and volumes under both the read and write configs; the rest is similar.
I also had to remove the loki.storage_config.boltdb_shipper config, as it was giving me write permission errors; the defaults seem to work, so I am fine with that.

read:
  extraEnv:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /etc/secrets/key.json
  extraVolumes:
    - name: loki-access-gcs
      secret:
        secretName: loki-access-gcs
  extraVolumeMounts:
    - name: loki-access-gcs
      mountPath: /etc/secrets

write:
  extraEnv:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /etc/secrets/key.json
  extraVolumes:
    - name: loki-access-gcs
      secret:
        secretName: loki-access-gcs
  extraVolumeMounts:
    - name: loki-access-gcs
      mountPath: /etc/secrets
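
And a hedged deploy command for the simple-scalable variant (the values file name is a placeholder; the chart name is the one published in the grafana repo at the time):

helm upgrade --install loki grafana/loki-simple-scalable -n monitoring -f loki-read-write-values.yaml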

@chaudum chaudum added the type/bug (Something is not working as expected) label Jun 14, 2023