Elastic Agent Fails to Start on GKE Autopilot #4699

Closed
ruler501 opened this issue Jul 28, 2021 · 2 comments
@ruler501

Bug Report

What did you do?
Tried to deploy an Elastic Agent to a GKE Autopilot cluster.

What did you expect to see?
An Elastic Agent pod to start.

What did you see instead? Under which circumstances?
The agent failed to start because it is trying to mount a hostPath directory in write mode. It looks like the Fleet mode that was recently introduced on master doesn't use the same hostPath mount so it is possible that will alleviate the problem, but I cannot verify that currently.

Environment

  • ECK version: 1.6.0

  • Kubernetes information:

    • Cloud: GKE Autopilot
Client Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.18-dispatcher", GitCommit:"de944d802c49735ad0f4bfe82ddfa19737ebe962", GitTreeState:"clean", BuildDate:"2021-05-13T17:56:14Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.9-gke.1900", GitCommit:"008fd38bf3dc201bebdd4fe26edf9bf87478309a", GitTreeState:"clean", BuildDate:"2021-04-14T09:22:08Z", GoVersion:"go1.15.8b5", Compiler:"gc", Platform:"linux/amd64"}
  • Resource definition:
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: quickstart
spec:
  version: 7.13.4
  elasticsearchRefs:
  - name: quickstart
  daemonSet: {}
  config:
    inputs:
      - name: system-1
        revision: 1
        type: system/metrics
        use_output: default
        meta:
          package:
            name: system
            version: 0.9.1
        data_stream:
          namespace: default
        streams:
          - id: system/metrics-system.cpu
            data_stream:
              dataset: system.cpu
              type: metrics
            metricsets:
              - cpu
            cpu.metrics:
              - percentages
              - normalized_percentages
            period: 10s
  • Logs:
admission webhook "validation.gatekeeper.sh" denied the request: [denied by autogke-no-write-mode-hostpath] hostPath volume agent-data in container agent is accessed in write mode; disallowed in Autopilot. Requesting user: <system:serviceaccount:elastic-system:elastic-operator> and groups: <["system:serviceaccounts", "system:serviceaccounts:elastic-system", "system:authenticated"]>
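
The denial message names the offending pair: the hostPath volume agent-data, mounted in write mode by the container agent. A minimal Python sketch of that kind of check follows, assuming the rule flags any hostPath volume whose mount does not set readOnly: true. The actual Gatekeeper constraint is not shown in this thread, and the function name, sample pod specs, and paths are hypothetical.

```python
from typing import Any


def writable_hostpath_mounts(pod_spec: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (container, volume) pairs where a hostPath volume is mounted writable."""
    # Names of all volumes backed by hostPath.
    hostpath_volumes = {
        v["name"] for v in pod_spec.get("volumes", []) if "hostPath" in v
    }
    offenders = []
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            # A mount counts as writable unless readOnly is explicitly true.
            if mount["name"] in hostpath_volumes and not mount.get("readOnly", False):
                offenders.append((container["name"], mount["name"]))
    return offenders


# Hypothetical pod spec resembling the rejected one; the path is made up.
rejected = {
    "containers": [
        {
            "name": "agent",
            "volumeMounts": [{"name": "agent-data", "mountPath": "/example/agent-data"}],
        }
    ],
    "volumes": [{"name": "agent-data", "hostPath": {"path": "/example/agent-data"}}],
}

# Same containers, with the volume swapped for an emptyDir.
accepted = {**rejected, "volumes": [{"name": "agent-data", "emptyDir": {}}]}

print(writable_hostpath_mounts(rejected))  # [('agent', 'agent-data')]
print(writable_hostpath_mounts(accepted))  # []
```

Running a check like this against a rendered pod spec before applying it is one way to spot volumes that Autopilot would likely reject.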
@botelastic botelastic bot added the triage label Jul 28, 2021
ruler501 added a commit to CubeArtisan/cubeartisan that referenced this issue Jul 28, 2021
The non-fleet elastic agent doesn't work on autopilot since it tries to mount a hostPath volume for writing.
See #elastic/cloud-on-k8s#4699 for details.
Unfortunately fleet support is only available on master of cloud-on-k8s and not really documented.
Will wait to deploy this until we come up with a decision on if we want to use an experimental build.
@david-kow
Contributor

david-kow commented Aug 3, 2021

Hey @ruler501, thanks for your report.

Writable hostPath volumes are not allowed on GKE Autopilot, but you can use other volume types, for example emptyDir (with all the consequences that has, e.g. losing the Agent's local state when the Pod is rolled over). Below is your manifest, slightly modified: I've replaced the agent-data volume with an emptyDir and set an ephemeral-storage request.

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: quickstart
spec:
  version: 7.13.4
  elasticsearchRefs:
  - name: quickstart
  daemonSet:
    podTemplate:
      spec:
        containers:
        - name: agent
          resources:
            requests:
              ephemeral-storage: 1Gi
        volumes:
        - emptyDir: {}
          name: agent-data
  config:
    inputs:
      - name: system-1
        revision: 1
        type: system/metrics
        use_output: default
        meta:
          package:
            name: system
            version: 0.9.1
        data_stream:
          namespace: default
        streams:
          - id: system/metrics-system.cpu
            data_stream:
              dataset: system.cpu
              type: metrics
            metricsets:
              - cpu
            cpu.metrics:
              - percentages
              - normalized_percentages
            period: 10s

It results in:

$ k get nodes
NAME                           STATUS   ROLES    AGE   VERSION
gk3-autopilot-cluster-1--...   Ready    <none>   47h   v1.20.8-gke.900
gk3-autopilot-cluster-1--...   Ready    <none>   47h   v1.20.8-gke.900
gk3-autopilot-cluster-1--...   Ready    <none>   47h   v1.20.8-gke.900

$ k get agent
NAME         HEALTH   AVAILABLE   EXPECTED   VERSION   AGE
quickstart   green    3           3          7.13.4    17m
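
The manifest above declares its emptyDir under the same name, agent-data, as the hostPath volume named in the webhook denial. A minimal Python sketch of why that matters follows, assuming the operator merges podTemplate volumes with its defaults by name and lets the user-supplied entry win. The merge function and the default volume's path are hypothetical illustrations, not ECK code.

```python
from typing import Any

Volume = dict[str, Any]


def merge_volumes(defaults: list[Volume], overrides: list[Volume]) -> list[Volume]:
    """Merge two volume lists by name; an override replaces a default of the same name."""
    merged = {v["name"]: v for v in defaults}
    for v in overrides:
        merged[v["name"]] = v
    return list(merged.values())


# Hypothetical operator default; the path is made up for illustration.
defaults = [{"name": "agent-data", "hostPath": {"path": "/example/agent-data"}}]
# Volume taken from the podTemplate in the manifest above.
overrides = [{"name": "agent-data", "emptyDir": {}}]

print(merge_volumes(defaults, overrides))  # [{'name': 'agent-data', 'emptyDir': {}}]
```

Under that assumption, a volume declared under a different name would be added alongside the default hostPath volume instead of replacing it, and the pod would still be denied.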

I'll close this issue as such behavior is expected on GKE Autopilot. Please reopen as needed.

@thbkrkr
Contributor

thbkrkr commented Aug 3, 2021

Small additional comments:

> The agent failed to start because it is trying to mount a hostPath directory in write mode.

It's a known issue that Elastic Agent doesn't work very well in restricted Kubernetes environments like GKE Autopilot (see elastic/beats#19600).

> It looks like the Fleet mode that was recently introduced on master doesn't use the same hostPath mount so it is possible that will alleviate the problem, but I cannot verify that currently.

If you want to test Fleet with ECK, there is good news: ECK 1.7.0 has been released today with Fleet support :)
