
[QUESTION] how do you actually use envFrom with a secret? #2134

Closed
jesumyip opened this issue Aug 20, 2024 · 13 comments · Fixed by #2135
Labels: kind/bug (Something isn't working), question (Further information is requested)

Comments

@jesumyip

jesumyip commented Aug 20, 2024

I've tried

  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "768m"
    envFrom:
      - secretRef:
          name: mysecrets

and when I run kubectl describe pod on the driver, I don't see those env vars being picked up.

mysecrets is an Opaque-type secret.
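For reference, an Opaque secret like this can be created with kubectl; the keys below (DB_USER, DB_PASSWORD) are placeholders made up for illustration. With envFrom, each key in the secret should become an environment variable name in the container.

# Hypothetical example: each secret key becomes an env var name via envFrom.
kubectl create secret generic mysecrets \
    --from-literal=DB_USER=admin \
    --from-literal=DB_PASSWORD=changeme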

To test whether the spark operator webhook is working, I tried switching the YAML config to:

  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "768m"
    env:
      - name: MY_VAR
        value: "some random value"

and that works just fine.

Am I doing this wrong? I am using version 1.4.6 of the Helm chart.
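(For what it's worth, one way to sanity-check the webhook is to confirm its configuration was registered and its pod is healthy; the spark-operator namespace here is an assumption based on a default install:)

# Check that the mutating webhook is registered and the operator pods are up.
kubectl get mutatingwebhookconfigurations | grep spark
kubectl -n spark-operator get pods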

jesumyip added the question (Further information is requested) label on Aug 20, 2024
@ChenYi015
Contributor

@jesumyip Could you provide detailed information about how you installed the Helm chart?

@ChenYi015
Contributor

You can try out the latest version if you'd like, as it has fixed many problems related to the webhook.

@jesumyip
Author

jesumyip commented Aug 20, 2024

Hi @ChenYi015

I have tried the latest version you provided.

  • I used this values file:
spark:
  jobNamespaces:
    - ""
controller:
  logLevel: "debug"
webhook:
  logLevel: "debug"

spark:
  serviceAccount:
    create: true
    name: spark-sa
  • Everything was created with no errors. Two pods are running: one for spark-operator-controller and one for spark-operator-webhook.

  • I then created a SparkApplication:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: test-hosts
  namespace: xxx
spec:
  type: Python
  mode: cluster
  image: "<image hosted in private repo>"
  imagePullPolicy: Always
  imagePullSecrets: 
    - docker-json
  mainApplicationFile: "local:///opt/bitnami/spark/universal_parser.py"
  sparkVersion: "3.5.1"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  deps:
    packages:
      - org.apache.spark:spark-streaming-kafka-0-10_2.12:3.5.0
      - org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "768m"
    envFrom:
      - secretRef:
          name: parser-secrets
    labels:
      version: 3.5.1
    serviceAccount: spark-sa
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "1024m"
    labels:
      version: 3.5.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

I waited about 1 minute, but still no pod was created in the namespace xxx. I checked the logs for the operator and webhook pods and saw nothing new, only the logs that were created when the two pods started up.

>> kubectl get sparkapplication

NAME                STATUS   ATTEMPTS   START   FINISH   AGE
test-hosts                                        9m52s
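When the STATUS column stays empty like this, the application's events are usually the next place to look, e.g.:

# Inspect the SparkApplication and recent events in the job namespace.
kubectl -n xxx describe sparkapplication test-hosts
kubectl -n xxx get events --sort-by=.lastTimestamp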

@jesumyip
Author

jesumyip commented Aug 20, 2024

Are there some permissions that are incorrectly set? I don't see any errors logged in the two pods in the spark-operator namespace...
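One check I can think of is whether the operator's service account can even create pods in the job namespace. The service account name below is a guess; adjust it to whatever kubectl -n spark-operator get sa shows:

# Guessing the controller's service account name; verify it first.
kubectl auth can-i create pods -n xxx \
    --as=system:serviceaccount:spark-operator:spark-operator-controller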

Operator pod logs:

++ id -u
+ uid=0
++ id -g
+ gid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ [[ -z root:x:0:0:root:/root:/bin/bash ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator controller start --zap-log-level=debug --namespaces=default --controller-threads=10 --enable-ui-service=true --enable-metrics=true --metrics-bind-address=:8080 --metrics-endpoint=/metrics --metrics-prefix= --metrics-labels=app_type --leader-election=true --leader-election-lock-name=spark-operator-controller-lock --leader-election-lock-namespace=spark-operator
Spark Operator Version: v2.0.0-rc.0+unknown
Build Date: 2024-08-12T02:57:44+00:00
Git Commit ID: 
Git Tree State: clean
Go Version: go1.22.5
Compiler: gc
Platform: linux/amd64
2024-08-20T14:32:27.118Z        INFO    controller/start.go:251 Starting manager
2024-08-20T14:32:27.119Z        INFO    controller-runtime.metrics      server/server.go:205    Starting metrics server
2024-08-20T14:32:27.119Z        INFO    manager/server.go:50    starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-08-20T14:32:27.119Z        INFO    controller-runtime.metrics      server/server.go:244    Serving metrics server  {"bindAddress": ":8080", "secure": false}
I0820 14:32:27.119306      10 leaderelection.go:250] attempting to acquire leader lease spark-operator/spark-operator-controller-lock...
I0820 14:32:27.136595      10 leaderelection.go:260] successfully acquired lease spark-operator/spark-operator-controller-lock
2024-08-20T14:32:27.136Z        DEBUG   events  recorder/recorder.go:104        spark-operator-controller-5f7497d6f5-9lxl4_ea1b7250-f6fd-42ec-9bbc-debb1a803c58 became leader     {"type": "Normal", "object": {"kind":"Lease","namespace":"spark-operator","name":"spark-operator-controller-lock","uid":"ef251560-cdef-4b4f-9080-ec9a4eecab1f","apiVersion":"coordination.k8s.io/v1","resourceVersion":"5067755"}, "reason": "LeaderElection"}
2024-08-20T14:32:27.136Z        INFO    controller/controller.go:178    Starting EventSource    {"controller": "spark-application-controller", "source": "kind source: *v1.Pod"}
2024-08-20T14:32:27.136Z        INFO    controller/controller.go:178    Starting EventSource    {"controller": "scheduled-spark-application-controller", "source": "kind source: *v1beta2.ScheduledSparkApplication"}
2024-08-20T14:32:27.136Z        INFO    controller/controller.go:178    Starting EventSource    {"controller": "spark-application-controller", "source": "kind source: *v1beta2.SparkApplication"}
2024-08-20T14:32:27.136Z        INFO    controller/controller.go:186    Starting Controller     {"controller": "scheduled-spark-application-controller"}
2024-08-20T14:32:27.136Z        INFO    controller/controller.go:186    Starting Controller     {"controller": "spark-application-controller"}
2024-08-20T14:32:27.237Z        INFO    controller/controller.go:220    Starting workers        {"controller": "spark-application-controller", "worker count": 10}
2024-08-20T14:32:27.237Z        INFO    controller/controller.go:220    Starting workers        {"controller": "scheduled-spark-application-controller", "worker count": 10}

Webhook pod logs:

++ id -u
+ uid=0
++ id -g
+ gid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ [[ -z root:x:0:0:root:/root:/bin/bash ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator webhook start --zap-log-level=debug --namespaces=default --webhook-secret-name=spark-operator-webhook-certs --webhook-secret-namespace=spark-operator --webhook-svc-name=spark-operator-webhook-svc --webhook-svc-namespace=spark-operator --webhook-port=9443 --mutating-webhook-name=spark-operator-webhook --validating-webhook-name=spark-operator-webhook --enable-metrics=true --metrics-bind-address=:8080 --metrics-endpoint=/metrics --metrics-prefix= --metrics-labels=app_type --leader-election=true --leader-election-lock-name=spark-operator-webhook-lock --leader-election-lock-namespace=spark-operator
Spark Operator Version: v2.0.0-rc.0+unknown
Build Date: 2024-08-12T02:57:44+00:00
Git Commit ID: 
Git Tree State: clean
Go Version: go1.22.5
Compiler: gc
Platform: linux/amd64
2024-08-20T14:32:27.297Z        INFO    webhook/start.go:243    Syncing webhook secret  {"name": "spark-operator-webhook-certs", "namespace": "spark-operator"}
2024-08-20T14:32:27.772Z        INFO    webhook/start.go:257    Writing certificates    {"path": "/etc/k8s-webhook-server/serving-certs", "certificate name": "tls.crt", "key name": "tls.key"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.builder      builder/webhook.go:158  Registering a mutating webhook  {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=SparkApplication", "path": "/mutate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.webhook      webhook/server.go:183   Registering webhook     {"path": "/mutate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.builder      builder/webhook.go:189  Registering a validating webhook {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=SparkApplication", "path": "/validate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.webhook      webhook/server.go:183   Registering webhook     {"path": "/validate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.builder      builder/webhook.go:158  Registering a mutating webhook  {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=ScheduledSparkApplication", "path": "/mutate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.webhook      webhook/server.go:183   Registering webhook     {"path": "/mutate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.builder      builder/webhook.go:189  Registering a validating webhook {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=ScheduledSparkApplication", "path": "/validate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.webhook      webhook/server.go:183   Registering webhook     {"path": "/validate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.builder      builder/webhook.go:158  Registering a mutating webhook  {"GVK": "/v1, Kind=Pod", "path": "/mutate--v1-pod"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.webhook      webhook/server.go:183   Registering webhook     {"path": "/mutate--v1-pod"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.builder      builder/webhook.go:204  skip registering a validating webhook, object does not implement admission.Validator or WithValidator wasn't called       {"GVK": "/v1, Kind=Pod"}
2024-08-20T14:32:27.773Z        INFO    webhook/start.go:319    Starting manager
2024-08-20T14:32:27.773Z        INFO    controller-runtime.metrics      server/server.go:205    Starting metrics server
2024-08-20T14:32:27.773Z        INFO    manager/server.go:50    starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-08-20T14:32:27.773Z        INFO    controller-runtime.webhook      webhook/server.go:191   Starting webhook server
2024-08-20T14:32:27.774Z        INFO    controller-runtime.metrics      server/server.go:244    Serving metrics server  {"bindAddress": ":8080", "secure": false}
2024-08-20T14:32:27.774Z        INFO    webhook/start.go:357    disabling http/2
2024-08-20T14:32:27.774Z        DEBUG   controller-runtime.healthz      healthz/healthz.go:60   healthz check failed    {"checker": "readyz", "error": "webhook server has not been started yet"}
2024-08-20T14:32:27.774Z        INFO    controller-runtime.healthz      healthz/healthz.go:128  healthz check failed    {"statuses": [{}]}
I0820 14:32:27.774433      10 leaderelection.go:250] attempting to acquire leader lease spark-operator/spark-operator-webhook-lock...
2024-08-20T14:32:27.774Z        INFO    controller-runtime.certwatcher  certwatcher/certwatcher.go:161  Updated current TLS certificate
2024-08-20T14:32:27.774Z        INFO    controller-runtime.webhook      webhook/server.go:242   Serving webhook server  {"host": "", "port": 9443}
2024-08-20T14:32:27.774Z        INFO    controller-runtime.certwatcher  certwatcher/certwatcher.go:115  Starting certificate watcher
I0820 14:32:27.791240      10 leaderelection.go:260] successfully acquired lease spark-operator/spark-operator-webhook-lock
2024-08-20T14:32:27.791Z        INFO    controller/controller.go:178    Starting EventSource    {"controller": "validating-webhook-configuration-controller", "source": "kind source: *v1.ValidatingWebhookConfiguration"}
2024-08-20T14:32:27.791Z        INFO    controller/controller.go:178    Starting EventSource    {"controller": "mutating-webhook-configuration-controller", "source": "kind source: *v1.MutatingWebhookConfiguration"}
2024-08-20T14:32:27.791Z        INFO    controller/controller.go:186    Starting Controller     {"controller": "validating-webhook-configuration-controller"}
2024-08-20T14:32:27.791Z        INFO    controller/controller.go:186    Starting Controller     {"controller": "mutating-webhook-configuration-controller"}
2024-08-20T14:32:27.791Z        DEBUG   events  recorder/recorder.go:104        spark-operator-webhook-75d88ff76d-549nw_aab28de5-4e4d-49ca-931c-c319031dbdba became leader        {"type": "Normal", "object": {"kind":"Lease","namespace":"spark-operator","name":"spark-operator-webhook-lock","uid":"29e67682-4868-46a9-a954-592b2ad0d6cb","apiVersion":"coordination.k8s.io/v1","resourceVersion":"5067773"}, "reason": "LeaderElection"}
2024-08-20T14:32:27.892Z        INFO    validatingwebhookconfiguration/event_handler.go:46      ValidatingWebhookConfiguration created    {"name": "spark-operator-webhook"}
2024-08-20T14:32:27.892Z        INFO    controller/controller.go:220    Starting workers        {"controller": "validating-webhook-configuration-controller", "worker count": 1}
2024-08-20T14:32:27.892Z        INFO    controller/controller.go:220    Starting workers        {"controller": "mutating-webhook-configuration-controller", "worker count": 1}
2024-08-20T14:32:27.892Z        INFO    mutatingwebhookconfiguration/event_handler.go:46        MutatingWebhookConfiguration created      {"name": "spark-operator-webhook"}
2024-08-20T14:32:27.897Z        INFO    mutatingwebhookconfiguration/controller.go:72   Updating CA bundle of MutatingWebhookConfiguration        {"name": "spark-operator-webhook"}
2024-08-20T14:32:27.897Z        INFO    validatingwebhookconfiguration/controller.go:73 Updating CA bundle of ValidatingWebhookConfiguration      {"name": "spark-operator-webhook"}
2024-08-20T14:32:27.907Z        INFO    mutatingwebhookconfiguration/event_handler.go:68        MutatingWebhookConfiguration updated      {"name": "spark-operator-webhook", "namespace": ""}
2024-08-20T14:32:27.912Z        INFO    validatingwebhookconfiguration/event_handler.go:68      ValidatingWebhookConfiguration updated    {"name": "spark-operator-webhook", "namespace": ""}
2024-08-20T14:32:27.917Z        INFO    mutatingwebhookconfiguration/controller.go:72   Updating CA bundle of MutatingWebhookConfiguration        {"name": "spark-operator-webhook"}
2024-08-20T14:32:27.917Z        INFO    validatingwebhookconfiguration/controller.go:73 Updating CA bundle of ValidatingWebhookConfiguration      {"name": "spark-operator-webhook"}

@jesumyip
Author

jesumyip commented Aug 20, 2024

I also tried this values file, where I modified the Spark job namespaces:

spark:
  jobNamespaces:
    - "xxx"

I noticed that in the webhook pod the startup parameter is still shown as

+ exec /usr/bin/tini -s -- /usr/bin/spark-operator webhook start --zap-log-level=debug --namespaces=default....

Is --namespaces=default the reason no pods get created for the SparkApplication?
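To see what the rendered webhook deployment actually received, its args can be read back directly (the deployment name here is inferred from the pod names in the logs above):

# Dump the container args of the webhook deployment.
kubectl -n spark-operator get deployment spark-operator-webhook \
    -o jsonpath='{.spec.template.spec.containers[0].args}'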

@ChenYi015
Contributor

> I also tried this values file, where I modified the Spark job namespaces:
>
> spark:
>   jobNamespaces:
>     - "xxx"
>
> I noticed that in the webhook pod the startup parameter is still shown as
>
> + exec /usr/bin/tini -s -- /usr/bin/spark-operator webhook start --zap-log-level=debug --namespaces=default....
>
> Is --namespaces=default the reason no pods get created for the SparkApplication?

I have just tried setting spark.jobNamespaces to [test]:

helm install spark-operator spark-operator/spark-operator \
    --version 2.0.0-rc.0 \
    --create-namespace \
    --namespace spark-operator \
    --set 'spark.jobNamespaces={test}'

and the webhook pod's logs show that the namespaces were correctly set:

+ exec /usr/bin/tini -s -- /usr/bin/spark-operator webhook start --zap-log-level=info --namespaces=test --webhook-secret-name=spark-operator-webhook-certs --webhook-secret-namespace=spark-operator --webhook-svc-name=spark-operator-webhook-svc --webhook-svc-namespace=spark-operator --webhook-port=9443 --mutating-webhook-name=spark-operator-webhook --validating-webhook-name=spark-operator-webhook --enable-metrics=true --metrics-bind-address=:8080 --metrics-endpoint=/metrics --metrics-prefix= --metrics-labels=app_type --leader-election=true --leader-election-lock-name=spark-operator-webhook-lock --leader-election-lock-namespace=spark-operator

@ChenYi015
Contributor

> spark:
>   jobNamespaces:
>     - ""
> controller:
>   logLevel: "debug"
> webhook:
>   logLevel: "debug"
>
> spark:
>   serviceAccount:
>     create: true
>     name: spark-sa

@jesumyip There is an issue related to cache settings when spark.jobNamespaces is set to all namespaces (""); it will be fixed by PRs #2123 and #2128. For now, you need to set the job namespaces to specific namespaces instead of [""].
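So, until those PRs land, a values file along these lines, listing the job namespaces explicitly, should work:

spark:
  jobNamespaces:
    - test
    - xxx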

@jesumyip
Author

jesumyip commented Aug 20, 2024

It looks like the Helm chart isn't compatible with kustomize. I used kustomize to install it, and the namespace for the webhook isn't picked up correctly; it still gets shown as --namespaces=default.

kustomize build . --enable-helm > output.yaml

shows this:

(screenshot of the rendered output, still showing --namespaces=default)

Interestingly enough, when I modify the Helm chart at line 54 of webhook/deployment.yaml to become

        {{- with .Values.duh.fish }}
        - --namespaces={{ . | join "," }}
        {{- end }}

and I set my values file to:

duh:
  fish:
    - "xxx"
    - "test"

then the output is correct. I actually see

        - --namespaces=xxx,test

The value default seems to be picked up from the values.yaml file bundled in the Helm chart; I cannot seem to override it with my own values file.
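For reference, a kustomization.yaml along these lines reproduces the problem; the chart repo URL is from memory and may need adjusting:

# kustomization.yaml (sketch; repo URL assumed)
helmCharts:
  - name: spark-operator
    repo: https://kubeflow.github.io/spark-operator
    version: 2.0.0-rc.0
    releaseName: spark-operator
    namespace: spark-operator
    valuesFile: values.yaml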

@jesumyip
Author

jesumyip commented Aug 20, 2024

@ChenYi015 Now when I try installing it with

helm install spark-operator spark-operator/spark-operator \
    --version 2.0.0-rc.0 \
    --create-namespace \
    --namespace spark-operator \
    --set 'spark.jobNamespaces={test,xxx}' \
    --set 'spark.serviceAccount.name=spark-sa' \
    --set 'spark.serviceAccount.create=true'

I can see the startup parameter for the webhook becomes --namespaces=test,xxx which is expected.

But when I apply the SparkApplication, I only see a svc being created in namespace test; no pod stays up. There are also no additional logs in the controller and webhook pods. In the driver pod logs, I can see this:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/bladerunner/pods/xxx-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "xxx" is forbidden: User "system:serviceaccount:test:spark-sa" cannot get resource "pods" in API group "" in the namespace "test": RBAC: role.rbac.authorization.k8s.io "spark-sa" not found.
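The error says the driver's service account is bound to a Role named spark-sa that doesn't exist in the namespace. A minimal Role/RoleBinding that would satisfy it looks roughly like this; it's a sketch, and the chart's actual rules are broader:

# Sketch of the RBAC objects the driver needs; rules are illustrative only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-sa
  namespace: test
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-sa
  namespace: test
subjects:
  - kind: ServiceAccount
    name: spark-sa
    namespace: test
roleRef:
  kind: Role
  name: spark-sa
  apiGroup: rbac.authorization.k8s.io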

Now if I then reinstall the helm chart with

helm install spark-operator spark-operator/spark-operator \
    --version 2.0.0-rc.0 \
    --create-namespace \
    --namespace spark-operator \
    --set 'spark.jobNamespaces={test,xxx}'

and change the service account in my SparkApplication YAML to <helmchart-releasename>-spark, then the driver pod is created properly. I can also see that the driver pod has envFrom applied correctly.
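To double-check the injection, envFrom on the driver pod can be read back directly (the driver pod is conventionally named <app-name>-driver):

# Print the envFrom sources that the webhook patched into the driver container.
kubectl -n test get pod test-hosts-driver \
    -o jsonpath='{.spec.containers[0].envFrom}'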

@ChenYi015
Contributor

@jesumyip Thanks for reporting the issue; the spark RoleBinding template does not render properly when spark.serviceAccount.name is set. I will fix it in the next release.

@jesumyip
Author

@ChenYi015 Also have a look at that strange spark.jobNamespaces behaviour: I cannot seem to override the value provided in the chart's default values.yaml file.

@jesumyip
Author

@ChenYi015 Never mind, I found the problem with the spark.jobNamespaces behaviour. It was my mistake: my values file had two spark: sections.
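For anyone hitting the same thing: YAML parsers typically keep only the last duplicate top-level key, so the second spark: section silently replaced the one carrying jobNamespaces. Merged into a single section, the file should look like:

spark:
  jobNamespaces:
    - test
    - xxx
  serviceAccount:
    create: true
    name: spark-sa
controller:
  logLevel: "debug"
webhook:
  logLevel: "debug"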

@ChenYi015
Contributor

/kind bug
