
helm: Add serviceAccounts, rbac, and small fixes #13747

Conversation

@jwitko (Contributor) commented Feb 3, 2023

Description

  • Add per-service serviceAccounts
  • Add RBAC per service
  • Fix the commented historical config druid_segmentCache_locations to match the PVC mount path. In its current form it breaks the historical pod when uncommented, because the process lacks permission to create the directory.
  • Update README.md with new options
  • Add a default annotation to the historical StatefulSet. Without any annotations, the historical service breaks when druid-kubernetes-extensions is enabled.

ATTENTION

This PR should be merged after #13746 since it contains a helm chart version bump incremental to that one.

Goal

The goal of this PR is to bring the apache/druid helm chart up to modern standards and fix some small issues. Enabling per-service serviceAccounts allows for finer-grained RBAC, which is generally a better security posture. Without this, all services are forced to use the default serviceAccount, which creates issues when you need to annotate service accounts for things like AWS IRSA, and leaves a dedicated namespace as your only method of controlling permissions.

Release note

Update the suggested segment-cache path, allow per-service serviceAccounts and finer-grained RBAC in the Druid helm chart, and add a default annotation to the historical StatefulSet.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jwitko force-pushed the helm/add-serviceaccounts-rbac-and-small-fixes branch from 4c8e2dc to 544d100 on February 3, 2023 05:38
@jwitko force-pushed the helm/add-serviceaccounts-rbac-and-small-fixes branch from 544d100 to 33818bf on February 3, 2023 19:28
@abhishekagarwal87 added the Helm Chart (https://github.com/apache/druid/tree/master/helm/druid) label on Feb 8, 2023
@abhishekagarwal87 (Contributor)

can you elaborate a bit more on "Enabling per-service serviceAccounts allows for finer grained RBAC"? Maybe take us through an example use-case that you want to implement on your own cluster.

@jwitko (Contributor, Author) commented Feb 8, 2023

Merged upstream master into this branch

@jwitko (Contributor, Author) commented Feb 8, 2023

can you elaborate a bit more on "Enabling per-service serviceAccounts allows for finer grained RBAC"? Maybe take us through an example use-case that you want to implement on your own cluster.

Absolutely.
In its current state the helm chart does not allow configuring service accounts on the deployments/statefulsets. This means your only choice is to use the default service account of whatever namespace you deploy into. Because of this, if you share the namespace with any other applications and they also use the default service account, you are forced to share any roles/rolebindings that affect the default service account. So if you want to isolate permissions, your only option is to put Druid in its own namespace so that nothing else shares the default service account.

Assuming you take the above approach and Druid is now in its own namespace, you have another issue: different Druid services require different levels of permissions. For example, the broker does not need the ability to start Kubernetes jobs, but the overlord does if you're using the druid-kubernetes-overlord-extensions extension. With a single service account shared by all services, you have no choice but to give every service the ability to start jobs if you want to use druid-kubernetes-overlord-extensions, which goes against the principle of least privilege. You can extrapolate this even further by bringing druid-s3-extensions into the conversation: if you want to leverage AWS IRSA for a better security posture, so you don't have to store AWS credentials in your cluster and find a way to get them there securely, you now have to give every single Druid process the same IAM privileges in AWS if they all use the same service account.

By merging this PR we get per-service service accounts, which allow us to apply Kubernetes-native RBAC and additional external privileges (like IRSA) at the per-service level, so we don't have to give unnecessary privileges to services that don't require them. A rough sketch of the kind of setup this enables is below.
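
The names, IAM role ARN, and exact rule set here are illustrative only and are not taken from this chart; the point is that only the overlord's account carries the IRSA annotation and Job permissions, while the broker's account gets neither:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: druid-overlord                      # illustrative name
  annotations:
    # IRSA: bind this service account (and only this one) to an AWS IAM role
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/druid-overlord
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-overlord
rules:
  # Only the overlord needs to manage Jobs for the kubernetes overlord extension
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: druid-overlord
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: druid-overlord
subjects:
  - kind: ServiceAccount
    name: druid-overlord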

@abhishekagarwal87 does that make sense? Let me know if you have any questions or disagree with any of the above.

@abhishekagarwal87 (Contributor)

Thank you. I see that, by default, the helm chart is now going to create service accounts per service. So can we keep the default off in the top-level values.yaml?

@jwitko (Contributor, Author) commented Feb 8, 2023

Thank you. I see that, by default, the helm chart is now going to create service accounts per service. So can we keep the default off in the top-level values.yaml?

Sorry, I'm not sure what you mean here? Keep the default for what off?

If you're talking about keeping the default for creating service accounts off, then I would strongly recommend against it. Having a per-app service account is a much better security posture. It is possible this would affect existing helm chart deployments, since users have likely had to modify the default service account outside of the helm chart, so maybe instead of leaving it off we bump the minor version and add an upgrade warning/note?

{{- end }}
spec:
{{- if .Values.broker.serviceAccount.create }}
Contributor:

This blocks external creation of the serviceAccount. Most other charts allow you to bring your own serviceAccount which may also make reusing the same one for all druid service types easier.

I am wondering if we should default serviceAccount.name to unset; if create is true, it would use what the default would be. Then, if serviceAccount.name is set and serviceAccount.create is false, we can still set the value here.

The charts I am used to using have a template like:

{{/*
Create the name of the forwarder service account to use
*/}}
{{- define "fluentd.forwarder.serviceAccountName" -}}
{{- if .Values.forwarder.serviceAccount.create -}}
    {{ default (printf "%s-forwarder" (include "common.names.fullname" .)) .Values.forwarder.serviceAccount.name }}
{{- else -}}
    {{ default "default" .Values.forwarder.serviceAccount.name }}
{{- end -}}
{{- end -}}

and then always set the serviceAccountName with the output of the template.
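
For completeness, the output of such a helper would then presumably be consumed in the pod spec along these lines (sketch based on the fluentd example above):

spec:
  serviceAccountName: {{ include "fluentd.forwarder.serviceAccountName" . }}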

Contributor Author:

I see what you're saying. I have a bit of the opposite experience in terms of what other widely used public charts offer, which is why I went this way, but I'm not married to it. I can create a helper definition in the style above.

Contributor Author:

Fixed

@@ -0,0 +1,21 @@
{{- if .Values.rbac.create }}
Contributor:

Should this also be checking .Values.broker.enabled? Feels like disabling the broker should also disable these resources from being created.
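
For reference, the kind of guard being suggested would look roughly like this; the exact template is up to the author, and the empty rules list is just a placeholder:

{{- if and .Values.rbac.create .Values.broker.enabled }}
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ template "druid.broker.fullname" . }}
rules: []  # actual rules elided in this sketch
{{- end }}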

Contributor Author:

Fixed

@@ -0,0 +1,20 @@
{{- if .Values.rbac.create }}
Contributor:

Same question about checking enabled for the druid service, eg: .Values.broker.enabled

Contributor Author:

Fixed

@@ -0,0 +1,20 @@
{{- if .Values.broker.serviceAccount.create }}
Contributor:

Same question about checking enabled for the druid service, eg: .Values.broker.enabled

Contributor Author:

Fixed

Comment on lines +13 to +19
- apiGroups:
- ""
resources:
- pods
- configmaps
verbs:
- '*'
Contributor:

Unclear on why the broker needs access to delete Pods, or really any of these. Are these permissions only needed by the Overlord?

Contributor Author:

I found that when this role did not exist, none of the pods could start. They all seem to manage labels when using the Kubernetes extension. It's unclear whether they remove those labels when no longer needed the same way they add them when needed. I can test that and narrow down the permission set for each service.

Contributor Author:

An update:

  • Just an FYI: these are the role settings recommended in the official Druid documentation for these extensions.
  • Attempting to convert my dev cluster to this version of the helm chart has run into some very frustrating complications. For some reason which, as far as I can tell, has nothing to do with service accounts, I am not getting the
    2023-02-09T21:56:09,734 INFO [k8s-task-runner-3] org.apache.druid.indexing.overlord.TaskQueue - Received SUCCESS status for task: <some_task>
    response after a job when running on a cluster deployed via this version of the helm chart.
  • This means a task of {"type":"noop"} spawns a k8s job, the job executes successfully and completes with a zero return code, but the status is reported as FAILED in the UI.
  • The logs for that, in debug mode, look like this:
2023-02-09T21:55:51,559 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "2023-02-09T21:39:13,653 DEBUG [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92]: UpdateStatusAction{status=successful}[\n]"
2023-02-09T21:55:51,560 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "2023-02-09T21:39:13,663 DEBUG [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92]: UpdateLocationAction{taskLocation=TaskLocation{host='null', port=-1, tlsPort=-1}}[\n]"
2023-02-09T21:55:51,560 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "2023-02-09T21:39:13,671 DEBUG [task-runner-0-priority-0] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92] status changed to [SUCCESS].[\n]"
2023-02-09T21:55:51,560 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "  "id" : "noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92",[\n]"
2023-02-09T21:55:51,561 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "2023-02-09T21:39:13,737 INFO [main] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92].[\n]"
2023-02-09T21:55:51,561 DEBUG [k8s-task-runner-25] org.apache.http.wire - http-outgoing-22 >> "af-4656-a1e1-9e77ced24d92] status changed to [FAILED].[\n]"

The job status in the UI shows as:

{
  "id": "noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92",
  "groupId": "noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92",
  "type": "noop",
  "createdTime": "2023-02-09T21:39:02.014Z",
  "queueInsertionTime": "1970-01-01T00:00:00.000Z",
  "statusCode": "FAILED",
  "status": "FAILED",
  "runnerStatusCode": "WAITING",
  "duration": -1,
  "location": {
    "host": "10.233.68.10",
    "port": 8100,
    "tlsPort": -1
  },
  "dataSource": "none",
  "errorMsg": "Task failed %s: [ noop_2023-02-09T21:39:02.008Z_936917b3-a0af-4656-a1e1-9e77ced24d92, noop20230209t2..."
}

The logs from the actual job itself show a noop task with success:

2023-02-09T16:53:23,371 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "noop_2023-02-09T16:53:12.037Z_3c8e9f26-6ca4-4600-8723-572d2c8baf48",
  "status" : "SUCCESS",
  "duration" : 3887,
  "errorMsg" : null,
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}

I'm at a complete loss at this point, so if anyone wants to help I'd be all ears and would welcome any kind of paired troubleshooting. I've gone as far as giving all the service accounts full cluster-admin permissions, but again, I don't really think this has anything to do with service accounts at this point.

Contributor Author:

I pushed up the requested changes except for the RBAC changes, since I can't yet say I've had a successful deployment with these chart changes.

Contributor Author:

Turns out this is a bug with the kubernetes overlord extension and Kubernetes 1.25+.

Contributor:

Were you able to file an issue for it?

Contributor Author:

I discovered this with the author of the kubernetes overlord extension, @churromorales. I believe #13759 already fixes the issue and relates to #13749, but maybe @churromorales can comment further; if another issue is required, I will happily make one.

Contributor Author:

@dampcake, putting aside the potential for a new issue, I would ask that we merge the RBAC in its current state since it matches the official documentation. Once the bugs are fixed for K8s 1.25, I will test for the minimum viable role permissions and make a new PR to apply those via the helm chart, as well as allow concatenating or overriding the role permissions via the chart's .Values.rbac map (a rough sketch of that idea follows this comment), so people won't get blocked on RBAC changes in the future.

If you're good with this let me know and I'll create the issue and it can be assigned to me.
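
Not part of this PR, but a rough sketch of what a concatenable/overridable rule set in values.yaml might look like (the extraRules key is hypothetical):

rbac:
  create: true
  # hypothetical knob: extra rules appended to each generated Role
  extraRules:
    - apiGroups: ["batch"]
      resources: ["jobs"]
      verbs: ["get", "list", "watch", "create", "delete"]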

| `broker.enabled` | enable broker | `true` |
| `broker.name` | broker component name | `broker` |
| `broker.replicaCount` | broker node replicas (deployment) | `1` |
| `broker.port` | port of broker component | `8082` |
| `broker.serviceAccount.create` | Create a service account for broker service | `true` |
| `broker.serviceAccount.name` | Service account name | `true` |
Contributor:

The documentation looks off to me. Sorry I didn't catch this before. Looks like it says the default for everything is true but that's not actually the case.

Contributor Author:

No apologies, this is my fault. Sorry I missed it during pre-commit review. Fixed the documentation and merged upstream/master again.

@dampcake (Contributor) left a comment

LGTM. Thanks for this!

@@ -193,7 +233,7 @@ historical:
druid_processing_numMergeBuffers: 2
druid_processing_numThreads: 1
# druid_monitoring_monitors: '["org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.server.metrics.HistoricalMetricsMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]'
# druid_segmentCache_locations: '[{"path":"/var/druid/segment-cache","maxSize":300000000000}]'
# druid_segmentCache_locations: '[{"path":"/opt/druid/var/druid/segment-cache","maxSize":300000000000}]'
Contributor:

why the change here?

Contributor Author:

That is this bullet point from the description:

Fix historical commented config druid_segmentCache_locations to match PVC mount path. In it's current form it breaks the historical pod when uncommented due to lack of permissions to create directory.

The container did not have permission to create or write to /var/druid/segment-cache. /opt/druid/var/druid/segment-cache is inside the defined PVC mount path, so the application is able to write there, and the data will be captured by the persistent volume and persist through pod restarts.
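
For context, a rough sketch of the wiring this relies on in the historical StatefulSet, assuming the chart mounts the historical data volume at /opt/druid/var (names are illustrative):

volumeMounts:
  - name: data
    mountPath: /opt/druid/var   # the segment-cache path above lives inside this mount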

Contributor:

that makes sense, thank you for the fix.

@@ -199,6 +225,11 @@ The following table lists the configurable parameters of the Druid chart and the
| `router.replicaCount` | router node replicas (deployment) | `1` |
| `router.port` | port of router component | `8888` |
| `router.serviceType` | service type for service | `ClusterIP` |
| `router.serviceAccount.create` | Create a service account for router service | `true` |
| `router.serviceAccount.name` | Service account name | `` |
Contributor:

What is the default service account name generated by helm? We should put that here instead of an empty string as the default.

Contributor Author:

The default name is template-derived, so there is no way to put a value here that is guaranteed to be accurate. It's a combination of the druid.<service>.fullname template (which is itself based on the druid.fullname template) and the service name, most of which can be overridden. For example, with a release named druid and the default broker.name of broker, the generated name would be druid-broker.

{{- define "druid.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- end -}}
{{- define "druid.broker.fullname" -}}
{{ template "druid.fullname" . }}-{{ .Values.broker.name }}
{{- end -}}
{{/*
Create the name of the broker service account
*/}}
{{- define "druid.broker.serviceAccountName" -}}
  {{- if .Values.broker.serviceAccount.create }}
    {{- default (include "druid.broker.fullname" .) .Values.broker.serviceAccount.name }}
  {{- else }}
    {{- default "default" .Values.broker.serviceAccount.name }}
  {{- end }}
{{- end }}

Contributor:

You can put "Derived from the name of the service", or something to that effect.

Contributor Author:

Sure, fixed.

name: {{ template "druid.historical.fullname" . }}
subjects:
- kind: ServiceAccount
name: {{ .Values.historical.serviceAccount.name }}
Contributor:

Shouldn't this be druid.historical.serviceAccountName? The default value of this property is empty. Is the chart generating the k8s configs correctly?
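
Presumably the subject should reference the helper template rather than the raw value, along these lines (sketch mirroring the broker helper shown earlier in this thread):

subjects:
  - kind: ServiceAccount
    name: {{ include "druid.historical.serviceAccountName" . }}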

Contributor Author:

Thank you for catching this. This was a leftover from before @dampcake recommended switching to the empty default value earlier in this PR, and my testing still had the .Values.*.serviceAccount.name values set from before, so I wasn't seeing an issue here.

Fixed now.

@abhishekagarwal87 (Contributor)

@jwitko - can you fix the static check failures? LGTM otherwise.

@jwitko force-pushed the helm/add-serviceaccounts-rbac-and-small-fixes branch 2 times, most recently from 5c3471f to 6583615 on February 22, 2023 18:39
@jwitko (Contributor, Author) commented Feb 22, 2023

@jwitko - can you fix the static check failures? LGTM otherwise.

I accidentally rebased again, forgetting that you guys prefer a merge instead. Trying to fix that now.

@jwitko (Contributor, Author) commented Feb 22, 2023

@abhishekagarwal87 my apologies, but I don't think I'm able to get the history back to GitHub in a meaningful way. If you want me to close this PR and open a new one, please let me know. Otherwise the PR looks to be in solid shape now, with tests passing.

@jwitko mentioned this pull request Feb 22, 2023
@jwitko (Contributor, Author) commented Feb 22, 2023

@abhishekagarwal87 The static checks are failing for a spelling error that is outside the scope of this PR:

> spellcheck
> mdspell --en-us --ignore-numbers --report '../docs/**/*.md' || (./script/notify-spellcheck-issues && false)

    ../docs/tutorials/tutorial-jupyter-index.md
       41 | er both try to use port `8888,` so start Jupyter on a different port 

>> 1 spelling error found in 227 files

They seem to have passed on all things related to this chart once I added the license everywhere.

It seems to be complaining about this line, but I'm not really sure what it's even upset about.

@abhishekagarwal87 merged commit f7a5fcf into apache:master Feb 23, 2023
@abhishekagarwal87 (Contributor)

Thanks @jwitko. That's an unrelated failure.

@jwitko deleted the helm/add-serviceaccounts-rbac-and-small-fixes branch February 23, 2023 12:09
@clintropolis added this to the 26.0 milestone Apr 10, 2023