Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support generic sidecars and passwordless DB access #629

Merged

Conversation

ed-marks
Copy link
Contributor

@ed-marks ed-marks commented Jan 26, 2024

Primarily to enable GCP's CloudSQLProxy to run as a sidecar

  • Adds sidecars in the form of init containers which is behind a beta feature gate as of k8s 1.29
  • Allows omitting persistence secretRefs for passwordless DB access

✅ Tested basic sidecar functionality
✅ Tested omitting secretRefs
✅ Tested ServiceAccount annotations
✅ Tested CloudSQLProxy Explicitly
✅ Tested passwordless DB access

Addresses #600 - however only for K8s 1.28+ with the SidecarContainers Feature Gate enabled.
Since this primarily affects GKE, that means a 1.29 cluster in reality which they've just released on the Regular release channel in most(all?) regions now

@alexandrevilain
Copy link
Owner

Thanks for the work @ed-marks !
I'm here to help if needed :)

service: postgres
spec:
containers:
- name: postgres

Check warning

Code scanning / SonarCloud

CPU limits should be enforced Medium

Specify a CPU limit for this container. See more on SonarCloud
service: postgres
spec:
containers:
- name: postgres

Check warning

Code scanning / SonarCloud

Memory limits should be enforced Medium

Specify a memory limit for this container. See more on SonarCloud
metadata:
labels:
service: postgres
spec:

Check warning

Code scanning / SonarCloud

Service account tokens should not be mounted in pods Medium

Set automountServiceAccountToken to false for this specification of kind Deployment. See more on SonarCloud
service: postgres
spec:
containers:
- name: postgres

Check warning

Code scanning / SonarCloud

Storage limits should be enforced Medium

Specify a storage limit for this container. See more on SonarCloud
@@ -117,6 +118,13 @@ func (r *TemporalClusterReconciler) reconcilePersistence(ctx context.Context, cl
return 0, fmt.Errorf("can't reconcile schema script configmap: %w", err)
}

// Ensure the serviceaccount used by jobs is up-to-date
serviceAccountBuilder := base.NewServiceAccountBuilder(persistence.ServiceNameSuffix, cluster, r.Scheme, &v1beta1.ServiceSpec{})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a little hacky, passing in the null ServiceSpec - I'll take a look at making the SA builder a little more generic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SA builder wasn't actually using the ServiceSpec at all so I've just removed it

}
annotations["eks.amazonaws.com/role-arn"] = *b.instance.Spec.Archival.Provider.S3.RoleName
}
if b.instance.Spec.Persistence.DefaultStore.SQL != nil &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also feels like it could be neater

@ed-marks ed-marks marked this pull request as ready for review February 2, 2024 18:00
@ed-marks
Copy link
Contributor Author

ed-marks commented Feb 2, 2024

thanks @alexandrevilain - I think i've covered most bases here. Tricky to add e2e tests to as i couldn't find a kind_with_registry version later than 1.27 though

I'll run some more manual tests next week against some of our dev clusters, but as of cd889 everything functioned with sidecars and GCP CloudSQL as it did locally

api/v1beta1/temporalcluster_types.go Outdated Show resolved Hide resolved
type: boolean
perUnitHistogramBoundaries:
additionalProperties:
jobInitContainers:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add entries here: https://github.com/alexandrevilain/temporal-operator/blob/main/Makefile#L49 ?
They are used to remove pod specs in the CRDs and set the x-kubernetes-preserve-unknown-fields = true field.
This makes the operator's CRD much tinner.

@@ -987,6 +994,9 @@ type TemporalClusterSpec struct {
// JobResources allows set resources for setup/update jobs.
// +optional
JobResources corev1.ResourceRequirements `json:"jobResources,omitempty"`
// JobInitContainers adds a list of init containers to the setup's jobs.
// +optional
JobInitContainers []corev1.Container `json:"jobInitContainers,omitempty"`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for myself: when releasing v1 APIversion, jobs will need their own key.

@alexandrevilain
Copy link
Owner

Hi @ed-marks ! Thanks for your awesome work. Added some minor comments!

Addresses #600 - however only for K8s 1.28+ with the SidecarContainers Feature Gate enabled.

Agree, but it's still providing init containers feature for older k8s versions, which is pretty fine!

Copy link
Owner

@alexandrevilain alexandrevilain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @ed-marks ! Thanks for the work!
Lint and e2e tests are failing, feel free to ping me if you need some help!

@ed-marks
Copy link
Contributor Author

ed-marks commented Feb 7, 2024

perfect thanks for the approve @alexandrevilain - that last commit has fixed all the e2e tests except the cass persistence which afaik isn't affected by the changes in this PR but for some reason i can't seem to make happy. Should have a min to take a proper look on Friday

@ed-marks
Copy link
Contributor Author

ed-marks commented Feb 9, 2024

Ok @alexandrevilain the e2es are all reliably passing locally now so i suspect we should be good to retry those e2e tests again.

The problem with them locally was the namespaces being left over was eating up resources on my laptop and by the time we got to the last one - usually cassandra - the db pod would just get OOMKilled all day. I hope this is whats also happening in that 1.27 test but naturally I can't see into whats happening on the kind cluster to be sure.

I'm not sure if there was a reason the NS deletion stuff was removed before, i couldn't find any context around it - if there is I can delete it again but you might need to run the pipeline on a bigger machine

test.yaml Outdated Show resolved Hide resolved
Copy link

sonarcloud bot commented Feb 12, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@alexandrevilain alexandrevilain merged commit ab16cac into alexandrevilain:main Feb 13, 2024
9 checks passed
@alexandrevilain
Copy link
Owner

Thanks @ed-marks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants