Create k8s config for new spack-gantry service #826

Merged: 9 commits merged into spack:main on May 22, 2024

cmelone (Contributor) commented Apr 24, 2024

This is my first pass at deploying the dynamic allocation service into the k8s cluster. Please let me know if I'm missing something important or if you see any glaring issues.

One thing that's missing is the secrets. Is there a process for generating the sealed-secrets.yaml file?

We will also need to create an S3 bucket along with creds, docs here: https://litestream.io/guides/s3/
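For reference, a minimal Litestream config that replicates the service's SQLite database to such a bucket might look like the sketch below. The database path and bucket name are hypothetical placeholders, not values from this PR.

```yaml
# Hypothetical Litestream config sketch: path and bucket are placeholders.
dbs:
  - path: /data/db.sqlite
    replicas:
      # Continuously replicate this database to the S3 bucket/prefix below.
      - url: s3://example-gantry-bucket/db
```

No credentials appear in this sketch; they would have to come from the container's environment (for example AWS credential variables or an IAM role attached to the pod).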

todo:

  • Resource requests/limits for the volumes and containers (storage, CPU, RAM)
  • Documentation
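As a starting point for the first todo item: CPU and memory requests and limits are set per container in the pod template. The values below are illustrative placeholders, not measured requirements for gantry; storage for a volume is requested separately on its PersistentVolumeClaim (`spec.resources.requests.storage`).

```yaml
# Container excerpt from a pod template; name and numbers are hypothetical.
containers:
  - name: gantry
    resources:
      requests:
        cpu: 250m       # scheduler reserves a quarter of a CPU core
        memory: 256Mi
      limits:
        memory: 512Mi   # container is OOM-killed if it exceeds this
```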

closes spack/spack-gantry#7

@alecbcs fyi

mvandenburgh self-requested a review April 24, 2024 11:44
mvandenburgh (Member) commented

> We will also need to create an S3 bucket along with creds, docs here: litestream.io/guides/s3

We should be able to use a k8s ServiceAccount associated with an IAM role to grant the pod access to the S3 bucket, instead of creating long-lived credentials and encoding them as a secret. I can set up that role/service account + the S3 bucket.
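On EKS this pattern is IAM Roles for Service Accounts (IRSA): the ServiceAccount carries an annotation naming the IAM role, and pods that run under that ServiceAccount receive temporary credentials for the role. A minimal sketch; the name, namespace, and role ARN (using the AWS documentation placeholder account ID) are hypothetical:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spack-gantry            # hypothetical name
  namespace: spack              # hypothetical namespace
  annotations:
    # IRSA: pods using this ServiceAccount can assume this IAM role.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/spack-gantry-s3
```

The pod template would then set `serviceAccountName: spack-gantry`. This also requires the cluster's OIDC provider to be registered with IAM and the role's trust policy to allow this ServiceAccount.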

> One thing that's missing is the secrets, is there a process for generating the sealed-secrets.yaml file?

I think using a service account instead of storing IAM credentials eliminates the need for this, but let us know if you still need this info and we can get it to you.

cmelone (Contributor Author) commented Apr 25, 2024

> We should be able to use a k8s ServiceAccount associated with an IAM role to grant the pod access to the S3 bucket, instead of creating long-lived credentials and encoding them as a secret. I can set up that role/service account + the S3 bucket.

Thanks for setting this up!

> I think using a service account instead of storing IAM credentials eliminates the need for this, but let us know if you still need this info and we can get you it.

Besides the S3 creds, we also need to store the GitLab API token somehow. Are secrets the right way to do this?

I updated some of the files in response to your comments, I appreciate the feedback.

One question, I'm seeing this line on some of the deployments:

nodeSelector:
  spack.io/node-pool: base

Should I add that to this PR as well?

mvandenburgh (Member) commented

> We need to somehow store the Gitlab API token besides the S3 creds, if secrets are the right way to do this

I added that here: 655968a. Let me know if you need a broader scope or more scopes than read_api.
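Once the token is stored as a Secret (the sealed-secrets controller decrypts a SealedSecret into a regular Secret of the same name), the container can read it as an environment variable. A sketch; the Secret name, key, and variable name are hypothetical:

```yaml
# Pod template excerpt; Secret name, key, and env var name are hypothetical.
containers:
  - name: gantry
    env:
      - name: GITLAB_API_TOKEN
        valueFrom:
          secretKeyRef:
            name: spack-gantry        # Secret in the pod's namespace
            key: gitlab_api_token     # key within that Secret
```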

> One question, I'm seeing this line on some of the deployments:
>
> nodeSelector:
>   spack.io/node-pool: base
>
> should I add that to this PR as well?

Yes, that should be added (it's needed for Karpenter to schedule the pods correctly).
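For placement: `nodeSelector` belongs in the pod template's `spec`, next to `containers`, not in the Deployment's top-level `spec`. A minimal Deployment sketch; the name, namespace, labels, and image are hypothetical, and `replicas: 1` is an assumption based on the service writing to a single SQLite database replicated by Litestream:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spack-gantry                  # hypothetical
  namespace: spack                    # hypothetical
spec:
  replicas: 1                         # assumption: one writer for the SQLite DB
  selector:
    matchLabels:
      app: spack-gantry
  template:
    metadata:
      labels:
        app: spack-gantry             # must match spec.selector.matchLabels
    spec:
      # Restrict scheduling to nodes in the "base" node pool.
      nodeSelector:
        spack.io/node-pool: base
      containers:
        - name: gantry
          image: example.invalid/spack-gantry:latest   # hypothetical image
```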

cmelone (Contributor Author) commented Apr 26, 2024

Thanks for adding the API token; that level of access is perfect. Once the webhook is set up, we'll also need to store the webhook secret.

cmelone (Contributor Author) commented May 7, 2024

Hey @mvandenburgh, this is almost ready. Would it be possible to add a webhook for all job status changes? I'm not sure what the FQDN will be inside the cluster, but it should point to /v1/collect. Thanks!

mvandenburgh (Member) commented
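Inside the cluster, a Service named `<service>` in namespace `<namespace>` resolves as `<service>.<namespace>.svc.cluster.local`, assuming the default cluster domain. A sketch with hypothetical names and ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spack-gantry      # hypothetical
  namespace: spack        # hypothetical
spec:
  selector:
    app: spack-gantry     # must match the pod template's labels
  ports:
    - port: 80            # port clients connect to on the Service
      targetPort: 8080    # hypothetical port the app listens on
```

With these names, the webhook URL would be `http://spack-gantry.spack.svc.cluster.local/v1/collect`, which only resolves from within the cluster.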

@cmelone I added that webhook to #827. Let me know once this PR is ready, and we can merge both this and #827 at the same time.

cmelone (Contributor Author) commented May 8, 2024

Thanks! Is there a webhook secret that gets set? If not, I'll need to remove it as a required environment variable in the app.

mvandenburgh (Member) commented

> Thanks! is there a webhook secret that gets set? If not, I'll need to remove that as a required env variable in the app

GitLab doesn't provide one, no. And since the service isn't exposed publicly, it is safe to allow unauthenticated requests.

alecbcs (Member) commented May 8, 2024

> > Thanks! is there a webhook secret that gets set? If not, I'll need to remove that as a required env variable in the app
>
> GitLab doesn't provide one, no. And since the service isn't exposed publicly, it is safe to allow unauthenticated requests.

GitLab does support, and recommends, using a secret token. Even though this won't be exposed publicly, I'd vote we use one anyway, in case someone gets into the network and starts feeding bad allocations back to gantry.

When creating a webhook on the GitLab side, you provide the secret token as a setting, as shown here. We can use any randomly generated string and then pass it to the Kubernetes configuration as a secret.

[screenshot: GitLab webhook settings]

mvandenburgh (Member) commented

Ah, thanks @alecbcs, I wasn't aware of that. I added the secret token to the webhook and also encoded it as a k8s secret here: 6d151b1
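For reference, a Bitnami sealed-secrets manifest has the shape below: the values under `encryptedData` are ciphertext produced by `kubeseal` from a regular Secret, and the controller in the cluster decrypts them back into a Secret with the same name and namespace. The name, namespace, and keys are hypothetical, and the angle-bracket values are placeholders:

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: spack-gantry          # hypothetical
  namespace: spack            # hypothetical
spec:
  encryptedData:
    # Placeholders: real values are kubeseal output, safe to commit.
    gitlab_api_token: <kubeseal ciphertext>
    gitlab_webhook_token: <kubeseal ciphertext>
```

GitLab sends the configured secret token in the `X-Gitlab-Token` request header, so the app only needs to compare that header against the decrypted value.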

alecbcs (Member) commented May 8, 2024

Sweet! Thanks @mvandenburgh :D

cmelone (Contributor Author) commented May 8, 2024

Thanks! I'll do some last minute checks and mark this as ready.

Once deployed, would it be possible to get CLI access to the pods via kubectl? I imagine there will be some unexpected issues, and it might be easier for me to debug this way. If this doesn't fit with the way you guys access the cluster, no worries.

- volumes.configMap does not need a namespace field, as it inherits the pod's namespace
- update subPath for the litestream config to match its location in `terraform/modules/spack/spack_gantry.tf`
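The two changes above come together in the pod spec roughly as follows: the `configMap` volume names only the ConfigMap (it always resolves in the pod's namespace), and `subPath` mounts a single key as a file. The ConfigMap name, key, container name, and mount path are hypothetical:

```yaml
# Pod spec excerpt; names and paths are hypothetical.
containers:
  - name: litestream
    volumeMounts:
      - name: litestream-config
        mountPath: /etc/litestream.yml   # file path inside the container
        subPath: litestream.yml          # single key from the ConfigMap
volumes:
  - name: litestream-config
    configMap:
      name: spack-gantry-litestream      # no namespace field
```

One trade-off of `subPath`: the mounted file is not updated when the ConfigMap changes, so a config change requires restarting the pod.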
cmelone marked this pull request as ready for review May 9, 2024 07:00
cmelone (Contributor Author) commented May 9, 2024

Should be ready for merging now.

mvandenburgh (Member) commented

> Once deployed, would it be possible to get CLI access to the pods via kubectl? I imagine that there will be some unexpected issues and it might be easier for me to debug this way. If this doesn't fit with the way you guys access the cluster, no worries

Do you have an IAM user on AWS?

cmelone (Contributor Author) commented May 9, 2024

> Do you have an IAM user on AWS?

Not at the moment

alecbcs (Member) commented May 21, 2024

I created an IAM user for @cmelone and submitted #849 to add him to the cluster access list.
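For context, one common way on EKS to add an IAM user to a cluster's access list is a `mapUsers` entry in the `aws-auth` ConfigMap (EKS access entries are another mechanism); this sketches that mechanism and is not necessarily what #849 does. The ARN, username, and group are hypothetical:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  # mapUsers is a YAML list embedded as a string.
  mapUsers: |
    - userarn: arn:aws:iam::123456789012:user/example-user
      username: example-user
      groups:
        - example-debuggers    # hypothetical group, granted permissions via RBAC
```

What the user can actually do then comes from RBAC bindings for the listed group; `kubectl exec`, for instance, needs `create` on the `pods/exec` subresource.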

mvandenburgh merged commit a511bab into spack:main May 22, 2024
1 check passed
Linked issue: Containerize and get ready for web
3 participants