QCL turnup: add profile options based off linked-earth #2332
Conversation
5cf491d to 7947658 (Compare)
Nice work Pris!
I hope it's okay that I reviewed this before a review was explicitly requested; I'll be quite busy on Monday with other maintenance efforts and wanted to avoid having to do it then.
I list below what I see as the remaining work for this PR:
- Small as the default profile list entry
- Both highcpu-32 and highcpu-96 have profile list entries with requests ensuring user pods get dedicated nodes
- `display_name` and `description` for the highcpu-32 and highcpu-96 entries
I think it would be good if we reserve size adjectives for the standardized choices of small/medium/large corresponding to highmem-4/highmem-16/highmem-64. Here is an idea of what we could have in `display_name` and `description` without a size adjective:

    - display_name: "n2-highcpu-32: 32 CPU / 32 GB RAM"
      description: "Start a container on a dedicated node"
- If you look to include image choices in this PR, I think the two choices mentioned in [Request deployment] New Hub: QCL aka QuantifiedCarbon #2118 (comment) should be included, and that we provide `pangeo/pangeo-notebook:latest` to represent the wish to use a prebuilt "2i2c docker image". I opened "Intervention to have new hubs not couple to the 2i2c-hubs-image, but something more up to date" #2336 to motivate why a new community should do that instead of using `quay.io/2i2c/2i2c-hubs-image`.
- Because kubespawner defaults `imagePullPolicy` to `IfNotPresent`, and we now reference `latest`, I think we should set it to `Always` via `singleuser.image.pullPolicy`, as sketched below. (Related comment about this upstream: Kubespawner image_pull_policy problem jupyterhub/kubespawner#679 (comment))
- Update the requests etc. as seen in linked-earth in "openscapes/linked-earth: fix node sharing to fit on nodes perfectly" #2338. Note it should be linked-earth (GCP, GKE, n2-highmem) specifically and not openscapes (AWS, EKS, r5.) as the allocatable memory is a bit different.
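A minimal sketch of what that opt-in could look like in the hub's values file, assuming the standard z2jh `singleuser.image` structure exposed via basehub; the image name and tag come from the discussion above, everything else is illustrative:

```yaml
jupyterhub:
  singleuser:
    image:
      name: pangeo/pangeo-notebook
      tag: latest
      # kubespawner defaults imagePullPolicy to IfNotPresent; with a mutable
      # tag like `latest` we opt in to always pulling fresh images.
      pullPolicy: Always
```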
41061b7 to 4d753bb (Compare)
@consideRatio could you share the script? I can't quite follow the rounding and math you are using in #2338.
@pnasrat I put it here: https://gist.github.com/consideRatio/071110916cb58220657398c61c14af7c. The source for the logic is the k8s-reported allocatable memory for a node of each type, in KiB; I've done some manual transformations from KiB to GiB and GB units as well.
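To illustrate the kind of unit conversions involved, here is a sketch (not the actual gist); the `91140000Ki` figure is a hypothetical allocatable value chosen only for the arithmetic:

```yaml
# k8s reports allocatable memory in KiB, e.g. a hypothetical 91140000Ki:
#   bytes = 91140000 * 1024 = 93,327,360,000
#   GB    = bytes / 1000^3  ≈ 93.33 GB  (decimal units, as in machine type specs)
#   GiB   = bytes / 1024^3  ≈ 86.92 GiB (binary units, what the k8s "Gi" suffix means)
# A mem_guarantee is then picked a bit below the allocatable value so the
# intended number of user pods, and no more, fits on the node.
mem_guarantee: 86.662G
```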
@consideRatio re your point about changing the pull policy: that sounds like something that should be done outside of the single cluster and in basehub, is that correct? I don't see any other hubs setting this yet.
Update highcpu profile entries
Use pangeo image
About that: I have seen a binderhub, for example, run into rate limitations from hub.docker.com, so it can be relevant to not always pull. Here is an existing exception for carbonplan: infrastructure/config/clusters/carbonplan/common.values.yaml (lines 36 to 41 in de1b067).
I think it makes sense to opt in to this rather than rely on a default value change in the basehub helm chart. Ideally, in my mind, this should not be set at all in kubespawner, and then the k8s default behavior of acting as if it's "Always" when the tag is `latest` would apply.
Not a downside, I am merely ensuring that I would be addressing it in the correct place!
Ah great, reading https://cloud.google.com/kubernetes-engine/docs/concepts/plan-node-sizes makes this clearer.
I read that too!! Then I calculated based on that and couldn't derive the observed allocatable memory from the described 128 GB of memory for the n2-highmem-16 node, for example. At that point I gave up and looked at the actual allocatable memory as reported by k8s for pods on the node.
@consideRatio please take another look. I've done most things, but will double-check the allocatable memory once the profile list is up, both to validate my understanding and to understand the process you used to get the numbers.
mem_guarantee: 86.662G
cpu_guarantee: 9.6
Ah, you are consistent with the requests here on nodes that users aren't sharing also, sure!
Since we specify a nodeSelector to spawn on a certain node, and we wish to provide users with dedicated nodes (1:1 pod:node), we can make any request as long as one user, but not two, fits on the node.
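As a rough sketch of that pattern for the highcpu-96 entry, borrowing the guarantee values and allocatable figure that appear elsewhere in this PR; the node_selector label and the surrounding structure are illustrative assumptions, not the PR's actual diff:

```yaml
- display_name: "n2-highcpu-96: 96 CPU / 96 GB RAM"
  description: "Start a container on a dedicated node"
  kubespawner_override:
    # ~93.33 GB is allocatable on an n2-highcpu-96 node, so guaranteeing
    # 86.662G lets one user pod schedule there but not two.
    mem_guarantee: 86.662G
    cpu_guarantee: 9.6
    node_selector:
      node.kubernetes.io/instance-type: n2-highcpu-96
```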
Oops copy paste fail

Co-authored-by: Erik Sundell <[email protected]>
Btw, if you get multiple code suggestions you want to commit, you can combine them into one via the GitHub UI, but only from the "Files changed" tab!
There is a todo note still in the PR, but I figure you can resolve it how you like either before or after merge!
My procedure here would be to deploy this PR manually to qcl staging, test startup of highcpu-32 and 96, adjust the mem requests if needed, remove the todo note from the PR, and merge.
This approval still applies if the highcpu-32 / 96 mem requests need to be adjusted!
Thanks for the review/approval. I will likely do this with the manual staging deploy first thing my time tomorrow!
Manual deploy to QCL staging: allocatable memory 93.33 GB
Add QCL allowed_teams to spawner config
Rightsize highcpu memory guarantee based on running nodes
for more information, see https://pre-commit.ci
These numbers seem to work; it's possible I could get it down to MiB accuracy, but I want to get this merged. Tested by deploying and spawning both highcpu profiles.
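For reference, a sketch of what expressing the guarantee with binary (MiB) granularity could look like; the converted number is just the arithmetic, not a value from this PR:

```yaml
# 86.662G uses the decimal suffix (86.662e9 bytes); the same amount with a
# binary suffix is roughly 82647Mi, since 86.662e9 / 2^20 ≈ 82,647.
mem_guarantee: 82647Mi
```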
See #2287