
Encode resource limits into schema #3864

Closed · zephraph wants to merge 17 commits from limits-in-schema

Conversation

@zephraph (Contributor) commented Aug 9, 2023

This PR aims to encode resource limits (max memory, max vCPU count, etc.) explicitly into the schema so that clients can auto-generate validation code for those checks.

As a consequence of taking on this work, I had to shuffle around where we define these constant limits. Schema definitions for things like vCPU count actually live in common, and most of the limits were in app. app depends on common but not vice versa, so it made sense to move these values over to common. An alternative would be to create a new top-level crate called omicron-limits or something similar. I chatted with @smklein about these options and we both think the addition to common is a lighter touch.

Comment on lines 9634 to 9638
"description": "The amount of memory to allocate to the instance, in bytes.\n\nMust be between 1 and 256 GiB.",
"type": "integer",
"format": "uint64",
"minimum": 1073741824,
"maximum": 274877906944
@zephraph (Contributor, Author):

This worked out nicely! The numbers are a bit hard to read, but I think the calculated description helps in that regard.
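
As a sketch of what "calculated" could mean here (the helper below is hypothetical, not the PR's code), the description string can be derived from the same constants that feed minimum and maximum, so the prose and the limits can't drift apart:

```rust
// Hypothetical constants; the PR defines MIN as u32, but u64 is used here
// for simplicity.
const MIN_MEMORY_BYTES_PER_INSTANCE: u64 = 1 << 30; // 1 GiB
const MAX_MEMORY_BYTES_PER_INSTANCE: u64 = 256 * (1 << 30); // 256 GiB

// Render the byte limits as GiB so the generated description stays readable.
fn memory_description() -> String {
    let gib = |bytes: u64| bytes >> 30;
    format!(
        "The amount of memory to allocate to the instance, in bytes.\n\n\
         Must be between {} and {} GiB.",
        gib(MIN_MEMORY_BYTES_PER_INSTANCE),
        gib(MAX_MEMORY_BYTES_PER_INSTANCE),
    )
}
```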

@zephraph zephraph marked this pull request as ready for review August 11, 2023 18:36
Review thread on nexus-client/src/lib.rs (outdated; resolved)
@zephraph zephraph linked an issue Aug 16, 2023 that may be closed by this pull request
@zephraph zephraph requested review from smklein and ahl January 23, 2024 15:50
Comment on lines +12090 to +12091
"minimum": 1073741824,
"maximum": 1098437885952
Collaborator:

Cool to see this in the API, I hope this can make our clients better!

@zephraph (Contributor, Author):

So, this actually isn't right. It's not the min/max that's wrong, it's the schema. We still want ByteCount to be preserved, but we also want the limits. It took me a little digging to figure out what to do here.

According to the spec, two schemas can be composed into a single object via allOf. So what we actually want here is an allOf combining the ByteCount reference with these limit definitions. My latest commit has something that should produce that, but typify pukes on it here:

https://github.com/oxidecomputer/typify/blob/1f97f167923f001818d461b1286f8a5242abf8b1/typify-impl/src/merge.rs#L690

I plan to look at this more tomorrow.
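
For reference, a minimal sketch (assuming serde_json; not the PR's actual code) of the composed schema shape described above, with the ByteCount reference preserved and the limits layered on via allOf:

```rust
use serde_json::json;

fn main() {
    // Compose the existing ByteCount model with the numeric limits via
    // allOf, rather than replacing the $ref with a bare integer schema.
    let memory_schema = json!({
        "description": "The amount of memory to allocate to the instance, in bytes.\n\nMust be between 1 and 256 GiB.",
        "allOf": [
            { "$ref": "#/components/schemas/ByteCount" },
            {
                "type": "integer",
                "format": "uint64",
                "minimum": 1_073_741_824u64,   // 1 GiB
                "maximum": 274_877_906_944u64  // 256 GiB
            }
        ]
    });
    println!("{}", serde_json::to_string_pretty(&memory_schema).unwrap());
}
```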

Contributor:

I mean... the unimplemented! isn't wrong...

Review thread on nexus-client/src/lib.rs (outdated; resolved)
Comment on lines +9 to +10
```rust
pub const MIN_MEMORY_BYTES_PER_INSTANCE: u32 = 1 << 30; // 1 GiB
pub const MAX_MEMORY_BYTES_PER_INSTANCE: u64 = 256 * (1 << 30); // 256 GiB
```
Collaborator:

Nitpick: For all these "_BYTES" constants, could we use u64?

@zephraph (Contributor, Author):

Actually, could we hold off on that until another PR? It has some fairly wide-ranging repercussions. There are a lot of conversions we'd have to move over to try_from, and there are places where we add two u32 values together, which would require both to be converted. We can do it; it's just going to touch a lot of code, and it'd be nice not to have all of that in this PR.
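
To illustrate the kind of churn being described (all names below are hypothetical, not from this PR): once a constant widens to u64, mixed-width arithmetic needs both operands converted, and any site that still needs a u32 becomes fallible:

```rust
use std::num::TryFromIntError;

// Hypothetical constant, widened from u32 to u64 as the nitpick suggests.
const MIN_MEMORY_BYTES_PER_INSTANCE: u64 = 1 << 30; // 1 GiB

// u32 + u32 arithmetic that previously compared against a u32 constant now
// has to widen both operands before the comparison.
fn meets_minimum(base_bytes: u32, extra_bytes: u32) -> bool {
    u64::from(base_bytes) + u64::from(extra_bytes) >= MIN_MEMORY_BYTES_PER_INSTANCE
}

// Anywhere that still needs the value as a u32 has to move to try_from and
// handle the (here statically impossible, but unprovable) overflow case.
fn minimum_as_u32() -> Result<u32, TryFromIntError> {
    u32::try_from(MIN_MEMORY_BYTES_PER_INSTANCE)
}
```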

Collaborator:

Sure, we can punt

@ahl (Contributor), Jan 25, 2024:

@smklein you and your love of u64s....

Collaborator:

I propose a simple plan: charge $0.005/hour (USD), effective February 1st, for all u32 usage until we migrate to a fully u64 world.

@ahl (Contributor) commented Jan 25, 2024

> This PR aims to encode resource limits (max memory, max vCPU count, etc.) explicitly into the schema so that clients can auto-generate validation code for those checks.

Is that valuable? Why is that valuable? How do you envision the user experience changing or improving, and for what kinds of (erroneous) operations? It seems that there are other situations where we might kick back a value as being too large, e.g., if the number of CPUs indicated is larger than what remains in one's allocation.

In other words, there is often additional validation that is done beyond what's expressed in the schema.

In addition, one might imagine limits that vary for different users. For example, customers might have policy limits that say "you can't have a VM larger than 4 vCPUs (i.e., because I'm cruel)" that I could imagine us supporting.

To be clear: I'm ambivalent about this change, and can see the clarity this provides in the SDKs. I also see the potential additional complexity and future stumbling blocks.

@zephraph (Contributor, Author):

> Is that valuable? Why is that valuable? How do you envision the user experience changing or improving, and for what kinds of (erroneous) operations? It seems that there are other situations where we might kick back a value as being too large, e.g., if the number of CPUs indicated is larger than what remains in one's allocation.

We hard-code upper and lower limits for resources in several places, and we should clearly communicate that those limits exist. Having a hard upper boundary doesn't mean that a submission in the acceptable range will always succeed, and I don't believe making these constraints explicit implies that it will. There are, of course, arbitrary computed constraints that shouldn't be in the schema.

I understand the relative value of this is low compared to other ongoing work, but I think sweating the details is what makes our product excellent. The goal here is to provide clarity at what I hope is only a marginal complexity cost. This PR can't be merged until there's a PR on typify that lets the schema be correct here, so I'll move it back to draft for the time being.

@zephraph zephraph marked this pull request as draft January 29, 2024 15:56
@zephraph (Contributor, Author):

I think, eventually, it would be good for someone to tackle something along similar lines. If we have hard-coded limits, showing them in the schema does seem to make sense. Also, wrangling all the consts into a semi-shared place seems like it would be an improvement.

@zephraph zephraph closed this Feb 23, 2024
@zephraph zephraph deleted the limits-in-schema branch February 23, 2024 03:38
Linked issue (may be closed by this PR): Reflect instance resource limits in schema

4 participants