GetCapacityResponse should contain "total capacity" #301
Comments
This GetCapacity response is used for determining how much capacity is available for provisioning; it's not for reporting how much capacity is used or available in a single volume. Some more detail on the motivation for being able to report a total capacity: if a plugin only reports current available capacity, that places limitations on how well the controller can use that information.
Thanks for providing additional context @msau42. It's still not clear to me how including the "total" capacity helps resolve these things. If the CO can't reason about "available" capacity due to parallel operations, then it's not obvious to me how having the "total" capacity helps: the CO is still in the same position with respect to being unable to reason about operations executing in parallel, or that may or may not have completed after a plugin restart. Given that storage provisioning/quota policy parameters could well be under the governance of the backend storage system itself (and invisible to the CO), I think that relying on "stable" cached values for "total" capacity is probably fraught with error for some set of backends. I suppose the same could be said of "available" capacity: caching this value for very long might not be a very good idea.
With total capacity reported, the CO can keep track of what volumes it has created and what is outstanding. Plugin restarts are fine because that base number doesn't change, and the rest of the information can be persisted and reconstructed as needed. However, with only available capacity, we can't tell how many of the volumes we know about are accounted for in the reported capacity. This does have the limitation that the total capacity reported is assumed to be completely allocated to the CO cluster and not shared with other clusters or allocated out of band.
Let me try to convey the difficulty with an example.
1. When the plugin is being initialized, we can query it for the available capacity, and it returns 100 GB out of 500.
2. Say there are some 50 volume creation operations all in flight.
3. At the same time, an administrator decides to add more available capacity to the storage backend.
4. Also at the same time, volumes are getting deleted and their capacity will be added back to the pool.

As a CO, when I periodically query the plugin for available capacity, how do I know which operations have been accounted for in the number that the plugin gives me? There is a timing delay where the CO's view can be out of sync with the plugin's view. If there were a total capacity field, then it wouldn't matter what the plugin's view is of 2) or 4): the CO can calculate available capacity based only on its own view of the allocated volumes and operations in flight. When it queries the plugin for capacity, a change in total capacity means something like 3) occurred, which is also not as frequent an event as 2) or 4). Let me know if this makes any more sense.
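The bookkeeping described in this comment can be sketched as follows. This is a hypothetical CO-side accounting model, not part of the CSI spec: with a stable total capacity, the CO derives its own view of available capacity from state it already tracks, instead of trusting a possibly stale snapshot from the plugin. The class and method names here are made up for illustration.

```python
class CapacityTracker:
    """Hypothetical CO-side capacity accounting against a stable total."""

    def __init__(self, total_capacity: int):
        self.total_capacity = total_capacity  # reported once by the plugin
        self.allocated = 0                    # bytes in confirmed volumes
        self.in_flight = 0                    # bytes in pending CreateVolume calls

    def start_create(self, size: int) -> None:
        self.in_flight += size

    def finish_create(self, size: int, ok: bool) -> None:
        self.in_flight -= size
        if ok:
            self.allocated += size

    def delete(self, size: int) -> None:
        self.allocated -= size

    def available(self) -> int:
        # Deletes the CO has not yet observed only make this estimate
        # conservative; a change in the reported total capacity signals
        # an out-of-band event such as the administrator adding storage.
        return self.total_capacity - self.allocated - self.in_flight
```

Because `allocated` and `in_flight` are the CO's own state, the estimate stays consistent across plugin restarts as long as the total does not change.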
Thanks for the example, your use case is much more clear now. I think using total_capacity in the way that you want will break down for cases where storage backends can provision different flavors of volumes (depending on create params) such that those volumes consume "raw" storage capacity in different ratios. For example, if an LVM2 VG is a storage backend and supports carving both linear and RAID1 volumes, then total_capacity changes depending on the params passed to GetCapacity. So if there are two storage classes that consume storage in different ratios from the same backend (and the CO doesn't have the formulas to calculate how total_capacity will be affected), then what's the value of reporting total_capacity? In other words, linear volume creation consumes storage such that it would affect the total_capacity reported by GetCapacity(params=RAID1-flavor), and vice versa.
Agree with jdef. Storage class parameters will become richer over time, and it will be the job of the backend to optimally map volume requests to available "generalized capacity" - by which I mean not just storage capacity, but IOPS, network bandwidth, and many other constrained resources. Trying to report capacity as a single value isn't going to be meaningful.
The question is whether the capacity can be cached by the CO for volume scheduling or not. From my understanding, there are two things here:

1. To calculate the available capacity, we need to know the way the backend storage (e.g. a VG, or a Ceph image pool) is consumed (e.g. linear, raid1, or filesystems with a fixed ratio of filesystem to block size). If there are multiple ways to consume the backend storage for one storage class, the capacity cannot be calculated. If there is only one way to allocate a volume from a storage class, then we can report the capacity for this storage class, and the CO can cache it to make scheduling decisions. I think this is the only way to do volume scheduling; otherwise, the capacity cannot be calculated and used by the CO, and the CO can only assume all nodes have enough capacity, which is the status of current volume scheduling in Kubernetes. The storage driver or plugin can support carving multiple types of volumes (e.g. linear and raid1 volumes in LVM), but for each storage class, it should support only one allocation type.

2. Described by @msau42 here: without total capacity per topology segment per storage class (for local storage, each node is a topology segment), the scheduler may make bad decisions when its state is out of sync with the storage backends. It is possible to recover by rescheduling; however, the scheduler may not choose the best-fit node when its state is out of sync (e.g. the storage of the best-fit node is occupied by terminating PVs). As in 1), if we have available capacity per topology segment per storage class, there is only one allocation way for this storage class, and we can calculate the total capacity in most cases. For linear volumes, it's easy. For raid1 volumes, LVM will allocate some space for metadata, and the ratio compared to total space is not fixed; for these kinds of volumes, we can reserve some space for metadata. To report total capacity, there are two ways.

I suggest adding total_capacity. For each storage class, if we adhere to one allocation way in one storage class, we can cache the reported available capacity for storage classes in the CO to do dynamic volume provisioning. If the driver has a way to report total capacity, the experience will be better.
Right, in Mesos we cache this result for brief periods, and requery every CSI plugin instance, for every StorageClass (we call them profiles), every couple of seconds (10s or 30s, I can't recall) to remain reasonably up-to-date.
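The Mesos-style approach above can be sketched roughly as follows: re-issue GetCapacity for every (plugin instance, storage class) pair at a fixed interval and cache the results. `get_capacity` here is a hypothetical stand-in for the CSI GetCapacity RPC, and the function shape is illustrative, not Mesos's actual implementation.

```python
import time

def poll_capacities(plugins, classes, get_capacity, interval=10.0, rounds=None):
    """Periodically refresh cached capacity per (plugin, storage class).

    rounds=None polls forever; a finite count is convenient for testing.
    """
    cache = {}
    n = 0
    while rounds is None or n < rounds:
        for plugin in plugins:
            for sc in classes:
                # One GetCapacity RPC per plugin instance per storage class.
                cache[(plugin, sc)] = get_capacity(plugin, sc)
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval)
    return cache
```

The cached values are treated as best-effort and simply refreshed on the next tick, which bounds how stale the CO's view can get to roughly one interval.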
This makes sense: a StorageClass definition is transformed into GetCapacity request parameters.
It sounds like the design you are proposing assumes the following limitation: available capacity for a given StorageClass cannot change at runtime other than through creating or deleting a volume of that StorageClass. This is problematic.
2.1 It can live in the CSI plugin. In Mesos we chose to delegate that knowledge to the CSI plugin: we do no capacity calculation and instead reissue the GetCapacity RPC for every StorageClass, for every CSI plugin instance, at regular intervals.

2.2 It can live in the CO. The CO can encode knowledge of how Create/DeleteVolume influences the available capacity of each StorageClass.

You can try and sidestep the issue by restricting every instance of a CSI plugin (e.g., every LVM volume group) to a single StorageClass. In that case, you will still make incorrect calculations of available capacity. I imagine that Create/DeleteVolume alone are also not enough to model correctly: Create/DeleteSnapshot and the soon-to-be-introduced volume resize functionality also have unexpected impact on capacity. It is still possible to encode all this wisdom into the CO, but it would have to be done for every kind of CSI plugin, which defeats some of the purpose of the CSI specification. Another issue with the CO calculating available capacity is that its view inevitably drifts from the plugin's.

I think the proper solution must be to issue GetCapacity calls and to accept the reality that the CO does not have a perfect view of the amount of available capacity at any given time, but devise strategies to reduce that delta. One such strategy is to periodically poll CSI plugins for available capacity. Such a strategy could be tempered by only requesting available capacity if some capacity-changing RPC like Create/Delete/ResizeVolume or Create/DeleteSnapshot has been performed against that CSI plugin instance since available capacity was last requested. This has the disadvantage of CO state becoming outdated in the case where the CSI plugin instance's backing storage increases/decreases out-of-band, such as when the administrator extends the LVM VG or adds more disks to a Ceph installation, etc. Perhaps that's OK.
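The tempered polling strategy in this comment could look roughly like the sketch below: refresh the cached value only after a capacity-changing RPC, plus a maximum age to eventually catch out-of-band changes. All names here are hypothetical; `get_capacity` stands in for the CSI GetCapacity call.

```python
import time

class CapacityCache:
    """Refresh capacity after mutating RPCs, with a max-age fallback."""

    def __init__(self, get_capacity, max_age: float = 30.0):
        self.get_capacity = get_capacity
        self.max_age = max_age     # still refresh this often, for out-of-band changes
        self.dirty = True          # a capacity-changing RPC occurred since last poll
        self.value = None
        self.fetched_at = 0.0

    def mark_dirty(self) -> None:
        # Call after CreateVolume, DeleteVolume, Create/DeleteSnapshot,
        # volume resize, etc. against this plugin instance.
        self.dirty = True

    def available(self) -> int:
        stale = time.monotonic() - self.fetched_at > self.max_age
        if self.dirty or stale:
            self.value = self.get_capacity()
            self.fetched_at = time.monotonic()
            self.dirty = False
        return self.value
```

The `max_age` fallback is exactly the mitigation for the disadvantage noted above: an out-of-band VG extension is still picked up, just with up to `max_age` of delay.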
GetCapacityResponse should contain "total capacity" in addition to available_capacity, so that the caller can make decisions about provisioning.