feat(scheduling): add volume group capacity tracking #21
Conversation
Force-pushed from 48285d7 to ef941f5
Codecov Report
Coverage Diff (master vs #21):
- Coverage: 1.20% → 1.08% (-0.12%)
- Files: 11 → 11
- Lines: 831 → 920 (+89)
- Hits: 10 → 10
- Misses: 821 → 910 (+89)
Continue to review full report at Codecov.
Gentle reminder @pawanpraka1 @akhilerm
Force-pushed from 4bdd2cf to 263467b
# to generate the CRD definition

---
apiVersion: apiextensions.k8s.io/v1beta1
We can start using the v1 CRD version; v1beta1 is going to be deprecated in k8s 1.22.
Yeah, I had initially configured the same, then saw that the other CRDs (LVMSnapshot, LVMVolume etc.) are using v1beta1. I think we can take this up as part of a separate pull request, as it'll require changes to the other CRDs as well.
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.2.8
We can start using controller-gen version v0.4.0, which generates the v1 CRD version by default.
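For reference, a minimal sketch of how the generated header might look after moving to controller-gen v0.4.0 and the v1 CRD API; the CRD name shown is an assumption for the new LVMNode resource and is not taken from this PR's diff:

```yaml
# Hedged sketch only: assumes the CRD group local.openebs.io and plural
# name lvmnodes; the actual generated file may differ.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.4.0
  name: lvmnodes.local.openebs.io
# (spec omitted)
```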
Force-pushed from 878542c to e401866
Signed-off-by: Yashpal Choudhary <[email protected]>
LGTM
LGTM, some of the changes like the CRDs version etc. are going to be fixed in upcoming PRs.
@iyashu why do we need a watcher for the lvmnode object? Does the node daemonset need to take any action on modification/update of the lvmnode object?
Signed-off-by: Yashpal Choudhary <[email protected]>
Signed-off-by: Yashpal Choudhary <[email protected]>
Why is this PR required? What issue does it fix?:
With k8s version >= 1.19, there is a feature (alpha stage) where the kube-scheduler takes the storage capacity available on nodes into account during the filtering stage. Without CSIStorageCapacity tracking, a pod with a volume that uses delayed binding may get scheduled multiple times, but might always land on the same node unless there are multiple nodes with equal priority. More details are mentioned in KEP 1472 & the external-provisioner documentation.
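For context (not part of this PR's diff), the external-provisioner only publishes CSIStorageCapacity objects when the CSIDriver object opts in to capacity tracking. A minimal sketch, assuming the driver name local.csi.openebs.io and a k8s >= 1.19 cluster with the CSIStorageCapacity feature gate enabled:

```yaml
# Hedged sketch: driver name and field values are assumptions for illustration.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: local.csi.openebs.io
spec:
  attachRequired: false
  podInfoOnMount: false
  storageCapacity: true   # alpha in k8s 1.19; lets external-provisioner create CSIStorageCapacity objects
```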
What this PR does?:
We introduced a new custom resource called LVMNode, recording all the available VGs and their corresponding attributes on each node. The OpenEBS node plugin running on each node will periodically scan the VGs and reconcile the LVMNode resource. On the controller side, we've implemented the CSI GetCapacity method, which is called by the k8s external-provisioner for creating/updating CSIStorageCapacity objects, which in turn are used by the kube-scheduler for each node topology. The k8s external-provisioner calls GetCapacity for every combination of storage class and node topology segment (individual nodes in our case), so the volume group configured in the storage class is also part of the args in the GetCapacity call. In case there are multiple volume groups (with multi-pool support) available on a node, we choose the volume group having the maximum free capacity whose name matches the volgroup parameter of the GetCapacity call.

Does this PR require any upgrade changes?:
No
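To make the flow above concrete, here is a rough sketch of the kind of CSIStorageCapacity object the external-provisioner would create per (storage class, node) pair from the GetCapacity response. All names, the namespace, the topology key and the capacity value are illustrative assumptions, and the API version was still alpha in k8s 1.19:

```yaml
# Hedged sketch: object name, namespace, topology key and capacity are
# illustrative; the real objects are generated by the external-provisioner.
apiVersion: storage.k8s.io/v1alpha1
kind: CSIStorageCapacity
metadata:
  name: csisc-worker-1-openebs-lvmpv
  namespace: kube-system
storageClassName: openebs-lvmpv
nodeTopology:
  matchLabels:
    kubernetes.io/hostname: worker-1
capacity: 12Gi   # free space reported by GetCapacity for the matching VG on this node
```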
If the changes in this PR are manually verified, list down the scenarios covered:
Consider a kubernetes cluster having 5 worker nodes (besides the master ones), each having a storage capacity of 32Gi. Create a stateful set, say sts-a, of size 4, with each replica requesting storage capacity (PVC) of 20Gi. Now create another stateful set, say sts-b, of size 1, again requesting storage capacity (PVC) of 20Gi. We'll see that the pod in sts-b will be scheduled (by the kube-scheduler) on the right node having enough capacity (see the illustrative manifest sketched below).

Any additional information for your reviewer?:
Mention if this PR is part of any design or a continuation of previous PRs
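For the verification scenario above, the stateful sets would look roughly like the sketch below; the storage class name, container image and command are assumptions used only to illustrate the setup (sts-b would be identical apart from its name and replicas: 1):

```yaml
# Hedged sketch of sts-a for the manual verification scenario; storage class
# name and image are assumptions, not taken from this PR.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts-a
spec:
  serviceName: sts-a
  replicas: 4
  selector:
    matchLabels:
      app: sts-a
  template:
    metadata:
      labels:
        app: sts-a
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: openebs-lvmpv   # assumed storage class with delayed (WaitForFirstConsumer) binding
      resources:
        requests:
          storage: 20Gi
```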
Checklist:
<type>(<scope>): <subject>