-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip named nodes in the CNR that don't exist #83
Skip named nodes in the CNR that don't exist #83
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've significantly altered the behaviour of the software, but haven't added tests for the new behaviour. I would expect to see some tests added that have nodes added to the initial request that are then terminated, and the behaviour validated on that. I'm a little uneasy releasing this functionality without this test.
Additionally, there may be other people running Cyclops who are relying on the existing behaviour. I can see two options here:
- Cut a new major version of Cyclops (i.e. backwards incompatible changes)
- Add a configurable setting for this behaviour and default it to the old behaviour
foundNode = true | ||
// Skips any named node that does not exist in the node group for this CycleNodeRequest. | ||
func (t *CycleNodeRequestTransitioner) addNamedNodesToTerminate(kubeNodes map[string]corev1.Node, nodeGroupInstances map[string]cloudprovider.Instance) { | ||
kubeNodesMap := make(map[string]corev1.Node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: name this after the function of the variable. Something like nodeLookupByName
. I kept trying to find it being used for something more than a lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, there may be other people running Cyclops who are relying on the existing behaviour.
Good point, I initially intended for this to be a breaking change but now that I think about it making it configurable may be the better approach to cater to different use-cases.
👍 I'll add some tests to validate the intended behaviour.
kubeNodes, err := t.listReadyNodes(true) | ||
if err != nil { | ||
return t.transitionToHealing(err) | ||
} | ||
if len(kubeNodes) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're losing the ability to detect user error here. Is there a way we can keep user error detection, while also not failing at nonexistent nodes?
One strategy could be: if our selector did match some nodes, but none were still in the cluster, we can call it a success. If it matches no nodes, we don't call it a success. The only problem with this approach is nodegroups scaled to zero. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see how that's possible unless we have a way of knowing if a node name in the CNR matches a node that existed prior to the CNR being created.
A flag here would help to toggle strict node validation and enable both use cases but I'm wary of adding more settings to a CNR.
@@ -101,6 +101,13 @@ spec: | |||
- Drain | |||
- Wait | |||
type: string | |||
strictValidation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So by default if not set this will be false
?
Added a new StrictValidation option to the CNR to keep backwards compatible functionality.
I guess this does keep backwards compatible functionality, but is technically a breaking change or a behaviour change - we'll need to document this in the release notes.
We'll also need to ensure cluster owners update the CRDs to include this new field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that is the intention, I would like this new functionality to be the default rather than the exception because it reduces the number of times a cycle will fail by default.
However, happy to change it if we deem it's better to set the flag to true
by default.
// StrictValidation is a boolean which determines whether named nodes selected in a CNR must | ||
// exist and be valid nodes before cycling can begin. If set to true when invalid nodes are | ||
// selected the CNR will be transitioned to the "Failed" phase before cycling can begin again. | ||
StrictValidation bool `json:"strictValidation,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ever see StrictValidation
being used for other things?
At the moment the description of this field suggests its only for this one thing - when invalid nodes are found, transition to failed. But the naming of the field suggests its to generically enable/disable strict validation.
Naming is hard, but I'd probably suggest renaming the flag to something that describes specifically what it is enabling/disabling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it also make sense to put this field into CycleSettings
?
Doing this means it's available in a CycleNodeStatus object - do we need it here?
Should we move it elsewhere within CycleNodeRequest to make it not propagate down?
Could we have a "ValidationOptions" field within CycleNodeRequest for holding other validation/configuration options a user might want to configure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see your point. I initially intended for more validations to be affected by PR but then narrowed the scope to just checking for named nodes. Having said that, having ValidationOptions
in the CNR would be better because we can then have more granular settings. 🤔
With this PR, we could make it something like, ValidationOptions.SkipMissingNamedNodes
. The existing functionality would remain the same and adding the flag would lead to using this new functionality.
@@ -44,6 +44,8 @@ spec: | |||
of an object. Servers should convert recognized schemas to the latest | |||
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' | |||
type: string | |||
clusterName: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this field come from? Why is it only in the CycleNodeRequest
resource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was part of the operator-sdk render, make generate-crds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - nice work! Love the validationOptions
field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
CNR
if they don't exist through avalidationOptions.skipNamedNodes
option. As a result if none of the named nodes exist theCNR
will succeed immediately when it transitions to theInitialized
phase.ValidationOptions
option added to theCNR
andNodegroup
objects.Pending
phase and supporting functions.