No late initialize for container NodePool's node_count and initial_node_count #600
Description of your changes
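In the GCP Terraform provider there are two ways to configure the size of a NodePool: node_count, which sets the ongoing size of the pool, and initial_node_count, which sets only the size at creation.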
You cannot specify both when creating a node pool, but this is not (currently) validated by the TF provider when updating an existing node pool.
At present provider-gcp late initializes both properties, resulting in a spec with both values set. This would not be valid for creating a new node pool. Worse, when the user has explicitly set only the initial size, late initialization causes the provider to manage the ongoing size of the node pool as if the user had set node_count too, resulting in fights with the GKE cluster autoscaler, etc.
While this can be worked around by disabling LateInitialize in the managementPolicy, the current behaviour is undesirable and surprising to users as it effectively removes the distinction between the two ways to specify pool size.
The TF docs also warn that the reported initial_node_count can change if node pools are resized, and suggest using a lifecycle block to ignore changes.
This PR removes both node_count and initial_node_count from late initialization, and ignores changes to the initial_node_count field.
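To illustrate the behaviour change, here is a minimal sketch of late initialization with and without an ignore list. It is not the provider's actual code: NodePoolSpec, lateInit and intPtr are hypothetical names invented for this example, and the real resource has many more fields.

```go
package main

import "fmt"

// NodePoolSpec is a hypothetical, pared-down stand-in for the desired state
// of a node pool. A nil pointer means "not set by the user".
type NodePoolSpec struct {
	NodeCount        *int
	InitialNodeCount *int
}

// lateInit copies observed values into unset spec fields, skipping any field
// whose name is present in ignored. Illustrative only.
func lateInit(spec, observed NodePoolSpec, ignored map[string]bool) NodePoolSpec {
	if spec.NodeCount == nil && !ignored["node_count"] {
		spec.NodeCount = observed.NodeCount
	}
	if spec.InitialNodeCount == nil && !ignored["initial_node_count"] {
		spec.InitialNodeCount = observed.InitialNodeCount
	}
	return spec
}

func intPtr(v int) *int { return &v }

func main() {
	// The user sets only the initial size; the autoscaler has since grown the pool.
	user := NodePoolSpec{InitialNodeCount: intPtr(3)}
	observed := NodePoolSpec{NodeCount: intPtr(5), InitialNodeCount: intPtr(3)}

	// Without an ignore list, node_count is filled in and becomes managed.
	before := lateInit(user, observed, nil)
	fmt.Println("node_count set without ignore list:", before.NodeCount != nil)

	// With both fields ignored, the spec keeps only what the user wrote.
	ignored := map[string]bool{"node_count": true, "initial_node_count": true}
	after := lateInit(user, observed, ignored)
	fmt.Println("node_count set with ignore list:", after.NodeCount != nil)
}
```

In the first call the spec ends up with node_count pinned to the observed value, which is what leads to the fights with the autoscaler described above; in the second call the field stays unset.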
I have:
Run make reviewable to ensure this PR is ready for review.
Added backport release-x.y labels to auto-backport this PR if necessary.
How has this code been tested
Note: The TF provider calculates the current node_count as int(instances / AZs), so when testing it's necessary to resize the node pool by at least the number of AZs or the provider will not detect the change!
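A small worked example of that calculation, assuming a pool spread over 3 AZs. The function name reportedNodeCount is made up for illustration; only the formula int(instances / AZs) comes from the note above.

```go
package main

import "fmt"

// reportedNodeCount mirrors the formula from the note: total instances
// divided by the number of zones, truncated toward zero.
// The name is hypothetical; it is not taken from the TF provider.
func reportedNodeCount(instances, zones int) int {
	return instances / zones
}

func main() {
	const zones = 3

	// Baseline: 2 nodes per zone, 6 instances in total.
	fmt.Println(reportedNodeCount(6, zones)) // 2

	// Growing the pool by 1 or 2 instances leaves the reported value unchanged.
	fmt.Println(reportedNodeCount(7, zones)) // 2
	fmt.Println(reportedNodeCount(8, zones)) // 2

	// Growing it by the full zone count changes the reported value.
	fmt.Println(reportedNodeCount(9, zones)) // 3
}
```

Because the division truncates, a resize smaller than the zone count can go unnoticed, while a resize of at least the zone count always changes the result.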