[aws-eks] Upgrading to v1.20.0 #5544
Comments
There were two causes of timeouts for EKS cluster creation: a creation time longer than the AWS Lambda timeout (15 min), and a lack of retries when applying kubectl manifests after the cluster has been created.

This change fixes the first issue by leveraging the custom resource provider framework to implement the cluster resource as an async resource. The custom resource providers are now bundled as nested stacks so they don't take up too many resources in users' stacks, and they are reused by multiple clusters within the same stack. This required that the creation role no longer be the same as the Lambda role, so we define this role separately and assume it within the providers. The second issue is fixed by adding 3 retries to "kubectl apply".

**Backwards compatibility**: as described in #5544, since the resource provider handler of `Cluster` and `KubernetesResource` has changed, this change requires a replacement of existing clusters (deployment fails with a "service token cannot be changed" error). Since this can be disruptive to users, this change includes an exact copy of the previous version under a new module called `@aws-cdk/aws-eks-legacy`, which can be used as a drop-in replacement until users decide to upgrade to the new version. Using the legacy cluster emits a synthesis warning that the module will no longer be released as part of the CDK starting March 1st, 2020.

- Fixes #4087
- Fixes #4695
- Fixes #5259
- Fixes #5501

---

BREAKING CHANGE: (in experimental module) the providers behind the AWS EKS module have been rewritten to address multiple stability issues. Since this change requires cluster replacement, the old version of this module is available under `@aws-cdk/aws-eks-legacy`. Please read #5544 carefully for upgrade instructions.
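For illustration, here is a minimal sketch of the async-resource pattern this refers to, built on the provider framework in `@aws-cdk/custom-resources`. The handler assets and names are hypothetical, and the module layout follows later CDK v1 releases; this is not the actual EKS implementation:

```ts
import * as cdk from '@aws-cdk/core';
import * as lambda from '@aws-cdk/aws-lambda';
import * as cr from '@aws-cdk/custom-resources';

const stack = new cdk.Stack(new cdk.App(), 'ProviderSketch');

// onEvent starts create/update/delete and returns immediately; isComplete is
// then polled until the resource stabilizes, so the overall operation is not
// bounded by a single Lambda invocation's 15-minute timeout.
const onEvent = new lambda.Function(stack, 'OnEvent', {
  runtime: lambda.Runtime.NODEJS_10_X,
  handler: 'index.onEvent',
  code: lambda.Code.fromAsset('cluster-resource-handler'), // hypothetical asset
});

const isComplete = new lambda.Function(stack, 'IsComplete', {
  runtime: lambda.Runtime.NODEJS_10_X,
  handler: 'index.isComplete',
  code: lambda.Code.fromAsset('cluster-resource-handler'),
});

const provider = new cr.Provider(stack, 'ClusterProvider', {
  onEventHandler: onEvent,
  isCompleteHandler: isComplete,
});

// The cluster is then modeled as a custom resource backed by the provider.
new cdk.CustomResource(stack, 'Cluster', {
  serviceToken: provider.serviceToken,
});
```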
Hi @eladb. I understand the reasoning, and I understand that this package was marked experimental from the start. The problem is that this kind of upgrade path is really not ideal; we have invested a lot in CDK, and now we fear that this kind of solution could be repeated in the future. Can the CDK team guarantee, or at least try to commit to, that this kind of solution will not become the norm for your packages? This is also a clear violation of semantic versioning: a minor version upgrade should not introduce breaking changes, especially a huge one like this. I suggest that you either change the way packages are versioned to reflect their changes, or respect the major-minor semantics. Otherwise, as a customer, we really cannot trust this project and will have to migrate away to avoid potentially losing money and time rebuilding our infrastructure each time there is a change in tooling. I'm sorry if I sound harsh or angry; I'm really not, but this update has scared us a lot, and management is starting to question our technical choices, which as you can imagine puts me in a really difficult position. Thank you for your understanding.
@MatteoJoliveau thanks for your feedback. We absolutely commit that modules marked "stable" will not be broken in minor versions and that such migrations will not be required, but unfortunately we can't make this commitment for "experimental" modules like EKS. Since the entire framework uses a single version line (for a myriad of reasons), we are unable to conform to semantic versioning on modules that are still unstable. This is actually not an uncommon practice in this space; Node.js uses the same approach, where experimental modules in the Node.js API are not bound to semantic versioning. I believe this type of breakage is not going to be common, and we tried hard to make it possible for you to avoid the breakage by using `@aws-cdk/aws-eks-legacy`. It's a nasty tradeoff between progress and stability, one I am sure you are familiar with from your work. For example, if EKS was already marked "stable", it would have been much harder to implement a robust fix for the issues this change addresses without breaking existing clusters. We understand this could be very painful and apologize if this caused grief with your team.
Thank you @eladb for your reply. I understand it is not an easy task to maintain such a large and complex ecosystem of packages. We'll chart a plan to upgrade our clusters in some way, and we will be more cautious with experimental packages in the future. Being reassured that stable package upgrades are handled more carefully is more than enough for us.
Have you considered moving "experimental" constructs out of the main library and into a separate package? That would serve three purposes:

1. Make it crystal clear which constructs are experimental and which are stable.
2. Allow the stable core and the experimental constructs to be versioned and released independently.
3. Let the main library honor semantic versioning.
As far as #1 goes, just because there is a label in the documentation does not mean that people are expecting large breaking changes on a point release. We tend to think about the entire library as either being in GA or in beta, but not a little bit of both; having a separate library makes it crystal clear. Although it may add more complication for you in keeping track of dependencies, #2 would benefit customers by allowing us to take advantage of improvements to the core library without having to deal with possible breaking changes in an experimental library. This could also work the other way around, where an experimental library can iterate faster than the stable core. I think the advantage of #3 is obvious, and it would result in happy consumers of your API. Most importantly, it would give us more confidence and trust in you as providers of a core technology. All of this is meant as constructive advice to help you build a better project. I love the product and I just want it to be as good as it can be.
As described in #5540, version 1.20.0 of the experimental `@aws-cdk/aws-eks` module includes a new implementation of the resource providers behind `Cluster` and `KubernetesResource`, in order to address several stability issues. This change requires a replacement of your existing EKS clusters, and since this module is experimental, we decided to introduce these breaking changes without backwards compatibility. To alleviate the pain, we will publish the previous version of this module under `@aws-cdk/aws-eks-legacy` until March 1st, 2020. The legacy module can be used as a drop-in replacement in case you wish to plan this migration.
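As a sketch of that drop-in replacement (assuming TypeScript and a cluster created with default props), switching modules is just an import change, plus the matching dependency swap in `package.json`:

```ts
import * as cdk from '@aws-cdk/core';
// Before: import * as eks from '@aws-cdk/aws-eks';
// After (drop-in replacement, published until March 1st, 2020):
import * as eks from '@aws-cdk/aws-eks-legacy';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'MyEksStack');

// The construct API is unchanged, so the synthesized template keeps the
// old service token and the existing cluster is not replaced.
new eks.Cluster(stack, 'Cluster');
```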
If you try to update a stack that contains an existing EKS cluster to this new version, you will get an error saying that the service token of a custom resource cannot be changed.
Unfortunately, this means that you will have to destroy and recreate your cluster in order to use the new aws-eks library. We understand that in production systems this requires intentional planning.
To allow you to migrate at your own pace, we have published the old version under `@aws-cdk/aws-eks-legacy`. If you replace `@aws-cdk/aws-eks` with `@aws-cdk/aws-eks-legacy`, your stacks, as well as your clusters, will stay unchanged.

When you are ready to recreate your cluster, the safest option is to follow these steps:
Alternatively, you can try to modify the logical ID of your cluster resource so that CloudFormation treats it as a new cluster and deletes the old one, as sketched below. Bear in mind that this technique cannot be used if your cluster uses a physical name.
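A hedged sketch of that logical ID technique, assuming the underlying L1 resource is reachable via `node.defaultChild` (this may not hold for every construct; locate the `CfnElement` in the construct tree accordingly):

```ts
import * as cdk from '@aws-cdk/core';
import * as eks from '@aws-cdk/aws-eks';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'MyEksStack');
const cluster = new eks.Cluster(stack, 'Cluster');

// Renaming the logical ID makes CloudFormation treat this as a brand-new
// resource: it creates the renamed cluster first, then deletes the old one.
// This only works if the cluster does not use a fixed physical name.
const cfnCluster = cluster.node.defaultChild as cdk.CfnElement;
cfnCluster.overrideLogicalId('ClusterV2');
```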