-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion around changes made to the aws_s3_bucket
resource in AWS Provider version 4.0
#23106
Comments
Agree emphatically, this change seems very poorly thought out from the Hashicorp & terraform customer perspective - literally everyone who uses s3 buckets, which is literally everyone who uses AWS, just had their terraform configurations break, out from under their feet. Without any period of deprecation warning. boo. Please change it back? Perhaps if it really needs to go, it could be deprecated for the duration of 4.x.x and removed in 5.x.x? |
Consider using |
@FernandoMiguel pardon my ignorance but would It seems like it's for a 1:1 move of a single resource, this change is going to involve going from a single resource to multiple resources to represent that same s3 bucket. It'd be nice if there were some guidelines for migrating to the new resource pattern. If we have to use |
@YakDriver I was actually afraid that was the answer here. The problem for us we have over 100 buckets, which will end up being a ton of re-writing and importing. The way our org works, we ask everyone to apply things via terraform in ci/cd for the most part, meaning many can't import. I recognize and agree that we shouldn't expect this project to cater to how our org works, just providing that as an example for how big of a problem this will be for many. I would also expect many in largish orgs wouldn't necessarily be able to import. EDIT: for the record, if |
Hey @YakDriver, the issue is not related to being unable to upgrade a single target per those instructions. The issue is the scale of how many thousands times we would need to follow those instructions in literally hundreds of directories. |
The change is asking folks to make massive changes to their Terraform resources with absolutely no notice. I can see that being needed on some pieces within the Terraform AWS provider, however the s3 bucket resource is one of the most commonly used resources and this change requires substantial rework and migration plans. It is not something that many groups can just say "okay we're doing this overnight", even in hyper-agile groups. The provider should have been set to "depreciate" over the 4.x.y release cycle the legacy resource layout to allow organizations time to evaluate and design a plan to move forward. Instead, a substantial amount of Terraform users will version lock to 3.74.1 and never upgrade because the s3 code is going to require an outlandish amount of work with no backwards compatibility (so you can't go back to 3.x.y) nor no guarantee the provider won't be restructured again sans notice. Hashicorp's own marketing statements differ from the reality here. The first line of this post is true: this is one of the oldest and most used pieces of the provider. It however, is completely not depreciating old resources, it completely obliterates the support with no notice and with limited discussion. https://www.hashicorp.com/blog/terraform-aws-provider-4-0-refactors-s3-bucket-resource |
Sure it may be understood that this change may have been exposed by way of GH Issues/PRs/discussion, but I would guess a large portion of the Terraform user base doesn't spend a tonne of time in GH -- and interestingly, there was no notice at runtime about an incoming breaking change (perhaps a different issue altogether). That said, Just to add on to what @adam-bartlett-sp mentioned regarding the 4.0 blog post, it specifically says this on the official blog post:
/shrug |
Hey y'all, Thank you for taking the time to raise this issue, and for everyone who has reacted to it so far, or linked to it from other issues. It's important for us to hear this kind of feedback, and the cross-linking that several folks have done really helps to collect all of this information in one place. There are some great points being made here, and we wanted to take time to address each of them and to provide more information around some of the decisions that were made in regards to the S3 bucket refactor. We recognize that this is a major change to a very popular resource, and acknowledge that a change this large was bound to cause headaches for some. Ultimately, we hope that the information provided here will help give a better idea of why we felt it was necessary to do so. A bit of backgroundThe This monolithic approach has been noted as potentially problematic by maintainers and practitioners alike for quite some time, with a number of issues being raised going back to at least 2017 (#4418, #1704, #6188, #749; all of which were referenced in our planning for this change). In addition to these types of requests, we've seen numerous bugs over time that were either a direct result of or exacerbated by the former structure of the resource; for example #17791. The idea of separating the resource into several distinct/modularized resources also provides the opportunity to more quickly adapt to changes and features introduced to S3, with the added benefit of easier testing to help ensure that bugs are fewer less disruptive. With this information in mind, we knew that changes were needed in order to ensure the long-term viability and resilience of managing the lifecycle of S3 buckets within the provider, and to address the desires of the community at large. Paths we consideredIt's important that we note that the path that we landed on for the 4.0 release was not the only option that we considered. In the interest of transparency, we'd like to discuss some of the other options that we considered, and why we ultimately decided against them. Alternative Option 1: Remove a subset of S3 bucket resource arguments and create new S3 bucket related resources We considered the more aggressive approach of removing a subset of the root-level arguments and introducing additional S3 bucket related resources to replace them. This would have forced practitioners to refactor their configurations immediately before proceeding. Due to the drastic impact this change would have, it was ultimately decided that this was not the right approach. Alternative Option 2: Move existing S3 bucket resource to a new resource and refactor S3 bucket resource This was an approach that was mentioned here already as a suggested alternate approach, and was something that we considered while planning for this major release. This option would imply that the existing schema would be maintained, but moved to a new resource in the provider, e.g. Going this route would mean that practitioners would have needed to either adopt the new Adopted Option: Deprecate subset of S3 Bucket resource arguments, marking them as Computed (read-only) until the next major milestone, and create new S3 bucket related resources The approach we decided was the least impactful of the options given the downsides of the approaches outlined above, and after discussing the options with a number of practitioners, including large enterprise customers with significant Terraform adoption. Much like alternative option 2, and as has been noted here, this does still introduce the need to import resources and make adjustments to configurations long-term, but was intended to allow as much functionality to be retained as was possible while still introducing the necessary refactor. This approach also allowed us an opportunity that option 2 did not: the ability to leave the "deprecated" attributes in place until version 5.0.0, allowing for a longer deprecation period. With the above in mind, we want to take the opportunity to address the use of "deprecate" here. We noted in the pinned issue detailing the changes:
This was not reflected in the blog post about the release (something we're working to address), and we recognize that this doesn't necessarily reflect what "deprecated" means in the software world. The thought here is that this would not break configurations, but rather that there would be no drift detection for computed attributes. We will update the blog post to reflect the language in our pinned issue and clear up any inconsistencies. Migration ToolingAnother consideration that we took into account whilst working on this, that has been mentioned here as well, is the idea of tooling to help aid the transition between the versions. In speaking with users/practitioners while trying to determine the "correct" course of action, the general feedback we received was that the thought of using some type of script to modify large state files, particularly those stored in Terraform Enterprise or Terraform Cloud, was an undesirable path. Manipulating state outside of Terraform has inherent risks associated with it, including loss of tracked resources, which may lead to loss of money, time, or even compromises to the security of infrastructure. In addition, migration is a two part problem; the state and the configuration. This multi-part problem would have necessitated a significantly more complex set of tooling, leading to even greater possibilities for errors. Due to the significant risk associated with it, this ultimately led to the decision to not include migration tooling. Given the feedback we’ve received so far from the community after the release, we’re beginning to discuss whether or not this is a position that we should reconsider. It’s too early in those discussions to say what the outcome will be, but we feel it’s important to note that we are not shutting down the idea based solely on our preconceived notions. We are planning to spend some time next week investigating new approaches to migration tooling and will follow up with the results of our investigation. Moving forwardWe recognize that this was a big change and there are, understandably, questions about moving forward. The first thing we need to address is the request to roll these changes back. We feel strongly that despite the disruptive nature of a refactor as large as this, it was a necessary change given the positives outlined above. While this is the first time that we've made such a substantial change to such a pivotal resource, there are other resources that have similar levels of bloat ( In the immediate future, if this change is disruptive, we would strongly recommend that you take the opportunity to pin the provider version to a version prior to 4.0.0 until you're able to take the actions necessary to make the transition. This will allow for functionality to remain as is while you take the time to do any refactoring and importing that may be necessary with this major change. Lessons for usWe don't believe it would be fair to place all of the burden of the feelings represented here on those making the comments, so it's important for us to openly reflect on ways we could have done better by the community. We'd like to take the opportunity to do so here, and also to open the floor for any additional suggestions that anyone may have. It's never our goal to disrupt anyone's workflow, no matter how important the change is. Advanced NotificationPerhaps the most clear feedback we've seen so far is around the communication of these breaking changes. There were a number of things that we did before and just after the release to try to communicate some of these things; for example, we laid out plans for this release in an issue that has been pinned in the repository since August 2021, released a blog post detailing the changes, posted on Discuss to try to circulate those details, released an upgrade guide to try to help with the transition, and updated several Learn guides (Host a Static Website with S3 and Cloudflare and Build and Use a Local Module) with updated configurations. Given the feedback that we've received so far, it seems that there may be opportunities to enhance these efforts. Because there is not a mailing list for the provider, if there is a place outside of GitHub or Discuss that the community would like to receive notifications at, we'd love to hear about it. So far, we've begun discussing a few potential options that we'd be interested in receiving feedback on. These potential options include posting to Reddit, hosting pre-release community office hours with our Developer Advocates, posting a notice on the documentation in the Terraform Registry, or perhaps even a pre-release blog post much like we did when we deprecated support for 0.11 in providers. |
@justinretzolk first of all, thank you for the summary and feedback on the design decisions. The actual decision to decompose the
|
I have a proposal. |
The concern is probably less about the abstract burden of feelings than it is about the very concrete sudden breaking changes. @joe-a-t 's suggestion of a new resource |
Thanks for surfacing this again. We're working on updates to the upgrade guide and announcement blog post. That part of the post will be updated to reflect how the resource behavior has changed (configurable -> read only/computed) and how that can impact configurations. I'll follow up here with a link to the new content, when it has been updated. |
First and foremost, thanks for the detailed update @justinretzolk -- it certainly helps us all understand the situation better. I am curious if there may be a better way to address the migration path/refactor recommendations from the provider side. In our particular situation, we're looking at several thousand While I understand there may be some added complexity to this soltion, but would it not have been better for the vast number of Terraform users to support both s3 bucket resource configurations and introduce their dedicated resource types alongside them, effectively allowing graceful migration during the v4 provider version like the blog post advised? One particular example is the s3 bucket replication configuration resource -- previously configured at the bucket resource level, there was the addition of a dedicated resource (eg. to support 2-way replication). More importantly, perhaps something was missed regarding the use of the "read-only" attribute? As reported in several other GH issues, but the TF plan/apply errors out when the configuration values themselves haven't changed (actual vs. state) in contrary to the idea of it being a "read-only" attribute. Should it not just throw up a read-only/deprecation warning rather than error out altogether? Perhaps this is something worth looking into as an emergency breaking "bug" rather than just telling people "pin to v3.x and begin refactoring work before switching to v4.x"? |
@justinretzolk reiterating what @tr-mdaize said. Best option in current situation would be to release 4.x version of Bigger issue are all the third party modules that use s3 resource. From consumer side you either have to fork modules or wait for new version of all modules that you use to support 4.x. Obviously none of thirds party modules were ready for this change. From module developer point of view one might have to maintain two versions of module (to support users on 4.x and 3.x). |
It seems to me that the imports are not needed, but maybe it is only in some cases. I have been able to update many to the new configuration without any imports; it just adds the new resources. Maybe I'm causing issues, but I haven't lost any data, and things seem to be okay. |
@theherk I was about to comment exactly this. I feel a bit reckless disobeying the recommendation from the migration guide. Quoting: It is then recommended running terraform import on each new resource to prevent data loss. But, what are the basis for this? Naive terraform applies are working just fine both for S3 bucket ACLs and ECS cluster capacity providers! 🤷♀️ |
@theherk did your plans note any changes when you did this? I'd assume a lot of drop then applies which might be fine for something's but not others. I'm also surprised that the apply didn't fail because the configuration item created by the resource should have already existed. If not, that actually might be a different bug - apply should fail when you try to create things that exist but aren't tracked in state. |
Yes, it noted that the new resources were added. In my case: acl, versioning, encryption, and cors rules in varying combinations. But it didn't disable those to make it happen. It just created the resources in the state. I didn't notice any issues as all. I wish there were some specificity to what drives the recommendation. There must be some condition that triggers data loss, or I'm just fearful about nothing. |
I could see data loss if it is wants to recreate the bucket or turn off and back on versioning - can't remember for sure but it might delete all current old file versions of you turn it off. Agree the scenarios for data loss would be nice to know because if a lot of things don't require import, this becomes much easier to deal with. |
I don't think you can disable versioning, but you could suspend it. Even then though, it isn't a data loss scenario, because no objects are removed. I'm pretty sure unless the bucket goes away configuration changes should not trigger any data loss. Maybe these recommendations are just risk averse avoidance of some state conditions that could happen, but probably don't. I figure, if I just read the plan and verify, it should be fine. 🤞 ymmv |
While @maryelizbeth works to update the blog post with additional information around expected behavior, I wanted to come back and touch on the rationale behind not introducing a new resource (for the sake of continuity with the previous comments, we'll call it Going this route would mean that we would have two resources -
With that in mind, we would find ourselves in a similar position but later in time, and with the added complexity of an additional resource that may or may not have been adopted by many. This leaves the question around why the new resource would need to be renamed. In the past few months, the provider went through another major overhaul, albeit this one a bit more transparent to everyday users. This change involved aligning the repository with service-level packages that line up with the AWS Go SDK (additional details may be found in #20431 and #20000). Part of this refactoring effort involved standardizing on naming for resources, data source, and functions. Creating a new resource of I hope this provides a bit more clarity as to why we decided against that route. Additional information around the current expected behavior of the resource will be contained in the aforementioned blog post update. If there are any additional lingering questions or feedback in the meantime, please continue to comment and react to this issue and the comments within. |
@justinretzolk What about having If you really wanted to stick to the naming convention for resources moving forwards, you could also look at having an upgrade tool that switches all current usages of I totally get the desire to avoid confusion for project maintainers by doing a one-off, but a small bit of additional work for the provider maintainers has to be balanced against the thousands of hours of toil the change is creating for thousands of users across the community. |
For reference, hashicorp/terraform#3116 is the Terraform issue for allowing parameterized lifecycle rules |
Wow, managing an s3 bucket resource is a complex task, so instead of handling the complexity within the provider, the complexities of the configuration are pushed off on everyone who has to work with the resource. BRILLIANT!!! |
I have would like to convert below terraform script to v4.0.0 resource "aws_s3_bucket" "bucket" { |
Hi all, I’m an author of some third-party tools for Terraform and currently working on a refactoring tool for Terraform. The initial goal of this project is to provide a way for bulk refactoring of the aws_s3_bucket resource required by breaking changes in AWS provider v4. Even though the tfedit still has many limitations, it’s clearly better than nothing and ready to use. If you are still struggling with the AWS v4 upgrade, give it a try. For details, see README. |
@justinretzolk The way this release was handled raises serious concerns regarding the lack of Engineering best practices in the AWS Provider team at HashiCorp.. This was a major change, however, seem to have been really poorly tested given the trivial bugs that were reported in 3.x (see one of these bugs below). We basically had to upgrade to provider 4.x directly as 3.x is basically unusable with regard to S3 buckets. There was no immediate urgency behind this change and therefore this release should have been properly regression tested at least covering 90% of the basic S3 functionality. Some of the trivial bug reports raised serious questions what level of regression testing the AWS Provider team at Hashicorp is performing. The real question is if this change would have been needed if Hashicorp actually did some basic regression testing. |
I don’t think that’s entirely fair. The change was poorly thought through, this is true, but when the community brought this up they reacted well and backported the changes to allow a migration path to version 4. It worked flawlessly for us on several hundred buckets. I guess there is an issue with the back port, but the lifecycle rules fix is simple enough and once you’re on 4.x the issue is no longer there. Splitting the resources out like this is definitely cleaner and has made our modules a lot more readable, with a lot less “dynamic” blocks to emulate the separate resources. |
The requirement for a lifecycle rule is nothing more than a quite serious regression bug. It prevents changes to server_side_encryption_configuration, not an unreasonable requirement. It raises genuine concerns that Hashicorp hasn't thought this change through properly and isn't doing sufficient level of regression testing to capture such mistakes. Being more cynical I am not sure they responded to the community, rather than to their paying customers who must have been rightly furious that backwards compatibility was broken. As a free community user, I can only leave constructive feedback here on Github. As a paying customer I would have summoned my Account Manager for a serious talking to. |
Regarding importing:
This is a pretty big downside. Consider the case where I am just refactoring the resources to use the new separate resources, and want to verify that this doesn't result in any changes. I now have to either manually review every one of these new resources (and I may have hundreds of buckets that need to be reviewed), or import all of the new resources (which must be done one at a time, because there isn't a bulk import feature ). Both of these options are pretty tedious. |
Hi @tmccombs, good point! The tfedit (a.k.a. the unofficial AWS provider v4 upgrade tool) supports not only rewriting https://github.com/minamijoyo/tfedit |
In the end it's not really clear if you have to do a refactor or not, and if yes then what is the recommended way to do so. The doc says that the S3 Bucket Refactor "only applies to v4.0.0 through v4.8.0", but the warning is still there as of v4.39.0. |
We have drift CI jobs to check terraform drift as below :
I upgraded provider from 3.56 to 3.76.1 then 4.20.0 on prod and stage , only prod fails to run the job plan as above. |
Hi everyone 👋 We’d like to thank everyone again for your feedback in regards to the refactoring of the With that said, given the time that has passed, and with the upcoming release of version 5 of the provider, we feel that it’s time to unpin and close this issue. We welcome feedback on the announcement post, and are looking forward to some bringing some exciting, helpful changes to the provider. |
Hey @justinretzolk, I didn't see anything related to S3 bucket changes in #29842. Does that mean that S3 bucket resources will be unchanged from the current behavior that v4 has in the upcoming v5 release? Or are you intending to do the depreciation/changes that had previously been envisioned for the v3 to v4 upgrade as part of the v4 to v5 upgrade and the announcement just doesn't have those details included in the draft yet? |
Hey @joe-a-t 👋 Thanks for the follow up question here! For the AWS provider v5 release, we've decided against removing the deprecated arguments for the S3 bucket resource, as we imagine there's still quite a number of folks who have yet to make the transition. At the time of writing, we plan on removing those deprecated arguments in v6 instead. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Community Note
Update from HashiCorp, 2022-02-25
Hi everyone,
In light of how much traction this issue has gained, and the amount of information contained within, @joe-a-t has graciously allowed us to make edits to the description of this GitHub Issue to better reflect its current state, and so that we may pin it to make it more visible. We would like to sincerely thank him for the opportunity to do so. For the sake of transparency, and to not lose any context, the original issue description and title will be retained beneath this update.
Breakdown
On 2022-02-10, the AWS Provider team at HashiCorp released a new major version of the AWS Provider for Terraform (
v4.0.0
), which included breaking changes to theaws_s3_bucket
resource. The update introduced several independentaws_s3_bucket_*
resources to manipulate the configuration of S3 buckets, where these configuration options previously were arguments to theaws_s3_bucket
resource itself. This issue spawned out of that change, has received quite a bit of valuable feedback, and is being used as a centralized meeting point for ongoing discussions around these changes.Information around these changes may be found:
Action Items
Given the amount of feedback that we received after the release, the AWS Provider team has been following this issue (as well as any other avenues of feedback we can find) and taking steps to try to alleviate some of the burden placed on operators with this release. The team continues to investigate additional avenues, and will continue to update this issue with additional information where possible. These efforts include:
Completed
Currently Investigating
Tooling to help with configuration migration
We are currently investigating potential tooling options to help operators migrate from legacy
aws_s3_bucket
resources to the newaws_s3_bucket_*
resources. We have identified potential existing tooling and have reached out to the owner of the tool to try to help coordinate efforts to make the tool robust enough to share publicly. This is still in the early phases, so the information we can share on it is relatively limited. Despite this, we feel it's appropriate to share that we are looking into it, given the large amount of community interest.Marking deprecated arguments as optional in AWS Provider
v4.x
This change aims at splitting the difference between the behavior in
v3.x
andv4.x
by marking deprecated arguments as optional. The goal here is to allow operators more time to transition before the deprecated arguments are fully removed inv5.x
. Optional arguments would display deprecation warnings, but would not prevent Terraform from running, as was the behavior with the initialv4.x
releases. Documentation will be written in order to provide as much clarity as possible around how to cope with these changes within configurations.Backporting
aws_s3_bucket_*
resources to AWS Providerv3.x
This change aims to bring the new
aws_s3_bucket_*
resources into thev3.x
series of releases in order to give operators more time to plan and execute migration to the new resources without needing to make the jump tov4.x
. This would allow operators to useaws_s3_bucket
resources alongside the newaws_s3_bucket_*
resources, configuring buckets and updating configurations to the new standards as they see fit, without worry of deprecation notices or failed Terraform operationsGiven that backporting resources in this way falls outside of our normal processes, we would like to be explicit around expectations. Bugs in distinct
aws_s3_bucket_*
resources will be backported to ensure functionality, but additional features will not be backported. Documentation will be updated as well to ensure that this expectation is clearly communicated.Original issue information
Title: Change the S3 bucket refactor to be a new resource instead of modifying the existing one
Description
Do not change the existing
aws_s3_bucket
resource in v4 and instead provide a newaws_s3_minimal_bucket
or similar resource that people can use if they want your changes to split out various settings into stand alone resources.My company has literally thousands of AWS buckets configured through Terraform from hundreds of different root modules. Having to run separate
terraform import
commands for all of the settings you split out into separate resources per your upgrade instructions in https://registry.terraform.io/providers/hashicorp/aws/latest/docs/guides/version-4-upgrade#s3-bucket-refactor is quite frankly an insane demand to make for users.Right now, our only options appear to be:
terraform import
commands in hundreds of Terraform directories.Please reconsider this change immediately or at least provide tooling to ease adoption.
New or Affected Resource(s)
Potential Terraform Configuration
References
Update 2/22/2022
FYI, I (@joe-a-t) chatted with HashiCorp separately and gave them permission to update the issue (including the title and this comment) if they would like the issue to more accurately reflect the work that they are planning on doing.
The text was updated successfully, but these errors were encountered: