Stronger durability of remote state #19488

Open
mgood opened this issue Nov 28, 2018 · 0 comments


mgood commented Nov 28, 2018

Current Terraform Version

0.11.10

Use-cases

As described in issues like #18741, we frequently experience timeouts when writing the remote state back to S3 and releasing the DynamoDB locks. The retries described there would be a great improvement, but there are still plenty of ways that Terraform can fail before it finishes and writes the state back.

So we've been wondering whether Terraform could work toward an approach that gives stronger consistency than writing the state back only after it has completed all operations.

Attempted Solutions

The retries described in #18741 would be a partial solution. We've considered scripting something similar into our deployment, but haven't implemented it yet.
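As a rough illustration of the kind of retry we have in mind for a deployment script, here is a minimal sketch of retrying a flaky state write with exponential backoff. The `operation` callable is a placeholder for whatever actually pushes the state; the function name, parameters, and the choice to retry only on `TimeoutError` are all assumptions for illustration, not anything Terraform provides.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `attempts` tries are used up.

    `operation` is a hypothetical stand-in for the state write; only
    TimeoutError is treated as retryable in this sketch.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last failure
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

This only narrows the window for failure: if the process is killed between creating a resource and the write being retried, the state is still lost.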

Proposal

One approach we've thought about is for Terraform to perform a "write-ahead" update: before applying, it would record the operations it is about to perform, with enough information to recover from a failure by checking the status of those operations.

Existing resources in the state can often be modified ~idempotently by just fetching the current properties of the resource and then applying the modifications again.

However, when creating a new resource, a failure to update the state can leave that resource orphaned. For example, if Terraform updated the remote state before starting the operation to say that it's going to create an EC2 instance tagged with a client-generated UUID, then after a failure it could locate the instance by that tag.
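To make the idea concrete, here is a minimal sketch of the write-ahead flow using in-memory stand-ins. Everything in it is hypothetical: `FakeCloud` and `RemoteState` are toy substitutes for the provider API and the remote state, the `create-token` tag key is made up, and none of this reflects how Terraform is actually structured internally.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional

TOKEN_TAG = "create-token"  # hypothetical tag key holding the client-generated UUID


@dataclass
class FakeCloud:
    """Toy stand-in for a provider API: maps instance ID -> tags."""
    instances: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def create_instance(self, tags: Dict[str, str]) -> str:
        instance_id = f"i-{len(self.instances):08d}"
        self.instances[instance_id] = dict(tags)
        return instance_id

    def find_by_tag(self, key: str, value: str) -> Optional[str]:
        for instance_id, tags in self.instances.items():
            if tags.get(key) == value:
                return instance_id
        return None


@dataclass
class RemoteState:
    """Toy stand-in for remote state, keyed by resource address."""
    pending: Dict[str, str] = field(default_factory=dict)    # address -> intent token
    resources: Dict[str, str] = field(default_factory=dict)  # address -> instance ID


def create_with_intent(state: RemoteState, cloud: FakeCloud, address: str,
                       crash_before_commit: bool = False) -> None:
    token = str(uuid.uuid4())
    state.pending[address] = token                            # 1. record intent first
    instance_id = cloud.create_instance({TOKEN_TAG: token})   # 2. create, tagged with token
    if crash_before_commit:
        raise RuntimeError("simulated failure before the state was written")
    state.resources[address] = instance_id                    # 3. commit the result
    del state.pending[address]


def recover(state: RemoteState, cloud: FakeCloud) -> None:
    """Resolve leftover intents by looking up each token in the provider."""
    for address, token in list(state.pending.items()):
        instance_id = cloud.find_by_tag(TOKEN_TAG, token)
        if instance_id is not None:
            state.resources[address] = instance_id  # adopt the would-be orphan
        del state.pending[address]  # resolved either way; a missing instance is simply re-planned
```

If `create_with_intent` fails after step 2, the intent is still in `pending`, so a later `recover` call can find the instance by its tag and record it instead of leaving it orphaned. A real implementation would also need to deal with providers where tags are not immediately visible after creation, and with resource types that cannot be tagged at all.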

References

#18741
