Current Terraform Version
Use-cases
As described in issues like #18741, we also frequently experience timeouts when writing the remote state back to S3 and releasing the DynamoDB locks. The retries described there would be a great improvement, but there are still plenty of ways Terraform can fail partway through a run and never write the state back.
So we've been wondering if instead there are ways that Terraform could work toward an approach that would give stronger consistency than just writing the state back after it has completed all operations.
Attempted Solutions
The retries described in #18741 would be a partial solution. We've considered scripting something similar into our deployment, but haven't implemented it yet.
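For illustration, here is a rough sketch of the kind of retry wrapper we had in mind for our deployment scripts. It is not Terraform code: `retry_with_backoff` and its parameters are names made up for this example, and `operation` stands in for whatever step uploads the state or releases the lock.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    `operation` is a hypothetical zero-argument callable, e.g. a function
    that uploads the state file. Only exceptions listed in `retry_on` are
    retried; anything else propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Exponential backoff capped at max_delay, with full jitter so
            # that concurrent runs do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

This only narrows the window for transient failures. If the process is killed, or the backend stays unreachable for longer than the retry budget, the state is still never written.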
Proposal
One approach we've thought about is whether Terraform could perform a "write-ahead" update of the operations it's going to apply with enough information to recover from a failure by checking the status of those updates.
Existing resources in the state can often be modified more or less idempotently by fetching the current properties of the resource and then applying the modifications again.
However, when creating a new resource, a failure to update the state can lead to orphaned resources. For example, if Terraform could update the remote state to say that it's going to create an EC2 instance tagged with a client-generated UUID before it starts the operation, then if something fails it can locate the instance based on that tag.
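To make the idea concrete, here is a minimal in-memory sketch of the write-ahead flow. Everything in it is hypothetical: `FakeCloud` stands in for a provider API, the `state` dict stands in for the remote state (which would have to be persisted after each step), and the `pending` section and `intent-token` tag are names invented for this example, not anything Terraform has today.

```python
import uuid


class FakeCloud:
    """In-memory stand-in for a provider API (hypothetical, not a real SDK)."""

    def __init__(self):
        self.instances = {}  # instance id -> tags

    def create_instance(self, tags):
        instance_id = f"i-{len(self.instances):04d}"
        self.instances[instance_id] = dict(tags)
        return instance_id

    def find_by_tag(self, key, value):
        return [iid for iid, tags in self.instances.items()
                if tags.get(key) == value]


def create_with_intent(cloud, state, name, crash_before_commit=False):
    # 1. Write-ahead: record the intent and a client-generated token in the
    #    state *before* calling the provider.
    token = str(uuid.uuid4())
    state["pending"][name] = {"token": token}

    # 2. Create the resource, tagged with the same token.
    instance_id = cloud.create_instance({"intent-token": token})

    if crash_before_commit:
        # Simulates a timeout or kill between the create and the state write.
        raise RuntimeError("simulated crash before the state was written back")

    # 3. Commit: promote the pending entry to a real resource.
    state["resources"][name] = instance_id
    del state["pending"][name]
    return instance_id


def recover(cloud, state):
    """Resolve pending intents left behind by an interrupted run."""
    for name, intent in list(state["pending"].items()):
        matches = cloud.find_by_tag("intent-token", intent["token"])
        if len(matches) > 1:
            raise RuntimeError(f"ambiguous intent for {name}: {matches}")
        if matches:
            # The create succeeded but was never recorded: adopt the resource.
            state["resources"][name] = matches[0]
        # With no match the create never happened, so the intent is dropped
        # and the next plan will simply propose the create again.
        del state["pending"][name]
```

A run that dies after step 2 leaves a `pending` entry behind. On the next run, `recover(cloud, state)` finds the instance by its tag and records it in the state instead of orphaning it. A real implementation would also have to deal with tag lookups that are not immediately consistent and with resource types that cannot carry tags.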
References
#18741