-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
terraform: make sure that the tfstate is re-produced in case we end up with empty tfstate #97
Conversation
…p with an empty tfstate due to operations interrupted before completion and update of the tfstate Signed-off-by: Muvaffak Onus <[email protected]>
Signed-off-by: Muvaffak Onus <[email protected]>
Signed-off-by: Muvaffak Onus <[email protected]>
@ulucinar This is ready for review. I'll write my results from the 24h experiment. |
I've kept it up for more than 2 days and had it log that there has been cases where the code I added detected that there is no attributes array in the tfstate file so it filled it. But the running binary didn't have code to dump the state to give more insight into what was happening at that time but it did just fix the problem and they never went into non-ready state. What I'll do now is to restart the experiment but have some code there to dump the workspace and CR to look for clues about why tfstate gets emptied and when exactly. > kg managed
NAME CHART VERSION SYNCED READY STATE REVISION DESCRIPTION AGE
release.helm.crossplane.io/muvaftesting-dnk2s-s94js kube-prometheus-stack 34.5.1 True True deployed 1 Install complete 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
route.ec2.aws.upbound.io/muvaftesting-dnk2s-g8nqk True True rtb-0bb91aab90f56b028_0.0.0.0/0 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
vpc.ec2.aws.upbound.io/muvaftesting-dnk2s-jskp4 True True vpc-02c920ee7ab279aa0 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
subnet.ec2.aws.upbound.io/muvaftesting-dnk2s-88ntc True True subnet-060aa81e09c1b582b 2d2h
subnet.ec2.aws.upbound.io/muvaftesting-dnk2s-bjwq7 True True subnet-04cd365c618c34aba 2d2h
subnet.ec2.aws.upbound.io/muvaftesting-dnk2s-vlqj5 True True subnet-06220d3c796b068f2 2d2h
subnet.ec2.aws.upbound.io/muvaftesting-dnk2s-wfjk4 True True subnet-06e263e9a12eda14a 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
securitygroup.ec2.aws.upbound.io/muvaftesting-dnk2s-z4ktf True True sg-0febf893a1c788895 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
routetableassociation.ec2.aws.upbound.io/muvaftesting-dnk2s-2zl8p True True rtbassoc-0aae79df71e57d105 2d2h
routetableassociation.ec2.aws.upbound.io/muvaftesting-dnk2s-jh6ct True True rtbassoc-0c505dd4320266b84 2d2h
routetableassociation.ec2.aws.upbound.io/muvaftesting-dnk2s-rtrmt True True rtbassoc-0e18b97a66b30e692 2d2h
routetableassociation.ec2.aws.upbound.io/muvaftesting-dnk2s-stzqb True True rtbassoc-06b48562933c95313 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
mainroutetableassociation.ec2.aws.upbound.io/muvaftesting-dnk2s-zcvfd True True rtbassoc-0485a981e75303e1a 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
internetgateway.ec2.aws.upbound.io/muvaftesting-dnk2s-csqgb True True igw-0c54e729bda4f2a02 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
routetable.ec2.aws.upbound.io/muvaftesting-dnk2s-5vm6w True True rtb-0bb91aab90f56b028 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
securitygrouprule.ec2.aws.upbound.io/muvaftesting-dnk2s-7dmb9 True True sgrule-1169461496 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
nodegroup.eks.aws.upbound.io/muvaftesting-dnk2s-5nl8j True True muvaftesting-dnk2s-5nl8j 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
cluster.eks.aws.upbound.io/muvaftesting-dnk2s-sbqnn True True muvaftesting-dnk2s-sbqnn 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
clusterauth.eks.aws.upbound.io/muvaftesting-dnk2s-ll7v5 True True muvaftesting-dnk2s-ll7v5 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
rolepolicyattachment.iam.aws.upbound.io/muvaftesting-dnk2s-lr7dj True True muvaftesting-dnk2s-rtscd-20220923160713500000000001 2d2h
rolepolicyattachment.iam.aws.upbound.io/muvaftesting-dnk2s-nc4sd True True muvaftesting-dnk2s-s62w7-20220923160729472500000001 2d2h
rolepolicyattachment.iam.aws.upbound.io/muvaftesting-dnk2s-px64p True True muvaftesting-dnk2s-rtscd-20220923160709738400000001 2d2h
rolepolicyattachment.iam.aws.upbound.io/muvaftesting-dnk2s-wkqj4 True True muvaftesting-dnk2s-rtscd-20220923160705931100000001 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
openidconnectprovider.iam.aws.upbound.io/muvaftesting-dnk2s-bgvdd True True arn:aws:iam::609897127049:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/8F74655DA1345523CCA9CDDB7429B424 2d1h
NAME READY SYNCED EXTERNAL-NAME AGE
role.iam.aws.upbound.io/muvaftesting-dnk2s-rtscd True True muvaftesting-dnk2s-rtscd 2d2h
role.iam.aws.upbound.io/muvaftesting-dnk2s-s62w7 True True muvaftesting-dnk2s-s62w7 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
subnetgroup.rds.aws.upbound.io/muvaftesting-h7hs6-7mvls True True muvaftesting-h7hs6-7mvls 2d2h
NAME READY SYNCED EXTERNAL-NAME AGE
instance.rds.aws.upbound.io/muvaftesting-h7hs6-crtb4 True True muvaftesting-h7hs6-crtb4 2d2h |
I couldn't get them to a non-ready state this time. I think we should go ahead with this change because there is no expected side-effects of writing id to state file if there is no ongoing operation. cc @ulucinar @turkenh for review. > kg managed
NAME READY SYNCED EXTERNAL-NAME AGE
routetable.ec2.aws.upbound.io/muvaf1-7bvc7-tls2m True True rtb-0162907fd947a3d21 13h
NAME READY SYNCED EXTERNAL-NAME AGE
mainroutetableassociation.ec2.aws.upbound.io/muvaf1-7bvc7-4pm96 True True rtbassoc-0875c4cb90347f1ac 13h
NAME READY SYNCED EXTERNAL-NAME AGE
vpc.ec2.aws.upbound.io/muvaf1-7bvc7-jfntt True True vpc-0addd55ae51b4cab3 13h
NAME READY SYNCED EXTERNAL-NAME AGE
routetableassociation.ec2.aws.upbound.io/muvaf1-7bvc7-6zt7p True True rtbassoc-084939d4512cc9f37 13h
routetableassociation.ec2.aws.upbound.io/muvaf1-7bvc7-k9xkp True True rtbassoc-088339ad9791c2458 13h
routetableassociation.ec2.aws.upbound.io/muvaf1-7bvc7-ld6z5 True True rtbassoc-05065b895a2f6e5c5 13h
routetableassociation.ec2.aws.upbound.io/muvaf1-7bvc7-wdfhh True True rtbassoc-0980e08ad14c3f228 13h
NAME READY SYNCED EXTERNAL-NAME AGE
securitygroup.ec2.aws.upbound.io/muvaf1-7bvc7-cq79h True True sg-0a4fefdebbeb2e271 13h
NAME READY SYNCED EXTERNAL-NAME AGE
subnet.ec2.aws.upbound.io/muvaf1-7bvc7-2x6zq True True subnet-014fbf9ec4020dc62 13h
subnet.ec2.aws.upbound.io/muvaf1-7bvc7-dqjck True True subnet-0c7b10199f4071df5 13h
subnet.ec2.aws.upbound.io/muvaf1-7bvc7-rn789 True True subnet-0dc3f2f3afc6b8e0f 13h
subnet.ec2.aws.upbound.io/muvaf1-7bvc7-srkc7 True True subnet-0a16abbe5fd793d5e 13h
NAME READY SYNCED EXTERNAL-NAME AGE
securitygrouprule.ec2.aws.upbound.io/muvaf1-7bvc7-7vdnl True True sgrule-474123451 13h
NAME READY SYNCED EXTERNAL-NAME AGE
route.ec2.aws.upbound.io/muvaf1-7bvc7-brp44 True True r-rtb-0162907fd947a3d211080289494 13h
NAME READY SYNCED EXTERNAL-NAME AGE
internetgateway.ec2.aws.upbound.io/muvaf1-7bvc7-fqkp7 True True igw-0424f17d285ed388f 13h
NAME READY SYNCED EXTERNAL-NAME AGE
clusterauth.eks.aws.upbound.io/muvaf1-7bvc7-qdzvt True True muvaf1-7bvc7-qdzvt 13h
NAME READY SYNCED EXTERNAL-NAME AGE
nodegroup.eks.aws.upbound.io/muvaf1-7bvc7-pgx4z True True muvaf1-7bvc7-pgx4z 13h
NAME READY SYNCED EXTERNAL-NAME AGE
cluster.eks.aws.upbound.io/muvaf1-7bvc7-rrggm True True muvaf1-7bvc7-rrggm 13h
NAME READY SYNCED EXTERNAL-NAME AGE
role.iam.aws.upbound.io/muvaf1-7bvc7-rksg2 True True muvaf1-7bvc7-rksg2 13h
role.iam.aws.upbound.io/muvaf1-7bvc7-xpq4h True True muvaf1-7bvc7-xpq4h 13h
NAME READY SYNCED EXTERNAL-NAME AGE
rolepolicyattachment.iam.aws.upbound.io/muvaf1-7bvc7-74zc8 True True muvaf1-7bvc7-xpq4h-20220925191031095500000001 13h
rolepolicyattachment.iam.aws.upbound.io/muvaf1-7bvc7-fhqt6 True True muvaf1-7bvc7-rksg2-20220925191103141700000001 13h
rolepolicyattachment.iam.aws.upbound.io/muvaf1-7bvc7-kqrdb True True muvaf1-7bvc7-xpq4h-20220925191034475300000001 13h
rolepolicyattachment.iam.aws.upbound.io/muvaf1-7bvc7-ttk8j True True muvaf1-7bvc7-xpq4h-20220925191059557400000001 13h
NAME READY SYNCED EXTERNAL-NAME AGE
subnetgroup.rds.aws.upbound.io/muvaf1-rkw7d-hv2xc True True muvaf1-rkw7d-hv2xc 13h
NAME READY SYNCED EXTERNAL-NAME AGE
instance.rds.aws.upbound.io/muvaf1-rkw7d-dvfmx True True muvaf1-rkw7d-dvfmx 13h |
I caught an instance where the state attributes got emptied and it happened on the
Unfortunately, I couldn't get a good idea of why the state attributes end up being empty. It's either the earlier |
Description of your changes
When a resource is not found,
terraform plan -refresh-only
call resets thetfstate
file to not have any element underresources
array like the following:Then the controller triggers
Create
call, which triggersterraform apply
. If at that point, theterraform apply
is shut down before completion, then the tfstate file remains empty without anyid
. This causes the next reconcile think that we don't need to write reproduce tfstate file because the file is there and refresh thinks that there is no resource, it must be new creation. If the earlierterraform apply
call did make it to have the creation completed in the cloud side (because most of whatterraform apply
does is to wait after making the create API call), then we try to create again and get the infamousalready exists
errors.This PR expands the tfstate file check in a way that
EnsureTFState
function guarantees that there is at leastid
after the preparation of the workspace folder.Fixes https://github.com/upbound/official-providers/issues/766
I have:
make reviewable
to ensure this PR is ready for review.backport release-x.y
labels to auto-backport this PR if necessary.How has this code been tested
First, without this change, I was able to reproduce the
already exists
error reliably by creating aInstance
ofrds
group which takes ~5mins to complete. During its creation, I killed the provider process, which makes it kill the Terraform process. When I start it back, I get thealready exists
error.Now, with the change, I bring the provider back up and because it can't find the
id
in thetfstate
, it reproduces the state again as if we import an existing resource. No error is thrown and we seestatus.atProvider.status: creating
. The difference would be that we'd normally not set theReady
condition until the async creation is completed but in this case, the only operation done is todescribe
the existing resource and that succeeds, hence you'd see that resource is marked as ready if that process kill happens during the creation. I'm not really worried about this since it happens sporadically and after a while and even if it did happen during creation, the resources referencing it would error out but become eventually stable.Next up, I'll use this change in a DB that's up for longer than 24 hours to see if https://github.com/upbound/official-providers/issues/766 is reproduced.