Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to save state: invalid character 'R' looking for beginning of value #24394

Closed
aneroid13 opened this issue Mar 17, 2020 · 5 comments
Closed
Labels
backend/consul bug v0.12 Issues (primarily bugs) reported against v0.12 releases

Comments

@aneroid13
Copy link

Terraform Version

Terraform v0.12.20

Terraform Configuration Files

### - Consul backend configuration - ###
  backend "consul" {
    address = "localhost:8500"
    scheme  = "http"
    path    = "dev/terraform/testzones/zone.tfstate"
    datacenter = "terraform"
  }
}

Debug Output

...
module.sso-ac.vsphere_virtual_machine.VMClone[0]: Creation complete after 16m17s [id=422ecd37-3dcb-b215-fbd7-7b7133a3136b]
 2020/03/10 17:29:49 [TRACE] module.sso-ac: eval: *terraform.EvalUpdateStateHook
 2020/03/10 17:29:49 [TRACE] [walkApply] Exiting eval tree: module.sso-ac.vsphere_virtual_machine.VMClone[0]
 2020/03/10 17:29:49 [TRACE] vertex "module.sso-ac.vsphere_virtual_machine.VMClone[0]": visit complete
 2020/03/10 17:29:49 [TRACE] dag/walk: visiting "provisioner.remote-exec (close)"
 2020/03/10 17:29:49 [TRACE] vertex "provisioner.remote-exec (close)": starting visit (*terraform.graphNodeCloseProvisioner)
 2020/03/10 17:29:49 [TRACE] vertex "provisioner.remote-exec (close)": evaluating
 2020/03/10 17:29:49 [TRACE] [walkApply] Entering eval tree: provisioner.remote-exec (close)
 2020/03/10 17:29:49 [TRACE] <root>: eval: *terraform.EvalCloseProvisioner
 2020/03/10 17:29:49 [TRACE] dag/walk: visiting "provisioner.local-exec (close)"
 2020/03/10 17:29:49 [TRACE] vertex "provisioner.local-exec (close)": starting visit (*terraform.graphNodeCloseProvisioner)
 2020/03/10 17:29:49 [TRACE] vertex "provisioner.local-exec (close)": evaluating
 2020/03/10 17:29:49 [TRACE] [walkApply] Entering eval tree: provisioner.local-exec (close)
 2020/03/10 17:29:49 [TRACE] <root>: eval: *terraform.EvalCloseProvisioner
 2020/03/10 17:29:49 [TRACE] dag/walk: visiting "meta.count-boundary (EachMode fixup)"
 2020/03/10 17:29:49 [TRACE] vertex "meta.count-boundary (EachMode fixup)": starting visit (*terraform.NodeCountBoundary)
 2020/03/10 17:29:49 [TRACE] vertex "meta.count-boundary (EachMode fixup)": evaluating
 2020/03/10 17:29:49 [TRACE] [walkApply] Entering eval tree: meta.count-boundary (EachMode fixup)
 2020/03/10 17:29:49 [TRACE] <root>: eval: *terraform.EvalCountFixZeroOneBoundaryGlobal
 2020/03/10 17:29:49 [TRACE] [walkApply] Exiting eval tree: meta.count-boundary (EachMode fixup)
 2020/03/10 17:29:49 [TRACE] vertex "meta.count-boundary (EachMode fixup)": visit complete
 2020/03/10 17:29:49 [TRACE] dag/walk: visiting "provider.vsphere (close)"
 2020/03/10 17:29:49 [TRACE] vertex "provider.vsphere (close)": starting visit (*terraform.graphNodeCloseProvider)
 2020/03/10 17:29:49 [TRACE] vertex "provider.vsphere (close)": evaluating
 2020/03/10 17:29:49 [TRACE] [walkApply] Entering eval tree: provider.vsphere (close)
 2020/03/10 17:29:49 [TRACE] <root>: eval: *terraform.EvalCloseProvider
 2020/03/10 17:29:49 [TRACE] GRPCProvider: Close
 2020-03-10T17:29:49.360+0300 [DEBUG] plugin: plugin process exited: path=/data/TF/plugins/terraform-provider-vsphere_v1.14.0_x4 pid=6825
 2020-03-10T17:29:49.360+0300 [DEBUG] plugin: plugin exited
 2020/03/10 17:29:49 [TRACE] [walkApply] Exiting eval tree: provider.vsphere (close)
 2020/03/10 17:29:49 [TRACE] vertex "provider.vsphere (close)": visit complete
 2020-03-10T17:29:49.361+0300 [DEBUG] plugin: plugin process exited: path=/usr/local/bin/terraform pid=6784
 2020-03-10T17:29:49.361+0300 [DEBUG] plugin: plugin exited
 2020/03/10 17:29:49 [TRACE] [walkApply] Exiting eval tree: provisioner.remote-exec (close)
 2020/03/10 17:29:49 [TRACE] vertex "provisioner.remote-exec (close)": visit complete
 2020-03-10T17:29:49.364+0300 [DEBUG] plugin: plugin process exited: path=/usr/local/bin/terraform pid=6807
 2020-03-10T17:29:49.364+0300 [DEBUG] plugin: plugin exited
 2020/03/10 17:29:49 [TRACE] [walkApply] Exiting eval tree: provisioner.local-exec (close)
 2020/03/10 17:29:49 [TRACE] vertex "provisioner.local-exec (close)": visit complete
 2020/03/10 17:29:49 [TRACE] dag/walk: visiting "root"
 2020/03/10 17:29:49 [TRACE] vertex "root": starting visit (terraform.graphNodeRoot)
 2020/03/10 17:29:49 [TRACE] vertex "root": visit complete
 2020/03/10 17:29:49 [TRACE] statemgr.Filesystem: reading initial snapshot from errored.tfstate
 2020/03/10 17:29:49 [TRACE] statemgr.Filesystem: snapshot file has nil snapshot, but that's okay
 2020/03/10 17:29:49 [TRACE] statemgr.Filesystem: read nil snapshot
 2020/03/10 17:29:49 [TRACE] statemgr.Filesystem: Importing snapshot with lineage "325edc88-28f9-7559-da48-92cebcfa70d7" serial 109 as the initial state snapshot at errored.tfstate
 2020/03/10 17:29:49 [TRACE] statemgr.Filesystem: preparing to manage state snapshots at errored.tfstate
 Failed to save state: invalid character 'R' looking for beginning of value
 2020/03/10 17:29:49 [TRACE] statemgr.Filesystem: no previously-stored snapshot exists
 2020/03/10 17:29:49 [TRACE] statemgr.Filesystem: state file backups are disabled
 2020/03/10 17:29:49 [TRACE] statemgr.Filesystem: forcing lineage "325edc88-28f9-7559-da48-92cebcfa70d7" serial 109 for migration/import
 2020/03/10 17:29:49 [TRACE] statemgr.Filesystem: writing snapshot at errored.tfstate
 Error: Failed to persist state to backend.
 The error shown above has prevented Terraform from writing the updated state
 to the configured backend. To allow for recovery, the state has been written
 to the file "errored.tfstate" in the current working directory.
 Running "terraform apply" again at this point will create a forked state,
 making it harder to recover.
 To retry writing this state, use the following command:
     terraform state push errored.tfstate
 ERROR: Job failed: exit status 1

Crash Output

Failed to save state: invalid character 'R' looking for beginning of value

Error: Failed to persist state to backend.
 The error shown above has prevented Terraform from writing the updated state
 to the configured backend. To allow for recovery, the state has been written
 to the file "errored.tfstate" in the current working directory.
 Running "terraform apply" again at this point will create a forked state,
 making it harder to recover.
 To retry writing this state, use the following command:
     terraform state push errored.tfstate
 ERROR: Job failed: exit status 1

Expected Behavior

Terraform should deploy all virtual machines and save configuration to Consul.

Actual Behavior

Terraform create all vms, but failed to save updated state to Consul.
It create file errored.tfstate with current config instead and drop the error.

Steps to Reproduce

terraform init -get-plugins=false -plugin-dir /data/TF/plugins
terraform workspace select Testzone
terraform plan -lock-timeout=15s -input=false -out=plan.tfplan
terraform apply -lock-timeout=15s -input=false -auto-approve plan.tfplan

Additional Context

Issue happened not on every vm creation. It is not happened in test environment, while I create 2 or 3 virtual machines.
This happened when I create more than 60 VMs in workspace.

I destroy all VMs in the workspace and clear terraform configuration in Consul, when terraform fail.
And start again, create a lot of VMs and terraform failed to save .tfstate, but it stuck on different VM, than first time.

Make "terraform state push errored.tfstate" returns the same error: "Failed to save state: invalid character 'R' looking for beginning of value"

Reason of this issue not clear. I can't find any suspicious 'R' character inside errored.tfstate or anywhere else.

References

@alisdair alisdair added backend/consul bug v0.12 Issues (primarily bugs) reported against v0.12 releases labels Mar 17, 2020
@jbardin
Copy link
Member

jbardin commented Mar 17, 2020

Hi @aneroid13,

Could you try and check the value of the consul key directly? Since terraform state is always expected to be json, I wonder if something else was somehow saved to that key, and terraform is attempting to decode it each time.

If that doesn't provide any clues, would it be possible to post the state file you have? Be sure to check it first for sensitive values if you do.

Thanks!

@jbardin jbardin added the waiting-response An issue/pull request is waiting for a response from the community label Mar 17, 2020
@aneroid13
Copy link
Author

aneroid13 commented Apr 22, 2020

Hello.

First of all, I create new workspace for clean experiment. And error repeat again.

I try investigate Consul, but can't find anything suspicious in logs. This strings Consul write when error happens:

2020/04/21 13:53:40 [DEBUG] memberlist: Initiating push/pull sync with: 10.104.122.4:8301
2020/04/21 13:53:44 [DEBUG] http: Request PUT /v1/txn?dc=terraform (426.229µs) from=[::1]:23882
2020/04/21 13:53:44 [DEBUG] http: Request DELETE /v1/kv/dev/terraform/testzones/zone.tfstate-env:Gamma/.lockinfo?dc=terraform (24.942671ms) from=[::1]:24706
2020/04/21 13:53:44 [DEBUG] http: Request PUT /v1/kv/dev/terraform/testzones/zone.tfstate-env:Gamma/.lock?dc=terraform&flags=3304740253564472344&release=038739fe-d1b4-20a1-c905-d77eeee58ba1 (22.657546ms) from=[::1]:24706
2020/04/21 13:53:44 [DEBUG] http: Request GET /v1/kv/dev/terraform/testzones/zone.tfstate-env:Gamma/.lock?dc=terraform (18.810056ms) from=[::1]:24706
2020/04/21 13:53:44 [DEBUG] http: Request GET /v1/kv/dev/terraform/testzones/zone.tfstate-env:Gamma/.lock?consistent=&dc=terraform&index=3892783 (1m47.991262755s) from=[::1]:23886
2020/04/21 13:53:44 [DEBUG] http: Request DELETE /v1/kv/dev/terraform/testzones/zone.tfstate-env:Gamma/.lock?cas=3892944&dc=terraform (22.079559ms) from=[::1]:24706
2020/04/21 13:53:44 [DEBUG] http: Request PUT /v1/session/destroy/038739fe-d1b4-20a1-c905-d77eeee58ba1?dc=terraform (25.485567ms) from=[::1]:24706`

But I find out one interesting clue: error triggers when I got more than 35 VMs inside workspace.
I can remove VMs with terraform, create again and back and forth but it only happen, when I create any 36th VM (or more).

Very strange.

Is there some limit on lenght of tfstate file in Consul or something like that ?

wc ./errored.tfstate 
 11330  20928 397220 ./errored.tfstate

@ghost ghost removed the waiting-response An issue/pull request is waiting for a response from the community label Apr 22, 2020
@al-lac
Copy link

al-lac commented May 14, 2020

Hi @aneroid13

We had the same problem once we got to a state which had about 10k lines. We solved it by adding the gzip option to our backend consul settings:

### - Consul backend configuration - ###
  backend "consul" {
    address = "localhost:8500"
    scheme  = "http"
    path    = "dev/terraform/testzones/zone.tfstate"
    datacenter = "terraform"
    gzip = true
  }
}

@aneroid13
Copy link
Author

Thank you @al-lac. I just thought about it.
And now I test it, yes it works.

But that error message just drive me mad, until I find the reason.
And this is a VERY old issue: hashicorp/consul#3358
"Values in the KV store cannot be larger than 512kb." by Hashicorp
https://www.consul.io/api/kv.html

Unfortunately, gzip is not solution, because I need a lot VM in my test zone, more over 300. :(

@ghost
Copy link

ghost commented Jun 27, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Jun 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backend/consul bug v0.12 Issues (primarily bugs) reported against v0.12 releases
Projects
None yet
Development

No branches or pull requests

4 participants