job modify index has changed since last refresh #398
Comments
Hi @shantanugadgil 👋 This error happens when the job is modified between the Terraform state refresh and the moment the provider plans the job submission. See terraform-provider-nomad/nomad/resource_job.go, lines 745 to 749 in 8f20175.
Unfortunately, I have not been able to reproduce it, since it should only happen in a race condition. Do you, by any chance, have multiple Terraform runs modifying the same job at the same time?
Another unfortunate aspect of this is that the Terraform SDK doesn't provide a way for us to get around this. State refresh is handled completely outside of the provider, and by the time the provider is invoked, we only receive the state data. Retrying inside the provider would have no effect, since the state data will always be the same.
I don't have a lot of experience with Atlantis, but is there a way to automatically retry failed runs? Or maybe reduce how many plans/applies are executed in parallel?
The only way forward that I see to fix this would require changes outside of the provider. Maybe we could make this check optional? But then a Terraform plan would be almost guaranteed to result in data loss, since the job was changed while the diff was computed.
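To illustrate why retrying inside the provider would not help, here is a minimal sketch of the kind of check described above. It is not the provider's actual Go code: `check_modify_index`, `plan_with_retries`, and `StaleStateError` are hypothetical names, and the integer indexes are made-up values. The assumption is only that the index recorded at refresh time is compared with the index Nomad reports at plan time.

```python
class StaleStateError(Exception):
    """Raised when the live job no longer matches the refreshed state."""


def check_modify_index(state_index: int, live_index: int) -> None:
    # Hypothetical stand-in for the provider's check: the index recorded at
    # refresh time must equal the index reported at plan time.
    if state_index != live_index:
        raise StaleStateError("job modify index has changed since last refresh")


def plan_with_retries(state_index: int, live_index: int, attempts: int = 3) -> int:
    """Repeat the check without a new refresh and return the number of failures."""
    failures = 0
    for _ in range(attempts):
        try:
            check_modify_index(state_index, live_index)
            break
        except StaleStateError:
            # The state data is handed over as-is, so state_index is the same
            # on every attempt and the comparison cannot start passing.
            failures += 1
    return failures
```

With matching indexes the check passes on the first attempt. With a stale index every attempt fails, which is why only a new refresh (a new run) can clear the error.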
There is nothing changing the job index outside of the Terraform way. I could elaborate on the contents of the state... it is an AWS ASG with some Nomad jobs launching on it. Job type is
Due to this, we use the Atlantis method only to check that we have no "compile errors" (basic syntax issues) and, since the bug report above, I have gone ahead and added the apply:
steps:
  - apply:
      extra_args: ["-refresh=true"]
As I am typing this, I realized another thing ... this job (and others like it) which randomly fail are in a different namespace than (hunch) could that be related somehow? |
I've recently merged this change, which forces a namespace on the The wrong namespace could be an issue if:
But I suspect you would have a lot more problems than the modify index being different 🤔 Next time it happens, could you collect the value of |
OK, will try to capture these the next time. |
Hey @lgfa29, I am one of the developers of the pulumi-nomad provider, which uses the TF nomad provider under the hood. Unlike Terraform, Pulumi does not run refresh by default, so this issue affects Pulumi users more than users of the TF provider. This should be very reproducible in TF with Is it possible to supply a flag here to disable the
Closing due to age. A lot of Nomad versions have been released since the original bug report, along with a couple of version updates to the provider.
@shantanugadgil this is still an issue. Can we please reopen it? Is it possible to also consider the suggestion in #398 (comment) of adding a flag to disable the
I am just doing some cleaning up of issues I have reported, but which have been open for a long time! 🙂 I didn't think someone else was tracking this as well! 👍 I'll re-open this issue. |
Should I open a new issue for this? What's the best way to bubble this problem up? |
This issue is a real pain for us on our development cluster. Mostly because we use |
Terraform Version
Nomad Version
3 node server cluster at version 1.6.3
Provider Configuration
Which values are you setting in the provider configuration?
Environment Variables
Do you have any Nomad specific environment variable set in the machine running Terraform?
Not Nomad specific, but we have
TF_IN_AUTOMATION = "1"
set as we run this under Atlantis.
Affected Resource(s)
nomad_job
We have a "common" job which runs using the nomad provider like so ...
Terraform Configuration Files
We have Atlantis setup for automation.
The problem is that the atlantis plan works fine, but fails during apply.
Debug Output
N/A
Panic Output
N/A
Expected Behavior
apply should have worked properly.
Actual Behavior
Steps to Reproduce
Please list the steps required to reproduce the issue, for example:
terraform apply
Important Factoids
this behavior is only seen when run under Atlantis.
If I merge the code and run it on the command line, this doesn't occur.
References
N/A
Q: Is there something we could do (like a force refresh before apply) to make this error go away?
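One possible workaround along the lines of the question above is to retry the whole run from outside Terraform, so that each attempt starts with a fresh refresh. The sketch below is an assumption, not an Atlantis or provider feature: `apply_with_retry` and `terraform_apply` are hypothetical helper names, and matching on the error text is a heuristic. It only retries when the output contains the modify index error and gives up immediately on any other failure.

```python
import subprocess
import time
from typing import Callable, Tuple

STALE_INDEX_MESSAGE = "job modify index has changed since last refresh"


def terraform_apply() -> Tuple[int, str]:
    # Hypothetical runner: a full apply (refresh, plan, apply) in the current
    # directory. Returns the exit code and the combined output.
    proc = subprocess.run(
        ["terraform", "apply", "-auto-approve", "-input=false"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr


def apply_with_retry(
    run_once: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Tuple[int, str, int]:
    """Re-run run_once while it fails with the stale modify index error.

    Returns (exit_code, output, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    code, output = 1, ""
    for attempt in range(1, max_attempts + 1):
        code, output = run_once()
        # Stop on success, or on any failure that is not the stale index error.
        if code == 0 or STALE_INDEX_MESSAGE not in output:
            return code, output, attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return code, output, max_attempts
```

Calling `apply_with_retry(terraform_apply, max_attempts=3, delay_seconds=5)` would run up to three full applies. Whether this is safe depends on what is modifying the job between runs, so it hides the race rather than fixing it.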