
Cannot delete error occurs when aws_batch_compute_environment used in aws_batch_job_queue is recreated #2044

Closed
mia-0032 opened this issue Oct 25, 2017 · 25 comments · Fixed by #2322

Comments

@mia-0032

Hi there,

I found that a Cannot delete, found existing JobQueue relationship error occurs when an aws_batch_compute_environment used in an aws_batch_job_queue is recreated.

Do you have any solutions regarding this?

Terraform Version

  • Terraform v0.10.7
  • Terraform-provider-aws v1.1.0

Affected Resource(s)

  • aws_batch_compute_environment
  • aws_batch_job_queue

Terraform Configuration Files

resource "aws_batch_compute_environment" "test" {
  compute_environment_name = "test_batch"
  type                     = "MANAGED"
  service_role             = "arn:aws:iam::xxxxxxxx:role/xxxxxxxx"

  compute_resources {
    type          = "EC2"
    instance_role = "arn:aws:iam::xxxxxxxx:instance-profile/xxxxxxxx"
    instance_type = ["c4.large"]
    max_vcpus     = 8
    desired_vcpus = 0
    min_vcpus     = 0

    security_group_ids = [
      "sg-xxxxxxxx"
    ]

    subnets = [
      "subnet-xxxxxxxx", "subnet-xxxxxxxx"
    ]
  }
}

resource "aws_batch_job_queue" "test" {
  name = "test-batch-job-queue"
  state = "ENABLED"
  priority = 3
  compute_environments = ["${aws_batch_compute_environment.test.arn}"]
}

Output

plan:

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

aws_batch_compute_environment.test: Refreshing state... (ID: test_batch)
aws_batch_job_queue.test: Refreshing state... (ID: arn:aws:batch:ap-northeast-1:xxxxxxxxxxxxx:job-queue/test-batch-job-queue)

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

-/+ aws_batch_compute_environment.test (new resource required)
      id:                                                "test_batch" => <computed> (forces new resource)
      arn:                                               "arn:aws:batch:ap-northeast-1:xxxxxxxxxxxxx:compute-environment/test_batch" => <computed>
      compute_environment_name:                          "test_batch" => "test_batch"
      compute_resources.#:                               "1" => "1"
      compute_resources.0.desired_vcpus:                 "0" => "0"
      compute_resources.0.instance_role:                 "arn:aws:iam::xxxxxxxxxxxxx:instance-profile/xxxxxxxxxxxxx" => "arn:aws:iam::xxxxxxxxxxxxx:instance-profile/xxxxxxxxxxxxx"
      compute_resources.0.instance_type.#:               "1" => "1"
      compute_resources.0.instance_type.3819562017:      "c4.large" => "c4.large"
      compute_resources.0.max_vcpus:                     "8" => "8"
      compute_resources.0.min_vcpus:                     "0" => "0"
      compute_resources.0.security_group_ids.#:          "1" => "2" (forces new resource)
      compute_resources.0.security_group_ids.1377324769: "sg-xxxxxxxx" => "sg-xxxxxxxx"
      compute_resources.0.security_group_ids.3516056991: "" => "sg-yyyyyyyy" (forces new resource)
      compute_resources.0.subnets.#:                     "2" => "2"
      compute_resources.0.subnets.796390534:             "subnet-xxxxxxxxx" => "subnet-xxxxxxxxx"
      compute_resources.0.subnets.877356347:             "subnet-xxxxxxxxx" => "subnet-xxxxxxxxx"
      compute_resources.0.type:                          "EC2" => "EC2"
      ecs_cluster_arn:                                   "arn:aws:ecs:ap-northeast-1:xxxxxxxxxxxxx:cluster/test_batch_Batch_e2eb0db4-1f83-3935-94af-e38f529d6480" => <computed>
      service_role:                                      "arn:aws:iam::xxxxxxxxxxxxx:role/xxxxxxxxx" => "arn:aws:iam::xxxxxxxxxxxxx:role/xxxxxxxxx"
      state:                                             "DISABLED" => "ENABLED"
      status:                                            "VALID" => <computed>
      status_reason:                                     "ComputeEnvironment Healthy" => <computed>
      type:                                              "MANAGED" => "MANAGED"

  ~ aws_batch_job_queue.test
      compute_environments.#:                            "1" => <computed>


Plan: 1 to add, 1 to change, 1 to destroy.

apply:

aws_batch_compute_environment.test: Refreshing state... (ID: test_batch)
aws_batch_job_queue.test: Refreshing state... (ID: arn:aws:batch:ap-northeast-1:xxxx:job-queue/test-batch-job-queue)
aws_batch_compute_environment.test: Destroying... (ID: test_batch)
Error applying plan:

1 error(s) occurred:

* aws_batch_compute_environment.test (destroy): 1 error(s) occurred:

* aws_batch_compute_environment.test: : Cannot delete, found existing JobQueue relationship
        status code: 400, request id: xxx

Panic Output

None

Expected Behavior

  1. delete aws_batch_job_queue
  2. delete and create aws_batch_compute_environment
  3. create aws_batch_job_queue

Actual Behavior

  1. recreate compute environment
  2. update job queue

Steps to Reproduce

  1. terraform apply
  2. Add a security group id to aws_batch_compute_environment.test (see the snippet after this list).
  3. terraform apply
  4. The error occurs.
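
Concretely, step 2 amounts to adding a second entry to security_group_ids inside compute_resources in the configuration above (IDs are placeholders, matching the plan output):

    security_group_ids = [
      "sg-xxxxxxxx",
      "sg-yyyyyyyy"
    ]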

Important Factoids

None

References

I could not find any issues related to this.

@andylockran
Contributor

I have this same issue; the problem appears to be the disable-then-delete step for both the job queue and the compute environment.

It currently takes about 60 seconds to disable and then delete a job queue. At the moment the job_queue is still in the DELETING state when the batch_compute_environment is sent its delete request. This leaves it in a state where it has been successfully disabled but has never received the delete command.

Workaround

Clicking 'delete' on the batch_compute_environment in the AWS console and then manually removing it from the Terraform state is the only workaround I have at the moment.

@andylockran
Contributor

@shibataka000 are you still actively working on this module? I'd like to help work out how to get this issue resolved if you are.
Thanks,

Andy

@shibataka000
Contributor

@andylockran It's the bug reported at #1710 (comment). I created #2322 to fix it.

@mia-0032 Your case is caused by another bug. I will create a PR after #2322 is merged.

@shibataka000
Contributor

Sorry, my description and branch name were not good. #2044 has not been fixed yet.
Issue #2044 covers two bugs. One is #2044 (comment), which is fixed by #2322. The other is #2044 (comment), which will be fixed by #2347.

@maulik887

Is this fix available in the latest AWS provider release, 1.5.0? I'm still facing this issue. I'm using Terraform v0.10.5 and AWS provider 1.5.0.

@shibataka000
Contributor

@maulik887 You can update a compute environment with a job queue attached by following #2347 (comment) :-)

@endemics

@shibataka000 wrote:

You can update a compute environment with a job queue attached by following #2347 (comment) :-)

IIUC this workaround only works once, as the random resource is created only once.

The name then becomes fixed, and further changes to compute_resources that require resource recreation (i.e. all the parameters marked "Replacement" in https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-batch-computeenvironment-computeresources.html) return the "Object already exists" error as per #3207.

You would then need to force an update of the random name (by marking it as tainted or by changing its value).

If my understanding is correct, then this issue should still be open.

However, at this stage it feels to me that the prefix solution from #3207 (which was also described by @radeksimko in #2347 (comment)) would fix this and other related issues in a clean way (especially given that behind the scenes it all ends up as launch configurations/ASGs anyway), so maybe it should be marked as a duplicate?
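
For readers following along, the random-name workaround under discussion looks roughly like this (a sketch based on the description above, not the exact configuration from #2347; resource names are illustrative):

resource "random_id" "ce_suffix" {
  # Generated once and then stored in state; this is why the workaround only
  # helps for the first replacement unless the random_id is tainted or changed.
  byte_length = 4
}

resource "aws_batch_compute_environment" "test" {
  compute_environment_name = "test_batch_${random_id.ce_suffix.hex}"

  # ... other arguments and compute_resources as in the original configuration ...

  lifecycle {
    # Create the replacement (with its new name) first, so the job queue can be
    # repointed at it before the old compute environment is destroyed.
    create_before_destroy = true
  }
}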

@stevenao

I can confirm the delete is still broken in AWS provider 1.5.0.

@ganeshk-ai

ganeshk-ai commented Jun 19, 2018

I get this issue when I try to update the compute environment.
The plan is successful and shows that it would recreate the compute environment and update the queue resource in place, but apply throws the above error.
My workaround is to temporarily delete the queue resource from the code and apply; this forces the queue to be deleted before the compute environment is recreated. After a successful apply, I add the queue resource back and apply again.
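
A roughly equivalent way to force the queue to be removed first without editing the configuration is a targeted destroy (a sketch, using the resource names from the original report):

$ terraform destroy -target=aws_batch_job_queue.test
$ terraform apply

The targeted destroy removes only the job queue (and anything that depends on it), so the compute environment no longer has a JobQueue relationship; the subsequent apply then recreates both resources.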

@danielcompton
Contributor

danielcompton commented Aug 6, 2018

This issue should be reopened; it is still broken in AWS provider 1.22.0.

@brainstorm

brainstorm commented Aug 13, 2018

Same here:

* aws_batch_compute_environment.batch: error deleting Batch Compute Environment (my_batch_ce): : Cannot delete, found existing JobQueue relationship
	status code: 400, request id: 255bf935-9eaf-11e8-a6a0-99999999

While running:

$ terraform --version
Terraform v0.11.7
+ provider.aws v1.24.0

@brainstorm

Also failing while running:

$ terraform --version
Terraform v0.11.7
+ provider.aws v1.31.0

@brainstorm

brainstorm commented Aug 13, 2018

Solved by tainting the job queue, though it took around 1.5 minutes for it to be destroyed and almost 20 minutes for the rest:

$ terraform taint aws_batch_job_queue.my_batch_queue && terraform apply
(...)
* module.compute_env.aws_batch_compute_environment.batch (destroy): 1 error(s) occurred:

* aws_batch_compute_environment.batch: error deleting Batch Compute Environment (my_batch): timeout while waiting for state to become 'DELETED' (last state: 'DELETING', timeout: 20m0s)

Although I guess it's an AWS Batch backend/API issue, not really a Terraform one?

@danielcompton
Contributor

danielcompton commented Aug 13, 2018

I would argue that it is a Terraform provider issue. Making entirely reasonable changes to Terraform config can leave your Terraform resources in an inconsistent state which requires manual intervention to fix. That seems like something that the provider should be dealing with.

@brainstorm

Yep, something seems to be off, I just got this other error message trying to destroy the CE today:

* aws_batch_compute_environment.batch: error disabling Batch Compute Environment (umccrise_compute_env_dev): unexpected state 'INVALID', wanted target 'VALID'. last error: %!s(<nil>)

@davidvasandani

@radeksimko Do you mind re-opening this issue?

@skeller88

A fix for me in the meantime (sketched below):

  1. Manually delete the job queue
  2. Use terraform state rm to remove the job queue from Terraform's state
  3. terraform apply
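
A minimal shell sketch of those steps, assuming the AWS CLI is available and using the queue and resource names from the original report:

# The queue must be DISABLED before AWS Batch will accept the delete.
$ aws batch update-job-queue --job-queue test-batch-job-queue --state DISABLED
$ aws batch delete-job-queue --job-queue test-batch-job-queue

# Drop the now-deleted queue from Terraform's state.
$ terraform state rm aws_batch_job_queue.test

# Recreate the compute environment and the job queue.
$ terraform apply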

@lightjacket

Just ran into this myself. For me, I initially added the lifecycle rule create_before_destroy so the queue relationships would get moved before the compute environment was destroyed. That works if I change the name for the compute environment. So it seems to me that adding something like compute_environment_name_prefix (similar to the name_prefix on a launch config) would be an easy way to resolve most of the issues here?

If someone could confirm I'd be happy to take a stab at a pull request for that.
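
For illustration, with such an argument the configuration could look something like this (a sketch; compute_environment_name_prefix is the proposed, not-yet-existing attribute mirroring name_prefix on aws_launch_configuration):

resource "aws_batch_compute_environment" "test" {
  # Hypothetical attribute from the proposal above: the provider would append
  # a unique suffix, so a replacement never collides with the old name.
  compute_environment_name_prefix = "test_batch_"

  # ... other arguments and compute_resources as in the original configuration ...

  lifecycle {
    create_before_destroy = true
  }
}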

@endemics

(...) So it seems to me that adding something like compute_environment_name_prefix (similar to the name_prefix on a launch config) would be an easy way to resolve most of the issues here?

If someone could confirm I'd be happy to take a stab at a pull request for that.

Yes please, it would indeed solve heaps of issues! However, could you create the PR against #3207 instead, as this issue is officially closed?

@Ludonope

I have this problem too; I'm not quite sure why this issue is closed, since it clearly is a problem and it's still there.

It should definitely delete the Job Queue before deleting the Compute Environment and then recreate them, or just modify the Compute Environment directly when possible.

Having to intervene manually is a real problem, and at the same time it proves that it can work that way.

Is there a PR for this issue right now?

@kordian-kowalski

kordian-kowalski commented Feb 28, 2019

Like @Ludonope, I just ran into this exact same issue.

Terraform v0.11.11
+ provider.aws v1.60.0

@monkut

monkut commented Mar 20, 2019

Had the same issue and followed this comment to recover:
#2044 (comment)

@gdippolito

Hi @radeksimko

Would it be possible to re-open this bug?

I have faced this problem using the latest version of terraform and AWS provider:

Terraform v0.11.13
+ provider.aws v2.4.0

The workaround in #2044 is still valid, but it is not ideal to perform this task every time the compute environment is changed.

@atomkirk

atomkirk commented Oct 31, 2019

The workaround I use is to just add create_before_destroy and then change the name by one character every time I need it to be replaced.

resource "aws_batch_compute_environment" "some_name" {
  compute_environment_name = "some-name"

  …

  lifecycle {
    create_before_destroy = true
  }
}

So if I change some attribute of the compute environment and run apply, it fails because that name already exists; if I change the name to some--name and run apply, it works. The next time, I change it back to some-name and run apply, and it works again, on and on until this is fixed. (This also recreates the queues and job definitions, which isn't a problem in my case 🤷‍♂ )

@ghost

ghost commented Nov 1, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Nov 1, 2019