[Redis] Internal Network API Bug #1347

tombuildsstuff · 2017-06-26T10:05:49Z

👋 howdy!

I've been trying to add support for Redis on the Internal Subnets to Terraform.

Creating a Redis instance on an Internal Subnet works great, however there's a bug in the API when deleting it:

DELETE /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/acctestRG-21/providers/Microsoft.Cache/Redis/acctestRedis21?api-version=2016-04-01 HTTP/1.1
Host: management.azure.com
User-Agent: HashiCorp-Terraform-v0.9.8
Accept-Encoding: gzip

which returns:

HTTP/1.1 200 OK
Cache-Control: no-cache
Date: Mon, 26 Jun 2017 09:48:09 GMT
Expires: -1
Pragma: no-cache
Server: Microsoft-HTTPAPI/2.0
Strict-Transport-Security: max-age=31536000; includeSubDomains
Content-Length: 0

Querying for the Redis Cache at this point then returns a 404:

GET /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/acctestRG-21/providers/Microsoft.Cache/Redis/acctestRedis21?api-version=2016-04-01 HTTP/1.1
Host: management.azure.com
User-Agent: HashiCorp-Terraform-v0.9.8
Accept-Encoding: gzip

[DEBUG] AzureRM Response for https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/acctestRG-21/providers/Microsoft.Cache/Redis/acctestRedis21?api-version=2016-04-01:
HTTP/1.1 404 Not Found
Content-Length: 152
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Mon, 26 Jun 2017 09:48:10 GMT
Expires: -1
Pragma: no-cache

{"error":{"code":"ResourceNotFound","message":"The Resource 'Microsoft.Cache/Redis/acctestRedis21' under resource group 'acctestRG-21' was not found."}}

Thus, it appears safe to delete the Subnet:

DELETE /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/acctestRG-21/providers/Microsoft.Network/virtualNetworks/acctestnw-21/subnets/testsubnet?api-version=2017-03-01 HTTP/1.1
Host: management.azure.com
User-Agent: HashiCorp-Terraform-v0.9.8
Accept-Encoding: gzip

However attempting to delete the Subnet at this point returns an error:

2017/06/26 10:48:10 [DEBUG] AzureRM Response for https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/acctestRG-21/providers/Microsoft.Network/virtualNetworks/acctestnw-21/subnets/testsubnet?api-version=2017-03-01:
HTTP/1.1 400 Bad Request
Content-Length: 285
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Mon, 26 Jun 2017 09:48:10 GMT
Expires: -1
Pragma: no-cache
Server: Microsoft-HTTPAPI/2.0

{
  "error": {
    "code": "InUseSubnetCannotBeDeleted",
    "message": "Subnet testsubnet is in use by /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/acctestRG-21/providers/Microsoft.Cache/Redis/acctestRedis21 and cannot be deleted.",
    "details": []
  }
}

From what I can see this is a bug in the Redis API which is exposing the Redis Cache as deleted when it's really still kicking around. Would it be possible to fix this bug in the API so that the Redis Cache is returned in a Deleting state until it's deleted like the other resources do?

Thanks!

The text was updated successfully, but these errors were encountered:

tombuildsstuff · 2017-06-26T15:14:18Z

Hey @sarangan12

Just an FYI that this looks similar to a previous bug with Virtual Network Gateway's, I'm wondering it's related? #1233

Thanks!

sarangan12 · 2017-07-07T22:35:01Z

This is a service side issue. I am assigning this issue to @TimLovellSmith and @hrishi18pathak to look in to the issue

tombuildsstuff · 2017-07-26T11:42:01Z

👋 hey @TimLovellSmith / @hrishi18pathak

Is there any update on this issue, or a rough timeframe for when this is likely to be fixed?

Thanks!

TimLovellSmith · 2017-07-27T19:32:25Z

@tombuildsstuff
Yes, there's a bug where there is a delay between the redis cache returning Deleted OK and the subnet actually being deletable. A fix may not happen for a while yet, since it is a behavioral change to return 202 instead of 200 for delete and there is a risk of confusing existing clients by making that change (if they are not already coded to handle it), we will likely only fix it in a future api version. A workaround is just to retry delete subnet until it actually succeeds, it should normally succeed within 10-15 minutes.

tombuildsstuff · 2017-09-25T14:22:36Z

@TimLovellSmith thanks for the update - is there a timeframe for when a new API version will be released? Based on your description, I'm assuming that would be duplicating the API version and updating the behaviour, rather than including additional functionality - and as such should be relatively quick?

Unfortunately the workaround specified above isn't ideal since it opens us up to other issues where the deletion failing is valid (such as permissions, Locking or the HyperV networking/locking errors) and we'll have to keep track of which error are retry-able and which aren't - and as such I'd prefer to avoid that if at all possible.

Thanks!

TimLovellSmith · 2017-09-25T18:17:13Z

@tombuildsstuff Sorry, still no definite timeframe on that change.. I'm not familiar with the hyper-v networking/locking errors you mention, what are those like? For permissions, it would fail with 401 or 403 status code, rather than 400. You can most definitely distinguish this particular scenario via the 'InUseSubnetCannotBeDeleted' status in the error response. Which as you say, is a lot better than retrying blindly - although I understand its undesirable to do the work of writing such a special case.

/cc @JonCole

TimLovellSmith · 2018-01-30T19:17:44Z

Suggest closing as 'probably won't fix' - note unfortunately if we change the behavior to be more transparent it is still going to take just as long to delete the subnet, and its not that hard to workaround, so it is just not a very high priority to fix it.

rem-aj · 2018-01-30T19:30:05Z

Hi @TimLovellSmith, so just to be clear we will be able to use terraform to create the azure redis service passing in it's own subnet and static IP, we just won't be able to delete the said subnet with terraform? I just want to confirm that at least this part is functional and if so is that in the azure provider version 1.0.1 or a future to be released version?

cc; @tombuildsstuff

Thanks!

TimLovellSmith · 2018-01-30T19:34:05Z

Hi @bostonmoto I am not familiar with terraform, but I imagine this as most likely manifesting as a bug that deleting a subnet that holds a redis cache would appear to fail, and then need to be retried after a suitable delay, at which time it would succeed.

rem-aj · 2018-01-30T19:42:45Z

Not a problem @TimLovellSmith I will wait for @tombuildsstuff to respond. Just trying to confirm that the creation part works with static IP and subnet. If delete doesn't work we'll just do that manually after as needed.

TimLovellSmith · 2018-02-08T23:57:59Z

@tombuildsstuff I came up with one other idea for allowing workarounds in code that could possibly avoid the need for a new api version:

Could we address this by returning a Azure-AsyncOperation header as part of the DELETE 200 response?

And then you could poll on the URL in that header to wait for the delete work to really be completed. We could probably roll this out faster than a new api version. But would it work for you?

/cc @bostonmoto

tombuildsstuff · 2018-02-09T00:13:13Z

@TimLovellSmith I have a feeling that would break the Azure SDK - @marstr / @mcardosos / @jhendrixMSFT / @joshgav should be able to comment further

@sarangan12 this functionality (internal subnets) currently isn't supported in Terraform since we're waiting for the API to be fixed - once it is we should be able to add support for this relatively quickly (there's a branch with support - but we can't ship it until this issue is resolved)

Thanks!

mcardosos · 2018-02-09T01:48:25Z

Defining the operation as long running operation would introduce breaking changes in the go SDK, yes.

jhendrixMSFT · 2018-02-09T02:24:00Z

But maybe it's the correct thing to do? This came up a few days ago talking with @johanste

mcardosos · 2018-02-09T02:43:51Z

I think it is the correct thing to do. But yes, breaking changes.

johanste · 2018-02-09T03:55:37Z

Since the delete operation is already marked as a long running operation in swagger (from what I can tell), there wouldn't be a breaking change in the Go SDK. I don't know if any other management libraries would be tricked into doing something functionally incorrect by a 200 response w. an AzureAsync-Operation header - but if the operation will take a significant time to complete, usages where the client was fine with fire & forget (doesn't need the resource to actually be deleted), we risk making a call that previously was instantaneous (relatively speaking :)) into a 15 minute call (based on @TimLovellSmith 's earlier comments in the thread). Which would be sub-optimal.

I believe that the long term solution would be to:

Make the redis cache instance stay around with provisioning state == deleting until the resource is actually gone (and all locks on other resources have been released). This will allow clients to figure out which resource is holding the lock on the subnet.
Ensure that the redis cache operation is following the DELETE flow for resource as specified in the resource provider contract specification by including an operation location - which based on my understanding is not quite the case as of today.

In addition, I suspect that 409 for the subnet delete call failure would be better understood by most clients. And possibly (not standard for a 409) include a retry-after header telling the client that it may have better luck in the not-too-distant future.

tombuildsstuff · 2018-03-16T22:03:23Z

@mcardosos could we leave this open until the fix is deployed and verified? At this point the new API is still broken since the fix isn't deployed, based on my reading of this comment: #2581 (comment)

api spec is done here, but server fixes are not yet rolled out

Thanks!

TimLovellSmith · 2018-03-16T22:05:04Z

@tombuildsstuff It is deployed. That comment was several days old.
/cc @mcardosos

TimLovellSmith · 2018-03-20T00:18:03Z

@mcardosos Can we close it again?

mcardosos · 2018-03-20T00:24:13Z

@tombuildsstuff feel free to reopen if not fixed yet

rem-aj · 2018-03-21T13:13:17Z

Hi @tombuildsstuff so we can now not only create azure redis with subnet via terraform but we can also delete it properly now. Is that confirmed? Thanks!

tombuildsstuff · 2018-03-21T13:37:42Z

@bostonmoto we still need to upgrade to the new version of the Azure SDK for Go, and then add rebase the branch with support for this functionality to be able to confirm that it, so it isn't at the moment, but should be soon. I'd suggest subscribing to this issue for updates: hashicorp/terraform-provider-azurerm#82 :)

tombuildsstuff mentioned this issue Jun 26, 2017

Support for setting subnetId and staticIP for Premium sku in Azure Redis Cache hashicorp/terraform-provider-azurerm#82

Closed

azuresdkci assigned sarangan12 Jun 26, 2017

tombuildsstuff mentioned this issue Jun 26, 2017

[Network] Eventual Consistency Bug in the Virtual Network Gateway API #1233

Closed

salameer added the Redis Cache label Jul 5, 2017

sarangan12 assigned TimLovellSmith and hrishi18pathak and unassigned sarangan12 Jul 7, 2017

TimLovellSmith unassigned hrishi18pathak Jan 30, 2018

mcardosos mentioned this issue Feb 9, 2018

Is it possible to write my own poll mechanism? Azure/azure-sdk-for-go#1072

Closed

tombuildsstuff mentioned this issue Mar 7, 2018

New api version for redis 2018-03-01 #2581

Merged

10 tasks

mcardosos closed this as completed in #2581 Mar 16, 2018

mcardosos reopened this Mar 16, 2018

mcardosos closed this as completed Mar 20, 2018

bsiegel added the Service Attention Workflow: This issue is responsible by Azure service team. label Sep 26, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Redis] Internal Network API Bug #1347

[Redis] Internal Network API Bug #1347

tombuildsstuff commented Jun 26, 2017 •

edited by azuresdkci

Loading

tombuildsstuff commented Jun 26, 2017

sarangan12 commented Jul 7, 2017

tombuildsstuff commented Jul 26, 2017

TimLovellSmith commented Jul 27, 2017

tombuildsstuff commented Sep 25, 2017

TimLovellSmith commented Sep 25, 2017

TimLovellSmith commented Jan 30, 2018

rem-aj commented Jan 30, 2018

TimLovellSmith commented Jan 30, 2018 •

edited

Loading

rem-aj commented Jan 30, 2018

TimLovellSmith commented Feb 8, 2018 •

edited

Loading

tombuildsstuff commented Feb 9, 2018

mcardosos commented Feb 9, 2018

jhendrixMSFT commented Feb 9, 2018

mcardosos commented Feb 9, 2018

johanste commented Feb 9, 2018

tombuildsstuff commented Mar 16, 2018 •

edited

Loading

TimLovellSmith commented Mar 16, 2018 •

edited

Loading

TimLovellSmith commented Mar 20, 2018

mcardosos commented Mar 20, 2018

rem-aj commented Mar 21, 2018

tombuildsstuff commented Mar 21, 2018

[Redis] Internal Network API Bug #1347

[Redis] Internal Network API Bug #1347

Comments

tombuildsstuff commented Jun 26, 2017 • edited by azuresdkci Loading

tombuildsstuff commented Jun 26, 2017

sarangan12 commented Jul 7, 2017

tombuildsstuff commented Jul 26, 2017

TimLovellSmith commented Jul 27, 2017

tombuildsstuff commented Sep 25, 2017

TimLovellSmith commented Sep 25, 2017

TimLovellSmith commented Jan 30, 2018

rem-aj commented Jan 30, 2018

TimLovellSmith commented Jan 30, 2018 • edited Loading

rem-aj commented Jan 30, 2018

TimLovellSmith commented Feb 8, 2018 • edited Loading

tombuildsstuff commented Feb 9, 2018

mcardosos commented Feb 9, 2018

jhendrixMSFT commented Feb 9, 2018

mcardosos commented Feb 9, 2018

johanste commented Feb 9, 2018

tombuildsstuff commented Mar 16, 2018 • edited Loading

TimLovellSmith commented Mar 16, 2018 • edited Loading

TimLovellSmith commented Mar 20, 2018

mcardosos commented Mar 20, 2018

rem-aj commented Mar 21, 2018

tombuildsstuff commented Mar 21, 2018

tombuildsstuff commented Jun 26, 2017 •

edited by azuresdkci

Loading

TimLovellSmith commented Jan 30, 2018 •

edited

Loading

TimLovellSmith commented Feb 8, 2018 •

edited

Loading

tombuildsstuff commented Mar 16, 2018 •

edited

Loading

TimLovellSmith commented Mar 16, 2018 •

edited

Loading