fixes #926 - expose retry information in exceptions #965

jantman · 2016-06-23T12:56:48Z

This is my attempt at fixing #926 by providing information on retries in ResponseMetadata, which is returned as part of the exceptions that are raised to the user.

in botocore.retryhandler.MaxAttemptsDecorator.call(), add a MaxAttemptsReached=True element to ResponseMetadata if the maximum number of attempts is reached
in botocore.endpoint.Endpoint._send_request(), add a NumAttempts element to ResponseMetadata, containing the attempt number

An example of this in action:

#!/usr/bin/env python

import boto3
from mock import patch, Mock
from botocore.vendored.requests.models import Response

ec2 = boto3.resource('ec2')
i = ec2.Instance('i-0')

def mock_get_response(self, request, operation_model, attempts):
    headers = {
        'transfer-encoding': 'chunked',
        'date': 'Thu, 23 Jun 2016 11:32:42 GMT',
        'connection': 'close',
        'server': 'AmazonEC2'
    }
    mock_resp = Mock(spec_set=Response)
    type(mock_resp).content = '<?xml version="1.0" encoding="UTF-8"?>\n<Response><Errors><Error><Type>Sender</Type><Code>Throttling</Code><Message>Rate exceeded</Message></Error></Errors><RequestID>44c0f570-e338-48dd-9953-6684fa586dcb</RequestID></Response>'
    type(mock_resp).headers = headers
    type(mock_resp).status_code = 400
    parsed = {
        'ResponseMetadata': {
            'HTTPStatusCode': 400,
            'RequestId': '44c0f570-e338-48dd-9953-6684fa586dcb',
            'HTTPHeaders': headers
        },
        'Error': {
            'Message': "Rate exceeded",
            'Code': 'Throttling'
        }
    }
    return (mock_resp, parsed), None


with patch('botocore.endpoint.Endpoint._get_response', side_effect=mock_get_response, autospec=True):
    try:
        x = i.hypervisor
    except Exception as ex:
        print(ex.response)

(botocore)jantman@phoenix:pts/14:~/GIT/botocore (issue926_retry_info %=)$ time python ~/tmp/test.py 
{'ResponseMetadata': {'NumAttempts': 5, 'HTTPStatusCode': 400, 'MaxAttemptsReached': True, 'RequestId': '44c0f570-e338-48dd-9953-6684fa586dcb', 'HTTPHeaders': {'transfer-encoding': 'chunked', 'date': 'Thu, 23 Jun 2016 11:32:42 GMT', 'connection': 'close', 'server': 'AmazonEC2'}}, 'Error': {'Message': 'Rate exceeded', 'Code': 'Throttling'}}

real    0m13.479s
user    0m0.340s
sys     0m0.043s
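
For illustration, a minimal sketch of how a caller might read these two fields from the response dict attached to the exception. The helper name `summarize_retry_info` is hypothetical, and the keys are read with defaults on the assumption that they exist only with this change applied:

```python
def summarize_retry_info(response):
    """Describe the retry fields found in an error response dict.

    `response` is the dict attached to the exception (ex.response in the
    example above). NumAttempts and MaxAttemptsReached are assumed to be
    present only when this change is applied, so both have fallbacks.
    """
    metadata = response.get('ResponseMetadata', {})
    attempts = metadata.get('NumAttempts')
    exhausted = metadata.get('MaxAttemptsReached', False)
    code = response.get('Error', {}).get('Code', 'Unknown')
    if attempts is None:
        return '%s: no retry information available' % code
    suffix = ' (max attempts reached)' if exhausted else ''
    return '%s after %d attempt(s)%s' % (code, attempts, suffix)


# Trimmed copy of the response printed above.
example = {
    'ResponseMetadata': {
        'NumAttempts': 5,
        'HTTPStatusCode': 400,
        'MaxAttemptsReached': True,
    },
    'Error': {'Message': 'Rate exceeded', 'Code': 'Throttling'},
}
print(summarize_retry_info(example))
```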

codecov-io · 2016-06-23T13:02:27Z

Current coverage is 97.32% (diff: 100%)

Merging #965 into develop will increase coverage by <.01%

@@            develop       #965   diff @@
==========================================
  Files            44         44          
  Lines          7140       7147     +7   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           6949       6956     +7   
  Misses          191        191          
  Partials          0          0

Powered by Codecov. Last update d87049c...b44d22b

JordonPhillips · 2016-06-23T21:08:09Z

botocore/endpoint.py

@@ -155,6 +155,11 @@ def _send_request(self, request_dict, operation_model):
                request_dict, operation_model)
            success_response, exception = self._get_response(
                request, operation_model, attempts)
+        # this *should* be None or a tuple
+        if isinstance(success_response, type(())) and len(success_response) > 1:


I think it would be better if you just check for success_response == None

jantman · 2016-06-24

Will do, I just wasn't sure how many assumptions I could make about success_response.
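
A rough sketch of what the simplified check could look like. It assumes success_response is either None or an (http_response, parsed) tuple, as in the mocked _get_response at the top of this thread. The function name `add_attempt_count` is hypothetical and is not the code in the PR. The sketch uses the identity test `is not None`, which is the idiomatic way to compare against None in Python:

```python
def add_attempt_count(success_response, attempts):
    """Record the attempt number in the parsed response, if there is one.

    Assumption: success_response is None (an exception was caught instead)
    or a 2-tuple of (http_response, parsed_dict).
    """
    if success_response is not None:
        _http_response, parsed = success_response
        # Create ResponseMetadata if the parser did not supply one.
        parsed.setdefault('ResponseMetadata', {})['NumAttempts'] = attempts
    return success_response


parsed = {'ResponseMetadata': {'HTTPStatusCode': 400}}
add_attempt_count((object(), parsed), 3)
print(parsed['ResponseMetadata'])
```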

JordonPhillips · 2016-06-23T21:19:48Z

I think I would rather have a list of reasons we retried instead of the number of attempts. Something like:

success_response['ResponseMetadata']['Retries'] = [
    {
        'Code': '400',
        'Message': 'Throttling'
    },
    {
        'Code': '500',
        'Message': 'Server Error'
    }
]

It gives the caller more information about why we retried. I'd like to see what others think.
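
As a sketch of how such a list might be built up, one entry per retried attempt. The helper `record_retry` is hypothetical, and mapping the HTTP status to 'Code' and the service error code to 'Message' is an assumption read off the example above:

```python
def record_retry(retries, parsed):
    """Append the reason for one retried attempt to the running list.

    `parsed` is a parsed error response with the same layout as the dict
    in the mock at the top of this thread.
    """
    metadata = parsed.get('ResponseMetadata', {})
    error = parsed.get('Error', {})
    retries.append({
        # Assumed mapping: HTTP status as a string, service error code as text.
        'Code': str(metadata.get('HTTPStatusCode', '')),
        'Message': error.get('Code', ''),
    })
    return retries


retries = []
record_retry(retries, {'ResponseMetadata': {'HTTPStatusCode': 400},
                       'Error': {'Code': 'Throttling'}})
record_retry(retries, {'ResponseMetadata': {'HTTPStatusCode': 500},
                       'Error': {'Code': 'Server Error'}})
print(retries)
```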

jantman · 2016-06-24T11:52:10Z

@JordonPhillips updated per your suggestions

JordonPhillips · 2016-06-27T16:33:29Z

Awesome! Two more minor bits: Could you please squash your commits? Also, you seem to have changed the file permissions on test_retryhandler.py from 644 to 755, would you mind changing that back?

jamesls · 2016-06-27T16:40:55Z

With this new change, is MaxAttemptsReached replaced with an implicit check of len(Retries) == MaxAttempts? I liked the explicitness of MaxAttemptsReached; it was called out in #926 as not being obvious, and it's something I've heard before from others. What do you think of adding that back?

I'd also like to see tests that show that we created the list/dict after retrying.

JordonPhillips · 2016-06-27T16:45:26Z

@jamesls Ah, I didn't catch that. Yeah, I think having MaxAttemptsReached as a boolean is valuable. I don't think customers need to know what the max attempts value is explicitly. It's more useful just to know if it was hit.

jantman · 2016-06-29T18:17:11Z

Ok. Sorry for leaving this for a few days, but I'll try and work up those changes before the end of the week.

jantman · 2016-06-30T10:03:42Z

@JordonPhillips perms fixed. I'll re-push and squash once you're both OK with my changes - I really don't like rewriting or discarding history until I know I'm done with it. (Side note - not sure what workflow you use, but you know you can squash merge through the PR UI now...)

@jamesls Ok, I'll add back the boolean. Sorry for my confusion. I'll do my best on the tests, but frankly, I'm horribly confused by how the testing is done in botocore, and it seems that endpoint has very little existing coverage. I'll try my best.

Also, FWIW, if you both really want it removed I'll do it, but I think that knowing the max attempts value can be useful for some people, and isn't a lot of overhead to keep in place. For people who are having issues like me, with throttling/rate limiting, it's very useful information to log.

jantman · 2016-06-30T11:01:39Z

@JordonPhillips @jamesls I've added a test for the retry errors list, though it's not exactly in the style of the other tests. Apologies: I spent the better part of the last hour trying to make sense of how the rest of the endpoint tests are written, but it's really outside my experience. If this isn't sufficient, I'll rework it, but I'll need some guidance. If this is good, I'll squash down all the commits.

JordonPhillips · 2016-06-30T20:32:22Z

@jantman I think that max retries is only really useful when in development. I'm not so sure that it's useful during normal execution. What is your use case?

jantman · 2016-07-01T19:15:52Z

@JordonPhillips My use case was based on #882 / #891 and the assumption that at some point the number of retries will be configurable by the client.

At my organization, we have a relatively small number of AWS accounts, and a LOT of applications deployed in them. During the day, it's not unusual to have dozens, perhaps hundreds, of people or applications making API requests simultaneously.

I feel like it's pretty likely, at least at my organization and other heavy AWS users, that (once #882 is fixed) I'd probably make it normal practice to log (albeit at debug-level) the number of failures and the maximum number of retries, for use when debugging issues like this.

If you think it's too internal/too deep in the code, I can remove it and update the tests.

jantman · 2016-07-13T12:17:04Z

@JordonPhillips @jamesls Do you feel that I should remove the max retries?

JordonPhillips · 2016-07-13T16:13:11Z

@jantman I'm in favor of removing it until it is configurable.

jantman · 2016-07-13T16:16:56Z

Will do. I'll update the PR later today.

jantman · 2016-07-14T11:50:33Z

@JordonPhillips max retries has been removed. If everyone's good with this, I'll squash it down to one commit and re-push.

jantman · 2016-07-19T10:39:21Z

@JordonPhillips @jamesls is this good to squash down?

jantman · 2016-08-03T18:02:08Z

@jamesls @JordonPhillips ping?

jantman · 2016-08-15T11:27:28Z

@jamesls @JordonPhillips ok, I've re-pushed this branch squashed down to one commit. If needed, the full history is still intact at https://github.com/jantman/botocore/tree/issue926_retry_info_unsquashed

jantman · 2016-08-30T17:51:29Z

@jamesls @JordonPhillips It's been a month and a half... can I please get some feedback on the status of this PR?

jamesls · 2016-08-30T22:35:25Z

Sorry for the delay, looking at this now.

jamesls · 2016-08-30T23:58:11Z

Spoke with @kyleknap and @JordonPhillips about this and there are a few things worth noting:

We've added a few things to ResponseMetadata since this PR was first sent, most notably response headers, and we're concerned about adding more things to ResponseMetadata, particularly something like a list of retries. When we eventually support user-configured retries, this number could potentially be large. There's also no way for a user to opt out of getting all this extra data.
The boolean in the response metadata is helpful if you're looking at response metadata. We were also considering updating the error message in the ClientError to specify that the max number of retry attempts was reached so that people just catching and printing ClientError would know that max retry attempts were reached.

For people that want more insight into what's being retried, I think it makes sense to update our events so that people can write custom event handlers that can track retry attempts however they see fit, but that would be a separate pull request.
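
To make the idea concrete, here is a minimal sketch of the kind of handler a user might write. It is plain Python and does not touch botocore. The class name `RetryTracker` and the `operation_name` keyword are hypothetical, and how such a handler would be registered with the event system is deliberately left out, since that depends on the separate pull request mentioned above:

```python
from collections import Counter


class RetryTracker(object):
    """Callable that counts how often it is invoked per operation.

    Assumption: the event system calls handlers with keyword arguments,
    and one of them (called operation_name here) identifies the operation.
    """

    def __init__(self):
        self.counts = Counter()

    def __call__(self, operation_name=None, **kwargs):
        # Only observe; unknown keyword arguments are accepted and ignored.
        self.counts[operation_name] += 1
        return None


# Simulate three retry notifications for the same operation.
tracker = RetryTracker()
for attempt in range(1, 4):
    tracker(operation_name='DescribeInstances', attempts=attempt)
print(dict(tracker.counts))
```

Keeping the tracking in user code means the caller decides how much retry data to retain, which addresses the opt-out concern raised above.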

So to summarize:

remove Retry list in response metadata
update ClientError message if max retries were hit to indicate we tried N times.

If those changes seem reasonable, I'll go ahead and pull in your changes, make the updates above, and resend the pull request.

Does that seem reasonable?
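
A sketch of what the updated error message could look like. The function `format_client_error`, its parameters, and the exact wording of the message are illustrative assumptions, not the final implementation:

```python
def format_client_error(error_code, message, operation_name,
                        max_attempts_reached=False, attempts=None):
    """Build an error string that mentions retries only when they ran out."""
    text = 'An error occurred (%s) when calling the %s operation' % (
        error_code, operation_name)
    if max_attempts_reached and attempts is not None:
        # Hypothetical wording for the retry note.
        text += ' (reached max retries: %d)' % attempts
    return '%s: %s' % (text, message)


print(format_client_error('Throttling', 'Rate exceeded',
                          'DescribeInstances',
                          max_attempts_reached=True, attempts=4))
```

With this shape, someone who only catches and prints the exception still sees that the retry limit was reached, without inspecting ResponseMetadata.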

jantman · 2016-09-01T00:13:45Z

@jamesls Apologies for the delay on my end, I was out of the office today.

While I think the information could be valuable, I agree that the list of retries in response metadata could potentially add a lot to that dict, for an edge case. If I take a moment to think about it, while my edge case would find it useful, I can understand a maintainer of boto not wanting to include that.

This all sounds reasonable to me. While my initial concern had been to aid in tuning max retries (i.e. via a ~/.aws/models/_retry.json or via a future parameter exposed for that) and collect data, this will suffice to surface the error to the user and make it clear that the request was retried.

Unless you get to it first, I'll make those changes and push back up sometime tomorrow morning or afternoon GMT-4.

jantman · 2016-09-01T00:15:11Z

Sorry, somehow I missed seeing the other PR.

@jamesls I'm in favor of closing this in favor of #1024. Thank you so much for your assistance with this!

jamesls · 2016-09-01T00:28:32Z

Good to hear. Thanks again for the pull request.

This was referenced Jun 23, 2016

Provide caller with information on raised exception after retry #926

Closed

Catch boto RequestLimitExceeded and Possibly Add Retries jantman/awslimitchecker#187

Closed

JordonPhillips reviewed Jun 23, 2016

JordonPhillips added the pr/needs-review and needs-discussion labels Jun 23, 2016

fixes boto#926 - expose retry information in exceptions

b44d22b

jantman force-pushed the issue926_retry_info branch from 935c3a2 to b44d22b August 15, 2016 11:26

jamesls mentioned this pull request Aug 31, 2016

Add retry information to ResponseMetadata #1024

Merged

jamesls closed this Sep 1, 2016
