fixes #926 - expose retry information in exceptions #965

Closed
wants to merge 1 commit into from

Conversation

jantman
Contributor

@jantman jantman commented Jun 23, 2016

This is my attempt at fixing #926 by providing information on retries in ResponseMetadata, which is returned as part of the exceptions that are raised to the user.

  • in botocore.retryhandler.MaxAttemptsDecorator.__call__(), add a MaxAttemptsReached=True element to ResponseMetadata if the maximum number of attempts is reached
  • in botocore.endpoint.Endpoint._send_request(), add a NumAttempts element to ResponseMetadata, containing the attempt number

An example of this in action:

#!/usr/bin/env python

import boto3
from mock import patch, Mock
from botocore.vendored.requests.models import Response

ec2 = boto3.resource('ec2')
i = ec2.Instance('i-0')

def mock_get_response(self, request, operation_model, attempts):
    headers = {
        'transfer-encoding': 'chunked',
        'date': 'Thu, 23 Jun 2016 11:32:42 GMT',
        'connection': 'close',
        'server': 'AmazonEC2'
    }
    mock_resp = Mock(spec_set=Response)
    type(mock_resp).content = '<?xml version="1.0" encoding="UTF-8"?>\n<Response><Errors><Error><Type>Sender</Type><Code>Throttling</Code><Message>Rate exceeded</Message></Error></Errors><RequestID>44c0f570-e338-48dd-9953-6684fa586dcb</RequestID></Response>'
    type(mock_resp).headers = headers
    type(mock_resp).status_code = 400
    parsed = {
        'ResponseMetadata': {
            'HTTPStatusCode': 400,
            'RequestId': '44c0f570-e338-48dd-9953-6684fa586dcb',
            'HTTPHeaders': headers
        },
        'Error': {
            'Message': "Rate exceeded",
            'Code': 'Throttling'
        }
    }
    return (mock_resp, parsed), None


with patch('botocore.endpoint.Endpoint._get_response', side_effect=mock_get_response, autospec=True):
    try:
        x = i.hypervisor
    except Exception as ex:
        print(ex.response)
(botocore)jantman@phoenix:pts/14:~/GIT/botocore (issue926_retry_info %=)$ time python ~/tmp/test.py 
{'ResponseMetadata': {'NumAttempts': 5, 'HTTPStatusCode': 400, 'MaxAttemptsReached': True, 'RequestId': '44c0f570-e338-48dd-9953-6684fa586dcb', 'HTTPHeaders': {'transfer-encoding': 'chunked', 'date': 'Thu, 23 Jun 2016 11:32:42 GMT', 'connection': 'close', 'server': 'AmazonEC2'}}, 'Error': {'Message': 'Rate exceeded', 'Code': 'Throttling'}}

real    0m13.479s
user    0m0.340s
sys     0m0.043s
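
A caller could then read the two new keys off a caught exception. The following is a minimal sketch, assuming the ResponseMetadata layout printed above; `describe_retry_state` is a hypothetical helper and not part of botocore.

```python
def describe_retry_state(error_response):
    """Summarize retry info from an error response dict (hypothetical helper)."""
    metadata = error_response.get('ResponseMetadata', {})
    # Both keys are the ones proposed in this PR; fall back safely if absent.
    attempts = metadata.get('NumAttempts', 1)
    exhausted = metadata.get('MaxAttemptsReached', False)
    code = error_response.get('Error', {}).get('Code', 'Unknown')
    if exhausted:
        return '%s: gave up after %d attempts' % (code, attempts)
    return '%s: failed on attempt %d' % (code, attempts)


# Response dict abridged from the output above (HTTPHeaders omitted).
response = {
    'ResponseMetadata': {
        'NumAttempts': 5,
        'HTTPStatusCode': 400,
        'MaxAttemptsReached': True,
        'RequestId': '44c0f570-e338-48dd-9953-6684fa586dcb',
    },
    'Error': {'Message': 'Rate exceeded', 'Code': 'Throttling'},
}
print(describe_retry_state(response))
```

In real use the dict would come from `ex.response` in the `except` block of the script above.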

@codecov-io

codecov-io commented Jun 23, 2016

Current coverage is 97.32% (diff: 100%)

Merging #965 into develop will increase coverage by <.01%

@@            develop       #965   diff @@
==========================================
  Files            44         44          
  Lines          7140       7147     +7   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           6949       6956     +7   
  Misses          191        191          
  Partials          0          0          

Powered by Codecov. Last update d87049c...b44d22b

@@ -155,6 +155,11 @@ def _send_request(self, request_dict, operation_model):
request_dict, operation_model)
success_response, exception = self._get_response(
request, operation_model, attempts)
# this *should* be None or a tuple
if isinstance(success_response, type(())) and len(success_response) > 1:
Contributor

I think it would be better if you just checked whether success_response is None

Contributor Author

Will do, I just wasn't sure how many assumptions I could make about success_response.

@JordonPhillips
Contributor

JordonPhillips commented Jun 23, 2016

I think I would rather have a list of reasons we retried instead of the number of attempts. Something like:

success_response['ResponseMetadata']['Retries'] = [
    {
        'Code': '400',
        'Message': 'Throttling'
    },
    {
        'Code': '500',
        'Message': 'Server Error'
    }
]

It gives the caller more information about why we retried. I'd like to see what others think.
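
For illustration, a caller could tally such a list by reason. This is only a sketch, assuming the `Retries` shape proposed above (nothing here is an existing botocore feature); `count_retry_reasons` is a hypothetical name.

```python
from collections import Counter


def count_retry_reasons(response):
    """Count (Code, Message) pairs in the proposed 'Retries' list."""
    retries = response.get('ResponseMetadata', {}).get('Retries', [])
    return Counter((entry['Code'], entry['Message']) for entry in retries)


# Made-up sample data in the proposed shape.
response = {
    'ResponseMetadata': {
        'Retries': [
            {'Code': '400', 'Message': 'Throttling'},
            {'Code': '500', 'Message': 'Server Error'},
            {'Code': '400', 'Message': 'Throttling'},
        ]
    }
}
for (code, message), count in sorted(count_retry_reasons(response).items()):
    print('%s %s: %d' % (code, message, count))
```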

@JordonPhillips JordonPhillips added pr/needs-review This PR needs a review from a member of the team. needs-discussion labels Jun 23, 2016
@jantman
Contributor Author

jantman commented Jun 24, 2016

@JordonPhillips updated per your suggestions

@JordonPhillips
Contributor

Awesome! Two more minor bits: Could you please squash your commits? Also, you seem to have changed the file permissions on test_retryhandler.py from 644 to 755. Would you mind changing that back?

@jamesls
Member

jamesls commented Jun 27, 2016

With this new change, is MaxAttemptsReached replaced with an implicit check of len(Retries) == MaxAttempts? I liked the explicitness of MaxAttemptsReached. It was called out in #926 as not being obvious, and it's something I've heard before from others. What do you think of adding that back?

I'd also like to see tests that show that we created the list/dict after retrying.
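
To make the difference concrete, here is a rough sketch of the two checks from a caller's point of view. It assumes the proposed `Retries` and `MaxAttemptsReached` keys; `ASSUMED_MAX_ATTEMPTS` and both function names are hypothetical.

```python
# Hypothetical value: with the implicit check the caller has to know the
# configured maximum, which the response itself does not expose.
ASSUMED_MAX_ATTEMPTS = 5


def gave_up_implicit(metadata, max_attempts=ASSUMED_MAX_ATTEMPTS):
    """Infer exhaustion by comparing the retry list length to a known maximum."""
    return len(metadata.get('Retries', [])) == max_attempts


def gave_up_explicit(metadata):
    """Read the boolean flag directly; no knowledge of the maximum is needed."""
    return bool(metadata.get('MaxAttemptsReached', False))


# Made-up metadata in the proposed shape.
metadata = {
    'Retries': [{'Code': '400', 'Message': 'Throttling'}] * 5,
    'MaxAttemptsReached': True,
}
print(gave_up_implicit(metadata), gave_up_explicit(metadata))
# The implicit check silently changes its answer if the assumed maximum is wrong.
print(gave_up_implicit(metadata, max_attempts=10))
```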

@JordonPhillips
Contributor

@jamesls Ah, I didn't catch that. Yeah, I think having MaxAttemptsReached as a boolean is valuable. I don't think customers need to know what the max attempts value is explicitly. It's more useful just to know if it was hit.

@jantman
Contributor Author

jantman commented Jun 29, 2016

Ok. Sorry for leaving this for a few days, but I'll try and work up those changes before the end of the week.

@jantman
Contributor Author

jantman commented Jun 30, 2016

@JordonPhillips perms fixed. I'll re-push and squash once you're both OK with my changes - I really don't like rewriting or discarding history until I know I'm done with it. (Side note - not sure what workflow you use, but you know you can squash merge through the PR UI now...)

@jamesls Ok, I'll add back the boolean. Sorry for my confusion. I'll do my best on the tests, but frankly, I'm horribly confused by how the testing is done in botocore, and it seems that endpoint has very little existing coverage. I'll try my best.

Also, FWIW, if you both really want it removed I'll do it, but I think that knowing the max attempts value can be useful for some people, and it isn't a lot of overhead to keep in place. For people who are having issues like mine, with throttling/rate limiting, it's very useful information to log.

@jantman
Contributor Author

jantman commented Jun 30, 2016

@JordonPhillips @jamesls I've added a test for the retry errors list, though it's not exactly in the style of the other tests. Apologies; I spent the better part of the last hour trying to make sense of how the rest of the endpoint tests are written, but it's really outside my experience. If this isn't sufficient, I'll rework it, but I'll need some guidance. If this is good, I'll squash down all the commits.

@JordonPhillips
Contributor

@jantman I think that max retries is only really useful during development. I'm not so sure that it's useful during normal execution. What is your use case?

@jantman
Contributor Author

jantman commented Jul 1, 2016

@JordonPhillips My use case was based on #882 / #891 and the assumption that at some point the number of retries will be configurable by the client.

At my organization, we have a relatively small number of AWS accounts, and a LOT of applications deployed in them. During the day, it's not unusual to have dozens, perhaps hundreds, of people or applications making API requests simultaneously.

I think it's pretty likely that, once #882 is fixed, I'd make it normal practice at my organization (as other heavy AWS users probably would) to log the number of failures and the maximum number of retries, albeit at debug level, for use when debugging issues like this.

If you think it's too internal/too deep in the code, I can remove it and update the tests.

@jantman
Contributor Author

jantman commented Jul 13, 2016

@JordonPhillips @jamesls Do you feel that I should remove the max retries?

@JordonPhillips
Contributor

@jantman I'm in favor of removing it until it is configurable.

@jantman
Contributor Author

jantman commented Jul 13, 2016

Will do. I'll update the PR later today.

@jantman
Contributor Author

jantman commented Jul 14, 2016

@JordonPhillips max retries has been removed. If everyone's good with this, I'll squash it down to one commit and re-push.

@jantman
Contributor Author

jantman commented Jul 19, 2016

@JordonPhillips @jamesls is this good to squash down?

@jantman
Contributor Author

jantman commented Aug 3, 2016

@jamesls @JordonPhillips ping?

@jantman
Contributor Author

jantman commented Aug 15, 2016

@jamesls @JordonPhillips ok, I've re-pushed this branch squashed down to one commit. If needed, the full history is still intact at https://github.com/jantman/botocore/tree/issue926_retry_info_unsquashed

@jantman
Contributor Author

jantman commented Aug 30, 2016

@jamesls @JordonPhillips It's been a month and a half... can I please get some feedback on the status of this PR?

@jamesls
Member

jamesls commented Aug 30, 2016

Sorry for the delay, looking at this now.

@jamesls
Member

jamesls commented Aug 30, 2016

Spoke with @kyleknap and @JordonPhillips about this and there are a few things worth noting:

  • We've added a few things to ResponseMetadata since this PR was first sent, most notably response headers, and we're concerned about adding more things to ResponseMetadata, particularly something like a list of retries. When we eventually support user-configured retries, this list could potentially be large. There's also no way for a user to opt out of getting all this extra data.
  • The boolean in the response metadata is helpful if you're looking at response metadata. We were also considering updating the error message in the ClientError to specify that the max number of retry attempts was reached so that people just catching and printing ClientError would know that max retry attempts were reached.

For people that want more insight into what's being retried, I think it makes sense to update our events so that people can write custom event handlers that can track retry attempts however they see fit, but that would be a separate pull request.
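
As a rough sketch of that idea: a custom handler could simply record what it is shown each time a retry decision is made. Everything below is an assumption for illustration. `RetryTracker` is a hypothetical class, and the keyword arguments (`attempts`, `response`, `caught_exception`) are assumed to be what a retry-related event would pass; the handler is exercised by hand here rather than through botocore.

```python
class RetryTracker(object):
    """Hypothetical event handler that records each retry decision it sees."""

    def __init__(self):
        self.seen = []

    def __call__(self, attempts=None, response=None,
                 caught_exception=None, **kwargs):
        # Assumption: response is None or an (http_response, parsed_dict) tuple.
        error_code = None
        if response is not None:
            error_code = response[1].get('Error', {}).get('Code')
        self.seen.append({
            'attempt': attempts,
            'error_code': error_code,
            'exception': caught_exception,
        })
        # Returning None leaves the actual retry decision to other handlers.
        return None


tracker = RetryTracker()
# With a real client this might be registered along these lines (untested
# assumption): client.meta.events.register('needs-retry.ec2', tracker)

# Simulate two retry decisions by calling the handler directly.
parsed = {'Error': {'Code': 'Throttling', 'Message': 'Rate exceeded'}}
tracker(attempts=1, response=(None, parsed), caught_exception=None)
tracker(attempts=2, response=None, caught_exception=IOError('connection reset'))
print(len(tracker.seen))
```

Keeping the bookkeeping in a handler means the data is opt-in and stays out of ResponseMetadata.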

So to summarize:

  • remove Retry list in response metadata
  • update the ClientError message if max retries were hit to indicate that we tried N times.

If those changes seem reasonable, I'll go ahead and pull in your changes, make the updates above, and resend the pull request.

Does that seem reasonable?
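
A sketch of what the second bullet could look like, purely as an illustration: `format_error_message` is a hypothetical function, the wording of the retry note is made up, and the `MaxAttemptsReached` / `NumAttempts` keys are the ones from the original version of this PR rather than a settled interface.

```python
def format_error_message(error_response, operation_name):
    """Build an error message that mentions exhausted retries (hypothetical)."""
    error = error_response.get('Error', {})
    metadata = error_response.get('ResponseMetadata', {})
    retry_info = ''
    # Only mention retries when the maximum number of attempts was reached.
    if metadata.get('MaxAttemptsReached', False):
        retry_info = ' (gave up after %d attempts)' % metadata.get('NumAttempts', 0)
    return 'An error occurred (%s) when calling the %s operation%s: %s' % (
        error.get('Code', 'Unknown'),
        operation_name,
        retry_info,
        error.get('Message', 'Unknown'),
    )


# Sample data based on the throttling response earlier in this thread.
response = {
    'ResponseMetadata': {'NumAttempts': 5, 'MaxAttemptsReached': True},
    'Error': {'Message': 'Rate exceeded', 'Code': 'Throttling'},
}
print(format_error_message(response, 'DescribeInstances'))
```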

@jantman
Contributor Author

jantman commented Sep 1, 2016

@jamesls Apologies for the delay on my end, I was out of the office today.

While I think the information could be valuable, I agree that the list of retries in response metadata could potentially add a lot to that dict for an edge case. Thinking about it a bit more, while I would find it useful for my edge case, I can understand why the maintainers of boto would not want to include it.

This all sounds reasonable to me. While my initial concern had been to aid in tuning max retries (i.e. via a ~/.aws/models/_retry.json or via a future parameter exposed for that) and collect data, this will suffice to surface the error to the user and make it clear that the request was retried.

Unless you get to it first, I'll make those changes and push back up sometime tomorrow morning or afternoon GMT-4.

@jantman
Contributor Author

jantman commented Sep 1, 2016

Sorry, somehow I missed seeing the other PR.

@jamesls I'm happy to close this in favor of #1024. Thank you so much for your assistance with this!

@jamesls
Member

jamesls commented Sep 1, 2016

Good to hear. Thanks again for the pull request.

@jamesls jamesls closed this Sep 1, 2016
Labels
needs-discussion pr/needs-review This PR needs a review from a member of the team.
4 participants