Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate options for default per-operation timeout #7484

Closed
bterlson opened this issue Feb 21, 2020 · 3 comments
Closed

Investigate options for default per-operation timeout #7484

bterlson opened this issue Feb 21, 2020 · 3 comments
Assignees
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library.

Comments

@bterlson
Copy link
Member

bterlson commented Feb 21, 2020

Some users have observed that when our client libraries send a request but a response never comes back, the behavior is less than desirable. Namely, you get a ECONNRESET sort of error after many minutes, and no retries occur.

We should investigate potential solutions to this problem, including:

  • Adding a default, retryable, per-operation default timeout. Strawman: 20 seconds. This default could be overridden in the convenience layer on a per-operation basis.
  • Exposing timeout in operation options to allow users to customize the per-operation timeout.

It was observed that there is a semantic difference between the following, which I don't believe we're making in Storage today:

  1. client.get( { timeout: 1000 });
  2. client.get( { abortSignal: AbortController.timeout(1000) });

For one, we want to retry. For two, we don't.

/cc @chradek @mikeharder

@xirzec xirzec added Azure.Core Client This issue points to a problem in the data-plane of the library. labels Feb 21, 2020
@xirzec xirzec self-assigned this Feb 21, 2020
@xirzec xirzec added this to the Backlog milestone Feb 21, 2020
@xirzec
Copy link
Member

xirzec commented Jul 15, 2021

There seem to be a few different concerns in this issue.

First: should we have a default timeout and I think the answer here is no, we should let the platform decide when to timeout. Otherwise, we'll have to be very careful that long/slow operations have their timeout bumped up, and since swagger doesn't give us that information, it would be manual and error-prone.

Second: exposing these controls to the consumer: OperationRequestOptions has a timeout property, which we already honor today and OperationOptions has an abortSignal. So it's all good, any client operation can be configured by the consumer.

Third: Retry handling for Abort vs Timeout. This is a little more complicated.

For the browser, this raises a "TimeoutError" DOMException, which we don't seem to handle at all, even in systemErrorRetryPolicy. This work was tracked by #9568 which has been closed in favor of moving to fetch in the browser.

In the Node case, we actually use an AbortSignal to do the cancelation of the request (e.g. it fails with an AbortError instead of a timeout). We could change this and leverage the socket timeout options instead, using the request's "timeout" event if we wanted to fail the request with a better error or be able to distinguish it inside exponentialRetryPolicy.

In any case, it doesn't appear that we retry when a request hits the user-configurable timeout. The only time we retry is when we hit one of the following Node networking errors:

err.code === "ETIMEDOUT" ||
err.code === "ESOCKETTIMEDOUT" ||
err.code === "ECONNREFUSED" ||
err.code === "ECONNRESET" ||
err.code === "ENOENT"

Which appear to be system-dependant.

@bterlson / @ramya-rao-a - it seems like the only work here could be throwing a better timeout error in the Node case and making the retry policy be able to retry when we hit a user timeout. Is this worth doing this semester?

@bterlson
Copy link
Member Author

Agree with the analysis - timeout property should be retried, the fact that it isn't seems like a bug. Absent any signals from github, I can't comment on the importance of this bug relative to other work planned this semester. @ramya-rao-a ?

Copy link

Hi @bterlson, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 22, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 22, 2024
@xirzec xirzec removed this from the Backlog milestone May 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library.
Projects
None yet
Development

No branches or pull requests

2 participants