Investigate options for default per-operation timeout #7484

bterlson · 2020-02-21T00:06:53Z

Some users have observed that when our client libraries send a request but a response never comes back, the behavior is less than desirable. Namely, you get a ECONNRESET sort of error after many minutes, and no retries occur.

We should investigate potential solutions to this problem, including:

Adding a default, retryable, per-operation default timeout. Strawman: 20 seconds. This default could be overridden in the convenience layer on a per-operation basis.
Exposing timeout in operation options to allow users to customize the per-operation timeout.

It was observed that there is a semantic difference between the following, which I don't believe we're making in Storage today:

client.get( { timeout: 1000 });
client.get( { abortSignal: AbortController.timeout(1000) });

For one, we want to retry. For two, we don't.

/cc @chradek @mikeharder

The text was updated successfully, but these errors were encountered:

xirzec · 2021-07-15T21:03:06Z

There seem to be a few different concerns in this issue.

First: should we have a default timeout and I think the answer here is no, we should let the platform decide when to timeout. Otherwise, we'll have to be very careful that long/slow operations have their timeout bumped up, and since swagger doesn't give us that information, it would be manual and error-prone.

Second: exposing these controls to the consumer: OperationRequestOptions has a timeout property, which we already honor today and OperationOptions has an abortSignal. So it's all good, any client operation can be configured by the consumer.

Third: Retry handling for Abort vs Timeout. This is a little more complicated.

For the browser, this raises a "TimeoutError" DOMException, which we don't seem to handle at all, even in systemErrorRetryPolicy. This work was tracked by #9568 which has been closed in favor of moving to fetch in the browser.

In the Node case, we actually use an AbortSignal to do the cancelation of the request (e.g. it fails with an AbortError instead of a timeout). We could change this and leverage the socket timeout options instead, using the request's "timeout" event if we wanted to fail the request with a better error or be able to distinguish it inside exponentialRetryPolicy.

In any case, it doesn't appear that we retry when a request hits the user-configurable timeout. The only time we retry is when we hit one of the following Node networking errors:

err.code === "ETIMEDOUT" ||
err.code === "ESOCKETTIMEDOUT" ||
err.code === "ECONNREFUSED" ||
err.code === "ECONNRESET" ||
err.code === "ENOENT"

Which appear to be system-dependant.

@bterlson / @ramya-rao-a - it seems like the only work here could be throwing a better timeout error in the Node case and making the retry policy be able to retry when we hit a user timeout. Is this worth doing this semester?

bterlson · 2021-07-19T21:59:02Z

Agree with the analysis - timeout property should be retried, the fact that it isn't seems like a bug. Absent any signals from github, I can't comment on the importance of this bug relative to other work planned this semester. @ramya-rao-a ?

github-actions · 2024-03-22T18:07:03Z

Hi @bterlson, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.

triage-new-issues bot added the triage label Feb 21, 2020

xirzec added Azure.Core Client This issue points to a problem in the data-plane of the library. labels Feb 21, 2020

triage-new-issues bot removed the triage label Feb 21, 2020

xirzec self-assigned this Feb 21, 2020

xirzec added this to the Backlog milestone Feb 21, 2020

xirzec mentioned this issue Apr 24, 2020

[Epic][core-http] Planned improvements / redesigns #8535

Closed

35 tasks

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 22, 2024

github-actions bot locked and limited conversation to collaborators Mar 22, 2024

xirzec removed this from the Backlog milestone May 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate options for default per-operation timeout #7484

Investigate options for default per-operation timeout #7484

bterlson commented Feb 21, 2020 •

edited

Loading

xirzec commented Jul 15, 2021

bterlson commented Jul 19, 2021

github-actions bot commented Mar 22, 2024

Investigate options for default per-operation timeout #7484

Investigate options for default per-operation timeout #7484

Comments

bterlson commented Feb 21, 2020 • edited Loading

xirzec commented Jul 15, 2021

bterlson commented Jul 19, 2021

github-actions bot commented Mar 22, 2024

bterlson commented Feb 21, 2020 •

edited

Loading