net, os: Set*Deadline() expiration error should be unique, as .Timeout() is true for keepalive, etc #31449

Closed
networkimprov opened this issue Apr 12, 2019 · 59 comments
Labels: FrozenDueToAge, NeedsDecision, release-blocker

@networkimprov

networkimprov commented Apr 12, 2019

This is a bug. A keepalive error is a connection failure, not a deadline event.

The net docs say:

A zero value for [deadline] means I/O operations will not time out.

A zero value for [read deadline] means Read will not time out.

KeepAlive specifies the keep-alive period for an active network connection.
If zero, keep-alives are enabled if supported by the protocol and operating system.
Network protocols or operating systems that do not support keep-alives ignore this field.
If negative, keep-alives are disabled.

For a TLS connection that's been severed, Conn.Read() returns a net.Error with .Timeout()==true due to KeepAlive failure. (Go 1.12, Linux, amd64)

The Error should give .Timeout()==false to comply with the docs. Code that checks for .Timeout()==true would generally assume that an explicit deadline had passed.

The .Error() string should mention keepalive. It's currently:
"read tcp n.n.n.n:p->n.n.n.n:p: read: connection timed out"
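To make the failure mode concrete, here is a minimal sketch. It builds an error by hand with the same shape as the one above instead of severing a real connection. `keepAliveLikeError` and `handleReadError` are hypothetical names, and the sketch assumes a platform where `syscall.ETIMEDOUT` is defined (Linux in this report) and Go 1.13 or later for `errors.As`.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// keepAliveLikeError builds an error shaped like the one in the report:
// a *net.OpError wrapping ETIMEDOUT from the read syscall.
// It is constructed by hand for illustration; no connection is involved.
func keepAliveLikeError() error {
	return &net.OpError{
		Op:  "read",
		Net: "tcp",
		Err: os.NewSyscallError("read", syscall.ETIMEDOUT),
	}
}

// handleReadError is a hypothetical handler that assumes Timeout()==true
// can only mean that an explicit deadline has passed.
func handleReadError(err error) string {
	var ne net.Error
	if errors.As(err, &ne) && ne.Timeout() {
		return "deadline expired, retrying"
	}
	return "connection failed"
}

func main() {
	err := keepAliveLikeError()
	fmt.Println(err)
	// The handler takes the deadline branch even though no deadline was set.
	fmt.Println(handleReadError(err))
}
```

The handler takes the retry branch for an error that, on a real connection, would mean the peer is gone.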

Related: net.Dialer.KeepAlive gets 15*time.Second if unset/zero. This isn't documented in package net.

cc @bradfitz

@networkimprov
Author

Also filed #31490 to report that tls.DialWithDialer() doesn't respect .KeepAlive.

@CAFxX
Contributor

CAFxX commented Apr 17, 2019

"Since 1.11, net.Dialer.KeepAlive gets 15*time.Second if unset/zero. This isn't documented in package net."

It is documented that we enable keep-alives if that field is unset/zero. The choice of not specifying 15s in the docs was deliberate and is actually documented in the commit message: 5bd7e9c

@julieqiu added the NeedsInvestigation label Apr 22, 2019
@julieqiu added this to the Go1.13 milestone Apr 22, 2019
@networkimprov
Author

I believe this is a new bug in 1.12, so a fix should be backported to a 1.12.x release.
Also reported in #32735

cc @ianlancetaylor
@gopherbot add release-blocker

@ianlancetaylor
Contributor

Do you know what makes this new in 1.12? I'm not seeing it.

@networkimprov
Author

Keepalive was made the default in 1.12 by #23459. That causes deadline handlers in existing code to see non-deadline (i.e. keepalive) errors.

@ianlancetaylor
Contributor

If I understand you correctly, you are saying that the bug has existed for a long time for programs that enable TCP keepalive by setting the KeepAlive field in net.Dialer, but that it is more likely to occur in 1.12 because now that field is set by default. Is that correct?

@networkimprov
Author

Yes. It's probably rare to use both deadlines and explicit keepalive, so the bug wasn't reported.

Now any code with long deadlines (relatively common) will wrongly detect deadline events due to implicit keepalive.

@FiloSottile
Contributor

Go 1.13 also enables Keep-Alives by default on the net.Listen side (1abf3aa), so this might be worth fixing now, before exposing a new wave of applications to it.

@networkimprov
Author

networkimprov commented Jul 16, 2019

@costela, your keepalive patch will trigger this bug...

@gopherbot please open backport 1.12

@gopherbot
Contributor

Backport issue(s) opened: #33137 (for 1.12).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@costela
Contributor

costela commented Jul 16, 2019

"The Error should give .Timeout()==false to comply with the docs. Code that checks for .Timeout()==true would generally assume that an explicit deadline had passed."

I can't seem to find anything in the docs saying that a keepalive timeout should not set .Timeout()==true. IMHO this is not necessarily obvious and should be clarified in the docs if it's indeed intended.

This isn't to say the new default behavior wouldn't introduce a potentially breaking change, but I'm not sure changing the error to be .Timeout()==false is the right approach. Just as there might be code depending on .Timeout()==true for detecting deadlines, there might be code depending on the same behavior for explicitly set keepalives. Or am I missing something obvious?

@networkimprov
Author

Docs say "A zero value for [deadline] means I/O operations will not time out."

Don't worry about it; it's been assigned. Sorry to bother you.

@FiloSottile
Contributor

Deadlines and keep-alive errors are deeply different: the former are fully recoverable, the latter aren't, for example. It seems unlikely any code would ever want to handle them the same way.

Keep-alives are more akin to the connection dropping, so I don't think marking them as Timeouts makes sense. I'll make this change tomorrow.
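A small sketch of the "fully recoverable" half of that claim, using an in-memory `net.Pipe` as a stand-in for a real connection (an assumption made so the example needs no network). `readAfterDeadline` is a hypothetical helper: it lets one read deadline expire, clears the deadline, and reads again on the same connection.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// readAfterDeadline lets a read deadline expire on an in-memory pipe,
// then clears the deadline and reads again from the same connection.
func readAfterDeadline() (firstErr error, got string, secondErr error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	buf := make([]byte, 16)

	// Nothing has been written yet, so this read can only end by deadline.
	client.SetReadDeadline(time.Now().Add(20 * time.Millisecond))
	_, firstErr = client.Read(buf)

	// Clear the deadline; the connection itself is still usable.
	client.SetReadDeadline(time.Time{})
	go server.Write([]byte("ping"))

	var n int
	n, secondErr = client.Read(buf)
	return firstErr, string(buf[:n]), secondErr
}

func main() {
	firstErr, got, secondErr := readAfterDeadline()
	fmt.Printf("first read error: %v\n", firstErr)
	fmt.Printf("second read: %q, error: %v\n", got, secondErr)
}
```

A connection that failed its keep-alive probes has no equivalent second step: the only option is to close it and dial again.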

@FiloSottile added the NeedsFix label and removed the NeedsInvestigation label Jul 17, 2019
@networkimprov
Author

@FiloSottile, let's make the .Error() string mention keep-alive. It's currently:
"read tcp n.n.n.n:p->n.n.n.n:p: read: connection timed out"

Also I opened a 1.12.x backport issue.

@FiloSottile added the NeedsInvestigation label and removed the NeedsFix label Jul 23, 2019
@FiloSottile
Contributor

I looked into this, and as far as I can tell, there is no way to single out keep-alive errors: they just surface an ETIMEDOUT from the read().

What we might do is isolate deadline errors, which as far as I can tell (although I am no expert on that part of the code) are handled by runtime timers, and make those more uniquely recognizable. For example, by making them an exported error type.

We might even decide to make only those have Timeout() true. Currently Timeout() is true for e == EAGAIN || e == EWOULDBLOCK || e == ETIMEDOUT and for some DNS resolution errors. That line was last touched in 2011, and I don't feel confident about how each of those error numbers maps to a timeout.
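The errno condition quoted above can be observed directly through the `Timeout` method on `syscall.Errno`. This is only a sketch: `errnoTimeouts` is a hypothetical helper, `ECONNRESET` is added as a contrast case that is not in the quoted condition, and the constants are assumed to exist on the target platform (they do on Linux).

```go
package main

import (
	"fmt"
	"syscall"
)

// errnoTimeouts reports what Timeout() returns for a few errno values.
// ECONNRESET is included only as a non-timeout contrast case.
func errnoTimeouts() map[string]bool {
	return map[string]bool{
		"EAGAIN":      syscall.EAGAIN.Timeout(),
		"EWOULDBLOCK": syscall.EWOULDBLOCK.Timeout(),
		"ETIMEDOUT":   syscall.ETIMEDOUT.Timeout(),
		"ECONNRESET":  syscall.ECONNRESET.Timeout(),
	}
}

func main() {
	for name, isTimeout := range errnoTimeouts() {
		fmt.Printf("%-12s Timeout() = %v\n", name, isTimeout)
	}
}
```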

If I'm right, this is not a small fix to a specific error value anymore, and I don't think it's something we should ship at the very end of the freeze. /cc @golang/osp-team

@ianlancetaylor
Contributor

I believe that if a read from a network connection fails due to a deadline, the read will return internal/poll.ErrTimeout. I believe that if a read fails due to a keep-alive error, the read will return syscall.ETIMEDOUT.
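Since `internal/poll` cannot be imported by user code, only the `syscall.ETIMEDOUT` side of that distinction can be tested from outside the standard library. A minimal sketch, assuming Go 1.13 or later so that `errors.Is` can unwrap `*net.OpError` and `*os.SyscallError`; `isKernelTimeout`, `readError` and `check` are hypothetical names, and the errors are built by hand.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// isKernelTimeout reports whether err wraps ETIMEDOUT from a syscall.
func isKernelTimeout(err error) bool {
	return errors.Is(err, syscall.ETIMEDOUT)
}

// readError builds a hand-made read error around the given errno,
// shaped like the errors a TCP read returns.
func readError(errno syscall.Errno) error {
	return &net.OpError{
		Op:  "read",
		Net: "tcp",
		Err: os.NewSyscallError("read", errno),
	}
}

// check runs isKernelTimeout against a timed-out and a reset connection.
func check() (timedOut, reset bool) {
	return isKernelTimeout(readError(syscall.ETIMEDOUT)),
		isKernelTimeout(readError(syscall.ECONNRESET))
}

func main() {
	timedOut, reset := check()
	fmt.Println("ETIMEDOUT detected: ", timedOut)
	fmt.Println("ECONNRESET detected:", reset)
}
```

As far as I know the kernel can also report ETIMEDOUT for other causes, such as exhausted retransmissions, so this identifies a kernel-level timeout and not keep-alive specifically.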

@networkimprov
Author

Ian, can we make net.OpError.Timeout() check internal/poll.ErrTimeout?

Although we could undo #23378 for 1.13, we still need to fix this in 1.12.x & 1.13 due to #23459.

@ianlancetaylor
Contributor

Seems a fair bit simpler to change internal/poll.ErrTimeout to not return true for the Timeout method. Or just not implement the Timeout method. Might be a good idea to change the name, of course.

@rsc
Contributor

rsc commented Oct 9, 2019

Back in #31449 (comment), I wrote:

"For Go 1.14 maybe the answer is to change our net.Conn implementations to return an error that Is(context.DeadlineExceeded), to allow a more precise check than Timeout."

We did not get into this, and we are close to the freeze. It seems like we should probably put this off to next cycle. But I will retitle so it is easier to understand what this is about.

@rsc changed the title from "net: Conn.Read() returns Error with .Timeout()==true on KeepAlive failure" to "net: document that Conn.Read/Write should return error that Is(context.DeadlineExceeded) for deadline exceeded" Oct 9, 2019
@rsc modified the milestones: Go1.14, Go1.15 Oct 9, 2019
@ianlancetaylor
Contributor

Above @rsc suggests that we change net.Conn.Read/Write to return context.DeadlineExceeded if they time out due to exceeding a deadline. That will change the error when printed from i/o timeout to context deadline exceeded. I don't think that would be an ideal change, as I think it is confusing to refer to a context when no context is involved.

I suggest that we instead add net.ErrDeadlineExceeded and os.ErrDeadlineExceeded and return those.

@ianlancetaylor
Contributor

Actually, let's just add os.ErrDeadlineExceeded.
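The difference in printed text between the two candidates can be checked with a few lines. This sketch assumes Go 1.15 or later, where `os.ErrDeadlineExceeded` exists; `deadlineMessages` is a hypothetical helper.

```go
package main

import (
	"context"
	"fmt"
	"os"
)

// deadlineMessages returns the printed form of the two candidate sentinels.
func deadlineMessages() (osMsg, ctxMsg string) {
	return os.ErrDeadlineExceeded.Error(), context.DeadlineExceeded.Error()
}

func main() {
	osMsg, ctxMsg := deadlineMessages()
	fmt.Println("os.ErrDeadlineExceeded:  ", osMsg)
	fmt.Println("context.DeadlineExceeded:", ctxMsg)
}
```

Using the `os` sentinel keeps the familiar "i/o timeout" text for code and logs that never involved a context.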

@networkimprov
Author

networkimprov commented Apr 17, 2020

Maybe go vet should flag use of err.Timeout() after Set*Deadline(), since that is not a reliable way to check for deadline-expired.

Why is this error related to package os? Aren't deadlines a net.Conn concept?

EDIT: I forgot that deadlines also exist for os.File.

@gopherbot
Contributor

Change https://golang.org/cl/228644 mentions this issue: net: return context.DeadlineExceeded if past context deadline

@ianlancetaylor
Contributor

Any change to "go vet" can be a separate issue. Although I suspect that that change is not feasible as it would make some amount of existing code non-vet-compliant, which we can't really do.

The os package has SetDeadline just as the net package does.

@networkimprov
Author

Could net.ErrDeadlineExceeded be added as an alias to os.ErrDeadlineExceeded?

Filed #38508 for the vet issue.

@ianlancetaylor
Contributor

Sure, we could have both net.ErrDeadlineExceeded and os.ErrDeadlineExceeded, but I don't see a reason to do that. The net package already depends on the os package. I'm open to it if there is a good reason for it.

@networkimprov
Author

It avoids the need to import "os" when calling net.Conn.Set*Deadline(). Your first instinct was to provide it :-)

Almost all code using Set*Deadline() now needs to be updated. It's odd that those changes should require an import unless "os" is already imported.

@ianlancetaylor
Contributor

My guess is that very little code that uses SetDeadline will need to be updated. The only code that needs to be updated is code that needs to reliably determine whether the connection failed due to an exceeded deadline. Most programs will just see that the connection failed and carry on.
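For code that does need the distinction, the update might look roughly like the sketch below. It assumes Go 1.15 or later; `classify` and `classifySamples` are hypothetical names, and the sample errors are built by hand instead of coming from real connections.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// classify checks for an exceeded deadline first and only then falls
// back to the broader Timeout() test.
func classify(err error) string {
	if err == nil {
		return "ok"
	}
	if errors.Is(err, os.ErrDeadlineExceeded) {
		return "deadline exceeded"
	}
	var ne net.Error
	if errors.As(err, &ne) && ne.Timeout() {
		return "timeout, but not a deadline"
	}
	return "other failure"
}

// classifySamples runs classify over three hand-built read errors.
func classifySamples() map[string]string {
	wrap := func(inner error) error {
		return &net.OpError{Op: "read", Net: "tcp", Err: inner}
	}
	return map[string]string{
		"deadline":  classify(wrap(os.ErrDeadlineExceeded)),
		"keepalive": classify(wrap(os.NewSyscallError("read", syscall.ETIMEDOUT))),
		"reset":     classify(wrap(os.NewSyscallError("read", syscall.ECONNRESET))),
	}
}

func main() {
	for name, result := range classifySamples() {
		fmt.Printf("%-10s -> %s\n", name, result)
	}
}
```

Code that only logs the error and drops the connection needs none of this, which matches the guess above that few programs have to change.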

I would like to hear someone else's opinion on this question.

@gopherbot
Contributor

Change https://golang.org/cl/228645 mentions this issue: os, net: define and use os.ErrDeadlineExceeded

@networkimprov changed the title from "net: document that Conn.Read/Write should return error that Is(context.DeadlineExceeded) for deadline exceeded" to "net, os: Set*Deadline() expiration error should be unique, as .Timeout() is true for keepalive, etc" Apr 18, 2020
@rsc
Contributor

rsc commented Apr 22, 2020

Ian's CL seems worth trying. Let's just be ready to roll it back if there are problems.

xujianhai666 pushed a commit to xujianhai666/go-1 that referenced this issue May 21, 2020
If an I/O operation fails because a deadline was exceeded,
return os.ErrDeadlineExceeded. We used to return poll.ErrTimeout,
an internal error, and told users to check the Timeout method.
However, there are other errors with a Timeout method that returns true,
notably syscall.ETIMEDOUT which is returned for a keep-alive timeout.
Checking errors.Is(err, os.ErrDeadlineExceeded) should permit code
to reliably tell why it failed.

This change does not affect the handling of net.Dialer.Deadline,
nor does it change the handling of net.DialContext when the context
deadline is exceeded. Those cases continue to return an error
reported as "i/o timeout" for which Timeout is true, but that error
is not os.ErrDeadlineExceeded.

Fixes golang#31449

Change-Id: I0323f42e944324c6f2578f00c3ac90c24fe81177
Reviewed-on: https://go-review.googlesource.com/c/go/+/228645
Run-TryBot: Ian Lance Taylor
TryBot-Result: Gobot Gobot
Reviewed-by: Filippo Valsorda
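The check recommended in this commit message can be tried without a network by letting a deadline expire on an in-memory `net.Pipe`. This is a sketch under that assumption (a pipe, not a TCP connection) and needs Go 1.15 or later; `expireReadDeadline` and `deadlineChecks` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// expireReadDeadline lets a read deadline pass on an in-memory pipe
// and returns the resulting error.
func expireReadDeadline() error {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	client.SetReadDeadline(time.Now().Add(10 * time.Millisecond))
	_, err := client.Read(make([]byte, 1))
	return err
}

// deadlineChecks applies both the new and the old test to that error.
func deadlineChecks() (isDeadline, isTimeout bool) {
	err := expireReadDeadline()
	var ne net.Error
	isTimeout = errors.As(err, &ne) && ne.Timeout()
	return errors.Is(err, os.ErrDeadlineExceeded), isTimeout
}

func main() {
	isDeadline, isTimeout := deadlineChecks()
	fmt.Println("errors.Is(err, os.ErrDeadlineExceeded):", isDeadline)
	fmt.Println("Timeout():", isTimeout)
}
```

Both checks succeed for a real deadline; per the commit message, only the `Timeout()` check also succeeds for errors such as `syscall.ETIMEDOUT`.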
@gopherbot
Contributor

Change https://golang.org/cl/239705 mentions this issue: net: consistently document deadline handling

gopherbot pushed a commit that referenced this issue Jun 24, 2020
After CL 228645 some mentions of the Deadline methods referred
to the Timeout method, and some to os.ErrDeadlineExceeded.
Stop referring to the Timeout method, to encourage ErrDeadlineExceeded.

For #31449

Change-Id: I27b8ff34f31798f38b06437546886af8cce98ca4
Reviewed-on: https://go-review.googlesource.com/c/go/+/239705
Run-TryBot: Ian Lance Taylor
Reviewed-by: Damien Neil
TryBot-Result: Gobot Gobot
@golang locked and limited conversation to collaborators Jun 24, 2021