Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: io::Error::from_raw_os_error(ERROR_TIMEOUT) is not io::ErrorKind::TimedOut #71646

Closed
carstenandrich opened this issue Apr 28, 2020 · 7 comments · Fixed by #71756
Closed
Labels
C-bug Category: This is a bug. O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@carstenandrich
Copy link
Contributor

On Windows an io::Error created via from_raw_os_error() with the Windows system error code ERROR_TIMEOUT (1460) is not of the kind io::ErrorKind::TimedOut, but io::ErrorKind::Other.
This is counter-intuitive considering both (system error code and kind) have (almost) the same symbolic name.

I've tested this with Rust 1.42 on x86_64-pc-windows-gnu, but this behavior is obviously still present in git master. Unless I'm overlooking sth., this should be fairly easy to fix by defining the c::ERROR_TIMEOUT constant and adding the following after this line:

c::ERROR_TIMEOUT => return ErrorKind::TimedOut,

If you'd like me to, I can prepare a merge request implementing the suggested fix.

I tried this code:

use std::io;

// https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--500-999-
const ERROR_OPERATION_ABORTED: i32 = 995;
// https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1300-1699-
const ERROR_TIMEOUT: i32 = 1460;

fn main() {
	assert_eq!(io::Error::from_raw_os_error(ERROR_OPERATION_ABORTED).kind(),
			io::ErrorKind::TimedOut);
	assert_eq!(io::Error::from_raw_os_error(ERROR_TIMEOUT).kind(),
			io::ErrorKind::TimedOut);
}

I expected to see this happen: Assertion passes because (should be):

io::Error::from_raw_os_error(ERROR_TIMEOUT).kind() == io::ErrorKind::TimedOut`

Instead, this happened: Assertion fails because:

io::Error::from_raw_os_error(ERROR_TIMEOUT).kind() == io::ErrorKind::Other`

Meta

rustc --version --verbose:

rustc 1.42.0 (b8cedc004 2020-03-09)
binary: rustc
commit-hash: b8cedc00407a4c56a3bda1ed605c6fc166655447
commit-date: 2020-03-09
host: x86_64-pc-windows-gnu
release: 1.42.0
LLVM version: 9.0
Backtrace

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Other`,
 right: `TimedOut`', src\main.rs:9:5
stack backtrace:
   0: backtrace::backtrace::dbghelp::trace
             at C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.40\src\backtrace/dbghelp.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.40\src\backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src\libstd\sys_common/backtrace.rs:77
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src\libstd\sys_common/backtrace.rs:59
   4: core::fmt::write
             at src\libcore\fmt/mod.rs:1052
   5: std::io::Write::write_fmt
             at src\libstd\io/mod.rs:1426
   6: std::sys_common::backtrace::_print
             at src\libstd\sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src\libstd\sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src\libstd/panicking.rs:204
   9: std::panicking::default_hook
             at src\libstd/panicking.rs:224
  10: std::panicking::rust_panic_with_hook
             at src\libstd/panicking.rs:472
  11: rust_begin_unwind
             at src\libstd/panicking.rs:380
  12: std::panicking::begin_panic_fmt
             at src\libstd/panicking.rs:334
  13: rust_test::main
             at src/main.rs:9
  14: std::rt::lang_start::{{closure}}
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\src\libstd/rt.rs:67
  15: std::rt::lang_start_internal::{{closure}}
             at src\libstd/rt.rs:52
  16: std::panicking::try::do_call
             at src\libstd/panicking.rs:305
  17: __rust_maybe_catch_panic
             at src\libpanic_unwind/lib.rs:86
  18: std::panicking::try
             at src\libstd/panicking.rs:281
  19: std::panic::catch_unwind
             at src\libstd/panic.rs:394
  20: std::rt::lang_start_internal
             at src\libstd/rt.rs:51
  21: std::rt::lang_start
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\src\libstd/rt.rs:67
  22: main
  23: _tmainCRTStartup
  24: mainCRTStartup
  25: unit_addrs_search
  26: unit_addrs_search
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: process didn't exit successfully: `target\debug\rust-test.exe` (exit code: 101)

@carstenandrich carstenandrich added the C-bug Category: This is a bug. label Apr 28, 2020
@jonas-schievink jonas-schievink added O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Apr 28, 2020
@carstenandrich
Copy link
Contributor Author

carstenandrich commented Apr 29, 2020

This issue is not limited to the system error code ERROR_TIMEOUT, as Windows appears to use various error codes for what would all be ETIMEDOUT on POSIX. For example, I'm getting an ERROR_SEM_TIMEOUT (121) if a write operation on a COM port times out (on Windows 10 version 1809).

Looking at the list of Windows system error codes, I found the following that contain TIME and appear to be timeouts:

  • ERROR_SEM_TIMEOUT (121)
  • WAIT_TIMEOUT (258)
  • ERROR_DRIVER_CANCEL_TIMEOUT (594)
  • ERROR_SERVICE_REQUEST_TIMEOUT (1053)
  • ERROR_COUNTER_TIMEOUT (1121)
  • ERROR_TIMEOUT (1460)
  • RPC_S_INVALID_TIMEOUT (1709) Thanks to @retep998 for pointing out that this error code indicates an invalid timeout and not that a timeout occurred.
  • ERROR_RESOURCE_CALL_TIMED_OUT (5910)
  • ERROR_CTX_MODEM_RESPONSE_TIMEOUT (7012)
  • ERROR_CTX_CLIENT_QUERY_TIMEOUT (7040)
  • FRS_ERR_SYSVOL_POPULATE_TIMEOUT (8014)
  • ERROR_DS_TIMELIMIT_EXCEEDED (8226)
  • DNS_ERROR_RECORD_TIMED_OUT (9705)
  • WSAETIMEDOUT (10060)
  • ERROR_IPSEC_IKE_TIMED_OUT (13805)
  • ERROR_RUNLEVEL_SWITCH_TIMEOUT (15402)
  • ERROR_RUNLEVEL_SWITCH_AGENT_TIMEOUT (15403)

@Mark-Simulacrum
Copy link
Member

One concern with merging these all into ErrorKind::TimedOut is that the caller would no longer know which of the timeout occurred -- are there cases where that's important?

@carstenandrich
Copy link
Contributor Author

I don't think that's the case, but please correct me if I'm wrong. The following example

use std::io;

const ERROR_OPERATION_ABORTED: i32 = 995;
const ERROR_TIMEOUT: i32 = 1460;

fn main() {
        println!("{:?}", io::Error::from_raw_os_error(ERROR_OPERATION_ABORTED));
        println!("{:?}", io::Error::from_raw_os_error(ERROR_TIMEOUT));
}

will produce this on Windows:

Os { code: 995, kind: TimedOut, message: "The I/O operation has been aborted because of either a thread exit or an application request." }
Os { code: 1460, kind: Other, message: "This operation returned because the timeout period expired." }

Currently, only ERROR_OPERATION_ABORTED will yield an io::Error of io::ErrorKind::TimedOut. All other timeout errors will be of io::ErrorKind::Other, which is counter-intuitive, because matching an error's kind against io::ErrorKind::TimedOut will not catch all timeout errors.

I would suggest to only change the kind of the errors listed above to io::ErrorKind::TimedOut. The original OS error code can still be accessed via raw_os_error() if the programmer is interested in the OS-specific timeout reason. Therefore, this should not break backwards-compatibility, unless someone expects these errors to be io::ErrorKind::Other, which should be discouraged.

@Mark-Simulacrum
Copy link
Member

Ah, okay, right. I would in that case not be personally opposed to making this change, I think a PR that does so would be the best way to do so (and that can be FCP'd to libs team).

@carstenandrich
Copy link
Contributor Author

All right. Thanks for the quick response. I'll prepare a pull request.

@retep998
Copy link
Member

Note that RPC_S_INVALID_TIMEOUT does not indicate the operation timed out, but rather that the timeout you specified was invalid.

@carstenandrich
Copy link
Contributor Author

@retep998 Thanks for checking that. I'll revise the list of errors before submitting the PR, but I'm afraid I have very close to zero experience with the Windows API. I just stumbled over this issue while porting POSIX software to Windows.

Manishearth added a commit to Manishearth/rust that referenced this issue Jun 21, 2020
add Windows system error codes that should map to io::ErrorKind::TimedOut

closes rust-lang#71646

**Disclaimer:** The author of this pull request has a negligible amount of experience (i.e., kinda zero) with the Windows API. This PR should _definitely_ be reviewed by someone familiar with the API and its error handling.

While porting POSIX software using serial ports to Windows, I found that for many Windows system error codes, an `io::Error` created via `io::Error::from_raw_os_error()` or `io::Error::last_os_error()` is not `io::ErrorKind::TimedOut`. For example, when a (non-overlapped) write to a COM port via [`WriteFile()`](https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-readfile) times out, [`GetLastError()`](https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-getlasterror) returns `ERROR_SEM_TIMEOUT` ([121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-)). However, an `io::Error` created from this error code will have `io::ErrorKind::Other`.

Currently, only the error codes `ERROR_OPERATION_ABORTED` and `WSAETIMEDOUT` will instantiate `io::Error`s with kind `io::ErrorKind::TimedOut`.
This makes `io::Error::last_os_error()` unsuitable for error handling of syscalls that could time out, because timeouts can not be caught by matching the error's kind against `io::ErrorKind::TimedOut`.

Downloading the [list of Windows system error codes](https://gist.github.com/carstenandrich/c331d557520b8a0e7f44689ca257f805) and grepping anything that sounds like a timeout (`egrep -i "timed?.?(out|limit)"`), I've identified the following error codes that should also have `io::ErrorKind::TimedOut`, because they could be I/O-related:

Name | Code | Description
--- | --- | ---
`ERROR_SEM_TIMEOUT` | [121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-) | The semaphore timeout period has expired.
`WAIT_TIMEOUT` | [258](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-) | The wait operation timed out.
`ERROR_DRIVER_CANCEL_TIMEOUT` | [594](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--500-999-) | The driver %hs failed to complete a cancelled I/O request in the allotted time.
`ERROR_COUNTER_TIMEOUT` | [1121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1000-1299-) | A serial I/O operation completed because the timeout period expired. The IOCTL_SERIAL_XOFF_COUNTER did not reach zero.)
`ERROR_TIMEOUT` | [1460](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1300-1699-) | This operation returned because the timeout period expired.
`ERROR_CTX_MODEM_RESPONSE_TIMEOUT` | [7012](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The modem did not respond to the command sent to it. Verify that the modem is properly cabled and powered on.
`ERROR_CTX_CLIENT_QUERY_TIMEOUT` | [7040](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The client failed to respond to the server connect message.
`ERROR_DS_TIMELIMIT_EXCEEDED` | [8226](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--8200-8999-) | The time limit for this request was exceeded.
`DNS_ERROR_RECORD_TIMED_OUT` | [9705](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--9000-11999-) | DNS record timed out.
`ERROR_IPSEC_IKE_TIMED_OUT` | [13805](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | Negotiation timed out.

The following errors are also timeouts, but they don't seem to be directly related to I/O or network operations:

Name | Code | Description
--- | --- | ---
`ERROR_SERVICE_REQUEST_TIMEOUT` | [1053](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1000-1299-) | The service did not respond to the start or control request in a timely fashion.
`ERROR_RESOURCE_CALL_TIMED_OUT` | [5910](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--4000-5999-) | The call to the cluster resource DLL timed out.
`FRS_ERR_SYSVOL_POPULATE_TIMEOUT` | [8014](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The file replication service cannot populate the system volume because of an internal timeout. The event log may have more information.
`ERROR_RUNLEVEL_SWITCH_TIMEOUT` | [15402](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | The requested run level switch cannot be completed successfully since one or more services will not stop or restart within the specified timeout.
`ERROR_RUNLEVEL_SWITCH_AGENT_TIMEOUT` | [15403](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | A run level switch agent did not respond within the specified timeout.

Please note that `ERROR_SEM_TIMEOUT` is the only timeout error I have [seen in action](https://gist.github.com/carstenandrich/10b3962fa1abc9e50816b6460010900b). The remainder of the error codes listed above is based purely on reading documentation.

This pull request adds all of the errors listed in both tables, but I'm not sure whether adding all of them makes sense. Someone with actual Windows API experience should decide that.

I expect these changes to be fairly backwards compatible, because only the error's [`.kind()`](https://doc.rust-lang.org/std/io/struct.Error.html#method.kind) will change, but matching the error's code via [`.raw_os_error()`](https://doc.rust-lang.org/std/io/struct.Error.html#method.raw_os_error) will not be affected.
However, code expecting these errors to be `io::ErrorKind::Other` would break. Even though I personally do not think such an implementation would make sense, after all the docs say that `io::ErrorKind` is _intended to grow over time_, a residual risk remains, of course. I took the liberty to ammend the docstring of `io::ErrorKind::Other` with a remark that discourages matching against it.

As per the contributing guidelines I'm adding @steveklabnik due to the changed documentation. Also @retep998 might have some valuable insights on the error codes.

r? @steveklabnik
cc @retep998
cc @Mark-Simulacrum
Manishearth added a commit to Manishearth/rust that referenced this issue Jun 22, 2020
add Windows system error codes that should map to io::ErrorKind::TimedOut

closes rust-lang#71646

**Disclaimer:** The author of this pull request has a negligible amount of experience (i.e., kinda zero) with the Windows API. This PR should _definitely_ be reviewed by someone familiar with the API and its error handling.

While porting POSIX software using serial ports to Windows, I found that for many Windows system error codes, an `io::Error` created via `io::Error::from_raw_os_error()` or `io::Error::last_os_error()` is not `io::ErrorKind::TimedOut`. For example, when a (non-overlapped) write to a COM port via [`WriteFile()`](https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-readfile) times out, [`GetLastError()`](https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-getlasterror) returns `ERROR_SEM_TIMEOUT` ([121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-)). However, an `io::Error` created from this error code will have `io::ErrorKind::Other`.

Currently, only the error codes `ERROR_OPERATION_ABORTED` and `WSAETIMEDOUT` will instantiate `io::Error`s with kind `io::ErrorKind::TimedOut`.
This makes `io::Error::last_os_error()` unsuitable for error handling of syscalls that could time out, because timeouts can not be caught by matching the error's kind against `io::ErrorKind::TimedOut`.

Downloading the [list of Windows system error codes](https://gist.github.com/carstenandrich/c331d557520b8a0e7f44689ca257f805) and grepping anything that sounds like a timeout (`egrep -i "timed?.?(out|limit)"`), I've identified the following error codes that should also have `io::ErrorKind::TimedOut`, because they could be I/O-related:

Name | Code | Description
--- | --- | ---
`ERROR_SEM_TIMEOUT` | [121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-) | The semaphore timeout period has expired.
`WAIT_TIMEOUT` | [258](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-) | The wait operation timed out.
`ERROR_DRIVER_CANCEL_TIMEOUT` | [594](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--500-999-) | The driver %hs failed to complete a cancelled I/O request in the allotted time.
`ERROR_COUNTER_TIMEOUT` | [1121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1000-1299-) | A serial I/O operation completed because the timeout period expired. The IOCTL_SERIAL_XOFF_COUNTER did not reach zero.)
`ERROR_TIMEOUT` | [1460](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1300-1699-) | This operation returned because the timeout period expired.
`ERROR_CTX_MODEM_RESPONSE_TIMEOUT` | [7012](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The modem did not respond to the command sent to it. Verify that the modem is properly cabled and powered on.
`ERROR_CTX_CLIENT_QUERY_TIMEOUT` | [7040](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The client failed to respond to the server connect message.
`ERROR_DS_TIMELIMIT_EXCEEDED` | [8226](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--8200-8999-) | The time limit for this request was exceeded.
`DNS_ERROR_RECORD_TIMED_OUT` | [9705](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--9000-11999-) | DNS record timed out.
`ERROR_IPSEC_IKE_TIMED_OUT` | [13805](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | Negotiation timed out.

The following errors are also timeouts, but they don't seem to be directly related to I/O or network operations:

Name | Code | Description
--- | --- | ---
`ERROR_SERVICE_REQUEST_TIMEOUT` | [1053](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1000-1299-) | The service did not respond to the start or control request in a timely fashion.
`ERROR_RESOURCE_CALL_TIMED_OUT` | [5910](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--4000-5999-) | The call to the cluster resource DLL timed out.
`FRS_ERR_SYSVOL_POPULATE_TIMEOUT` | [8014](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The file replication service cannot populate the system volume because of an internal timeout. The event log may have more information.
`ERROR_RUNLEVEL_SWITCH_TIMEOUT` | [15402](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | The requested run level switch cannot be completed successfully since one or more services will not stop or restart within the specified timeout.
`ERROR_RUNLEVEL_SWITCH_AGENT_TIMEOUT` | [15403](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | A run level switch agent did not respond within the specified timeout.

Please note that `ERROR_SEM_TIMEOUT` is the only timeout error I have [seen in action](https://gist.github.com/carstenandrich/10b3962fa1abc9e50816b6460010900b). The remainder of the error codes listed above is based purely on reading documentation.

This pull request adds all of the errors listed in both tables, but I'm not sure whether adding all of them makes sense. Someone with actual Windows API experience should decide that.

I expect these changes to be fairly backwards compatible, because only the error's [`.kind()`](https://doc.rust-lang.org/std/io/struct.Error.html#method.kind) will change, but matching the error's code via [`.raw_os_error()`](https://doc.rust-lang.org/std/io/struct.Error.html#method.raw_os_error) will not be affected.
However, code expecting these errors to be `io::ErrorKind::Other` would break. Even though I personally do not think such an implementation would make sense, after all the docs say that `io::ErrorKind` is _intended to grow over time_, a residual risk remains, of course. I took the liberty to ammend the docstring of `io::ErrorKind::Other` with a remark that discourages matching against it.

As per the contributing guidelines I'm adding @steveklabnik due to the changed documentation. Also @retep998 might have some valuable insights on the error codes.

r? @steveklabnik
cc @retep998
cc @Mark-Simulacrum
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Jun 22, 2020
add Windows system error codes that should map to io::ErrorKind::TimedOut

closes rust-lang#71646

**Disclaimer:** The author of this pull request has a negligible amount of experience (i.e., kinda zero) with the Windows API. This PR should _definitely_ be reviewed by someone familiar with the API and its error handling.

While porting POSIX software using serial ports to Windows, I found that for many Windows system error codes, an `io::Error` created via `io::Error::from_raw_os_error()` or `io::Error::last_os_error()` is not `io::ErrorKind::TimedOut`. For example, when a (non-overlapped) write to a COM port via [`WriteFile()`](https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-readfile) times out, [`GetLastError()`](https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-getlasterror) returns `ERROR_SEM_TIMEOUT` ([121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-)). However, an `io::Error` created from this error code will have `io::ErrorKind::Other`.

Currently, only the error codes `ERROR_OPERATION_ABORTED` and `WSAETIMEDOUT` will instantiate `io::Error`s with kind `io::ErrorKind::TimedOut`.
This makes `io::Error::last_os_error()` unsuitable for error handling of syscalls that could time out, because timeouts can not be caught by matching the error's kind against `io::ErrorKind::TimedOut`.

Downloading the [list of Windows system error codes](https://gist.github.com/carstenandrich/c331d557520b8a0e7f44689ca257f805) and grepping anything that sounds like a timeout (`egrep -i "timed?.?(out|limit)"`), I've identified the following error codes that should also have `io::ErrorKind::TimedOut`, because they could be I/O-related:

Name | Code | Description
--- | --- | ---
`ERROR_SEM_TIMEOUT` | [121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-) | The semaphore timeout period has expired.
`WAIT_TIMEOUT` | [258](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-) | The wait operation timed out.
`ERROR_DRIVER_CANCEL_TIMEOUT` | [594](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--500-999-) | The driver %hs failed to complete a cancelled I/O request in the allotted time.
`ERROR_COUNTER_TIMEOUT` | [1121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1000-1299-) | A serial I/O operation completed because the timeout period expired. The IOCTL_SERIAL_XOFF_COUNTER did not reach zero.)
`ERROR_TIMEOUT` | [1460](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1300-1699-) | This operation returned because the timeout period expired.
`ERROR_CTX_MODEM_RESPONSE_TIMEOUT` | [7012](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The modem did not respond to the command sent to it. Verify that the modem is properly cabled and powered on.
`ERROR_CTX_CLIENT_QUERY_TIMEOUT` | [7040](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The client failed to respond to the server connect message.
`ERROR_DS_TIMELIMIT_EXCEEDED` | [8226](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--8200-8999-) | The time limit for this request was exceeded.
`DNS_ERROR_RECORD_TIMED_OUT` | [9705](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--9000-11999-) | DNS record timed out.
`ERROR_IPSEC_IKE_TIMED_OUT` | [13805](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | Negotiation timed out.

The following errors are also timeouts, but they don't seem to be directly related to I/O or network operations:

Name | Code | Description
--- | --- | ---
`ERROR_SERVICE_REQUEST_TIMEOUT` | [1053](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1000-1299-) | The service did not respond to the start or control request in a timely fashion.
`ERROR_RESOURCE_CALL_TIMED_OUT` | [5910](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--4000-5999-) | The call to the cluster resource DLL timed out.
`FRS_ERR_SYSVOL_POPULATE_TIMEOUT` | [8014](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The file replication service cannot populate the system volume because of an internal timeout. The event log may have more information.
`ERROR_RUNLEVEL_SWITCH_TIMEOUT` | [15402](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | The requested run level switch cannot be completed successfully since one or more services will not stop or restart within the specified timeout.
`ERROR_RUNLEVEL_SWITCH_AGENT_TIMEOUT` | [15403](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | A run level switch agent did not respond within the specified timeout.

Please note that `ERROR_SEM_TIMEOUT` is the only timeout error I have [seen in action](https://gist.github.com/carstenandrich/10b3962fa1abc9e50816b6460010900b). The remainder of the error codes listed above is based purely on reading documentation.

This pull request adds all of the errors listed in both tables, but I'm not sure whether adding all of them makes sense. Someone with actual Windows API experience should decide that.

I expect these changes to be fairly backwards compatible, because only the error's [`.kind()`](https://doc.rust-lang.org/std/io/struct.Error.html#method.kind) will change, but matching the error's code via [`.raw_os_error()`](https://doc.rust-lang.org/std/io/struct.Error.html#method.raw_os_error) will not be affected.
However, code expecting these errors to be `io::ErrorKind::Other` would break. Even though I personally do not think such an implementation would make sense, after all the docs say that `io::ErrorKind` is _intended to grow over time_, a residual risk remains, of course. I took the liberty to ammend the docstring of `io::ErrorKind::Other` with a remark that discourages matching against it.

As per the contributing guidelines I'm adding @steveklabnik due to the changed documentation. Also @retep998 might have some valuable insights on the error codes.

r? @steveklabnik
cc @retep998
cc @Mark-Simulacrum
Manishearth added a commit to Manishearth/rust that referenced this issue Jun 22, 2020
add Windows system error codes that should map to io::ErrorKind::TimedOut

closes rust-lang#71646

**Disclaimer:** The author of this pull request has a negligible amount of experience (i.e., kinda zero) with the Windows API. This PR should _definitely_ be reviewed by someone familiar with the API and its error handling.

While porting POSIX software using serial ports to Windows, I found that for many Windows system error codes, an `io::Error` created via `io::Error::from_raw_os_error()` or `io::Error::last_os_error()` is not `io::ErrorKind::TimedOut`. For example, when a (non-overlapped) write to a COM port via [`WriteFile()`](https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-readfile) times out, [`GetLastError()`](https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-getlasterror) returns `ERROR_SEM_TIMEOUT` ([121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-)). However, an `io::Error` created from this error code will have `io::ErrorKind::Other`.

Currently, only the error codes `ERROR_OPERATION_ABORTED` and `WSAETIMEDOUT` will instantiate `io::Error`s with kind `io::ErrorKind::TimedOut`.
This makes `io::Error::last_os_error()` unsuitable for error handling of syscalls that could time out, because timeouts can not be caught by matching the error's kind against `io::ErrorKind::TimedOut`.

Downloading the [list of Windows system error codes](https://gist.github.com/carstenandrich/c331d557520b8a0e7f44689ca257f805) and grepping anything that sounds like a timeout (`egrep -i "timed?.?(out|limit)"`), I've identified the following error codes that should also have `io::ErrorKind::TimedOut`, because they could be I/O-related:

Name | Code | Description
--- | --- | ---
`ERROR_SEM_TIMEOUT` | [121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-) | The semaphore timeout period has expired.
`WAIT_TIMEOUT` | [258](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-) | The wait operation timed out.
`ERROR_DRIVER_CANCEL_TIMEOUT` | [594](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--500-999-) | The driver %hs failed to complete a cancelled I/O request in the allotted time.
`ERROR_COUNTER_TIMEOUT` | [1121](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1000-1299-) | A serial I/O operation completed because the timeout period expired. The IOCTL_SERIAL_XOFF_COUNTER did not reach zero.)
`ERROR_TIMEOUT` | [1460](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1300-1699-) | This operation returned because the timeout period expired.
`ERROR_CTX_MODEM_RESPONSE_TIMEOUT` | [7012](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The modem did not respond to the command sent to it. Verify that the modem is properly cabled and powered on.
`ERROR_CTX_CLIENT_QUERY_TIMEOUT` | [7040](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The client failed to respond to the server connect message.
`ERROR_DS_TIMELIMIT_EXCEEDED` | [8226](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--8200-8999-) | The time limit for this request was exceeded.
`DNS_ERROR_RECORD_TIMED_OUT` | [9705](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--9000-11999-) | DNS record timed out.
`ERROR_IPSEC_IKE_TIMED_OUT` | [13805](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | Negotiation timed out.

The following errors are also timeouts, but they don't seem to be directly related to I/O or network operations:

Name | Code | Description
--- | --- | ---
`ERROR_SERVICE_REQUEST_TIMEOUT` | [1053](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--1000-1299-) | The service did not respond to the start or control request in a timely fashion.
`ERROR_RESOURCE_CALL_TIMED_OUT` | [5910](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--4000-5999-) | The call to the cluster resource DLL timed out.
`FRS_ERR_SYSVOL_POPULATE_TIMEOUT` | [8014](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--6000-8199-) | The file replication service cannot populate the system volume because of an internal timeout. The event log may have more information.
`ERROR_RUNLEVEL_SWITCH_TIMEOUT` | [15402](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | The requested run level switch cannot be completed successfully since one or more services will not stop or restart within the specified timeout.
`ERROR_RUNLEVEL_SWITCH_AGENT_TIMEOUT` | [15403](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--12000-15999-) | A run level switch agent did not respond within the specified timeout.

Please note that `ERROR_SEM_TIMEOUT` is the only timeout error I have [seen in action](https://gist.github.com/carstenandrich/10b3962fa1abc9e50816b6460010900b). The remainder of the error codes listed above is based purely on reading documentation.

This pull request adds all of the errors listed in both tables, but I'm not sure whether adding all of them makes sense. Someone with actual Windows API experience should decide that.

I expect these changes to be fairly backwards compatible, because only the error's [`.kind()`](https://doc.rust-lang.org/std/io/struct.Error.html#method.kind) will change, but matching the error's code via [`.raw_os_error()`](https://doc.rust-lang.org/std/io/struct.Error.html#method.raw_os_error) will not be affected.
However, code expecting these errors to be `io::ErrorKind::Other` would break. Even though I personally do not think such an implementation would make sense, after all the docs say that `io::ErrorKind` is _intended to grow over time_, a residual risk remains, of course. I took the liberty to ammend the docstring of `io::ErrorKind::Other` with a remark that discourages matching against it.

As per the contributing guidelines I'm adding @steveklabnik due to the changed documentation. Also @retep998 might have some valuable insights on the error codes.

r? @steveklabnik
cc @retep998
cc @Mark-Simulacrum
@bors bors closed this as completed in 6276c13 Jun 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants