Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent server from exiting on ECONNABORTED #1874

Merged
merged 5 commits into from
Aug 20, 2024

Conversation

grembo
Copy link
Contributor

@grembo grembo commented Aug 18, 2024

Motivation

On FreeBSD, accept can fail with ECONNABORTED, which means that "A connection arrived, but it was closed while waiting on the listen queue." (see man 2 accept), which is in line with POSIX.

Without this patch, a server built without the tls feature exits returning 0 in case of an aborted connection, which makes it vulnerable not only to intentional denial of service, but also to unintentional crashes, e.g., by haproxy TCP health checks.

The problem can be reproduced on any FreeBSD system by running the tonic "helloworld" example (when building without feature tls) and then sending a packet using nmap:

cd examples
cargo run --bin helloworld-server --no-default-features &
nmap -sT -p 50051 -6 ::1
# server exits

When running the example with the feature tls enabled, it won't exit (as the tls event loop in tonic/src/transport/server/incoming.rs handles errors gracefully):

cd examples
cargo run --bin helloworld-server --no-default-features \
  features=tls &
nmap -sT -p 50051 -6 ::1
# server keeps running

Solution

The patch checks if the error returned in the accept loop of tcp_incoming is ECONNABORTED. If it is, it stays in the loop instead of exiting with an error (which subsequently exits the main event loop in serve_with_shutdown).

This patch is not optimal - it removes some generic error parameters to gain access to std::io::Error::kind(). The logic itself should be sound.

Edit: Removed a couple of things after helpful input in review.

See also:

On FreeBSD, accept can fail with ECONNABORTED, which means that
"A connection arrived, but it was on the listen queue"
(see `man 2 accept`).

Without this patch, a server without the tls feature  exits returing 0
in this case, which makes it vulnerable not only to intentional denial
of service, but also to unintentional crashes, e.g., by haproxy TCP
health checks.

The problem can be reproduced on any FreeBSD system by running the
tonic "helloworld" example (without feature TLS) and then sending a
packet using nmap:

    cd examples
    cargo run --bin helloworld-server --no-default-features &
    nmap -sT -p 50051 -6 ::1
    # server exits

When running the example with the feature tls enabled, it won't exit
(as the tls event loop in tonic/src/transport/server/incoming.rs
handles errors gracefully):

    cd examples
    cargo run --bin helloworld-server --no-default-features \
      features=tls &
    nmap -sT -p 50051 -6 ::1
    # server keeps running

This patch is not optimal - it removes some generic error parameters
to gain access to `std::io::Error::kind()`. The logic itself should be
sound.

See also:

- https://man.freebsd.org/cgi/man.cgi?accept(2)
  Accept man page
- giampaolo/pyftpdlib#105
  giampaolo/pyftpdlib@0f82232
  Basically the same issue (and its fix) in another project
Copy link
Contributor

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we need to improve this, but unfortunately I don't think this change is a good idea in its current form.

tonic/src/transport/server/mod.rs Outdated Show resolved Hide resolved
tonic/src/transport/server/mod.rs Outdated Show resolved Hide resolved
This simplifies the previous patch a lot. The string search
is ugly (also not 100% if it's needed or if we could handle
errors like in the TLS enabled accept loop).
@grembo
Copy link
Contributor Author

grembo commented Aug 19, 2024

@djc I reduced the change to something basic. The string search isn't nice, but works given the constraints.

Also, the tracing example makes testing this nicer:

cargo run --bin tracing-server --no-default-features --features="tracing"

Copy link
Contributor

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is compatible at least! I'd like to be more precise, though.

tonic/src/transport/server/incoming.rs Outdated Show resolved Hide resolved
@grembo
Copy link
Contributor Author

grembo commented Aug 19, 2024

Also mentioning hyperium/hyper#3597, as it seems related.

@grembo
Copy link
Contributor Author

grembo commented Aug 19, 2024

@djc Thank you for your help.

Personally I would be be fine with it to land at the current iteration (it seems tightly scoped and low-risk now).

That said, I'm wondering about a few things:

  • The "feature tls" accept loop seems not to exit at all on any error (so the two work differently), which means it could loop fast under low resource conditions
  • Other daemons I looked at handle errcode == ENFILE || errcode == EMFILE as transient problems, so they would delay and try again (but not exit)
  • It might also make sense to handle ECONNABORTED directly in https://github.com/tokio-rs/tokio/blob/5b9a290acd25d88416e70c73c49a54106fcf02b5/tokio/src/net/tcp/listener.rs#L171-L192 (and other poll_accept functions in tokio/net (I don't remember any code that actually made any use of ECONNABORTED beyond logging it, so it might be worth considering - but that would be outside of the scope of this change).

@@ -27,7 +27,19 @@ where
let mut incoming = pin!(incoming);

while let Some(item) = incoming.next().await {
yield item.map(ServerIo::new_io)?
if let Err(e) = item {
if let Some(e) = Into::<crate::Error>::into(e).downcast_ref::<std::io::Error>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is broadly acceptable but I have bunch of stylistic suggestions.

First, use a match for the top-level error extraction:

let err = match item {
    Ok(io) => yield Ok(ServerIo::new_io(io)), // Or however this is spelled
    Err(err) => err,
};

..

This reduces rightward drift.

Second, I would expect we can just call err.into() and don't need the whole Into::<crate::Error>::into(e) dance?

Third, for the debug messages, I think we should not use an explicit message = name but use the general form which is having the message after the "structured" argument (so tracing::debug!(error = %e, "accept loop error");). Note that error messages should be all-lowercase and not have any . at the end.

Fourth, please import io at the top of the file so we can drop the std:: prefixes here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is broadly acceptable but I have bunch of stylistic suggestions.

First, use a match for the top-level error extraction:

let err = match item {
    Ok(io) => yield Ok(ServerIo::new_io(io)), // Or however this is spelled
    Err(err) => err,
};

..

This reduces rightward drift.

Second, I would expect we can just call err.into() and don't need the whole Into::<crate::Error>::into(e) dance?

Third, for the debug messages, I think we should not use an explicit message = name but use the general form which is having the message after the "structured" argument (so tracing::debug!(error = %e, "accept loop error");). Note that error messages should be all-lowercase and not have any . at the end.

Fourth, please import io at the top of the file so we can drop the std:: prefixes here.

Personally I don't have much of an opinion on these style issues, so I just changed things as requested.

There is one thing I really didn't like though, which was error extraction, which would have looked like this:

let err = match item {
    Ok(io) => { yield Ok(ServerIo::new_io(io)); continue },
    Err(err) => err,
};

do more stuff with err, then break;

So instead, I went with clearly defined Ok and Err branches (horizontal drift is IMHO acceptable). I'm quite happy with the readability of the outcome (especially having only one continue statement, pointing out the exception, and no break statements). This should also be relatively easy to parse to people who are not familiar with all language features + the diff to the master branch is very easy to understand.

@djc
Copy link
Contributor

djc commented Aug 19, 2024

@djc Thank you for your help.

Personally I would be be fine with it to land at the current iteration (it seems tightly scoped and low-risk now).

That said, I'm wondering about a few things:

  • The "feature tls" accept loop seems not to exit at all on any error (so the two work differently), which means it could loop fast under low resource conditions

Agreed that it would be good to keep tls and non-tls loops in sync on this front.

I think it makes sense to do more fine-tuned handling like this but would suggest keeping it out of scope for this PR.

}
Err(e)?
}
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok(_) => item.map(ServerIo::new_io)?

could also be written as

Ok(io) => Ok::<ServerIo<IO>, crate::Error>(ServerIo::new_io(io))?,

but in terms of readability and diff, I opted for the former (any better ideas?).

@djc djc added this pull request to the merge queue Aug 20, 2024
@djc
Copy link
Contributor

djc commented Aug 20, 2024

This is looking much better, let's get it merged and iterate from here. Thanks!

Merged via the queue into hyperium:master with commit b4c3518 Aug 20, 2024
16 checks passed
@grembo
Copy link
Contributor Author

grembo commented Aug 20, 2024

This is looking much better, let's get it merged and iterate from here. Thanks!

Thank you for helping to make this mergeable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants