Fix some EOF-handling issues in TLS #44563

Closed
wants to merge 2 commits into from


davidben
Contributor

@davidben davidben commented Sep 7, 2022

This fixes a pair of issues with EOF handling in TLSWrap. This originally came up because @codebytere noticed an incompatibility between Node and a recent BoringSSL change, but the root cause is a bug in Node that also impacts how it uses OpenSSL. I've split it into two commits in case you all want to backport only one of them, as the second changes how some errors are surfaced. Node was previously incorrectly representing a class of TLS errors as ECONNRESET, dropping the true error on the ground.

The changes are as follows:

tls: fix re-entrancy issue with TLS close_notify

Like errno, OpenSSL's API requires that SSL_get_error and the error queue be checked immediately after the failing operation. Otherwise, the error queue or SSL object may have changed state and no longer report information about the operation the caller wanted.

TLSWrap almost heeds this rule, except in TLSWrap::ClearOut. If SSL_read picks up a closing alert (detected by checking SSL_get_shutdown), Node calls out to JS with EmitRead(UV_EOF) and only afterwards proceeds to dispatch on the error. But, by this point, Node has already re-entered JS, which may change the error.

In particular, I've observed that, on close_notify, JS seems to sometimes call back into TLSWrap::DoShutdown, calling SSL_shutdown. (I think this comes from onStreamRead in stream_base_commons.js?)

Instead, SSL_get_error and the error queue should be sampled earlier, before calling out to JS. Back in #1661, Node needed to account for GetSSLError being called after ssl_ was destroyed. The re-entrancy described above was the real cause of that. With it fixed, there's no need for that workaround. (Any case where ssl_ may be destroyed before SSL_get_error is a case where ssl_ or the error queue could change state, so it's a bug either way.)

I've done this by just moving the calls up a bit, to preserve the existing C++ => JS behavior, but that revealed a bigger issue about EmitRead(UV_EOF), which is the second commit.
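
To make the ordering problem concrete, here is a minimal sketch that uses neither OpenSSL nor Node. `FakeSession` and both function names are hypothetical stand-ins invented for illustration: `Read` plays the role of a failing SSL_read, `GetError` the role of SSL_get_error, and the callback the role of the call out to JS. It only models the "last operation wins" behavior described above, not TLSWrap's actual code.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for an SSL object: like errno, it only remembers
// the outcome of the most recent operation.
struct FakeSession {
  std::string last_error = "none";

  // Models a read that fails because the peer closed the connection.
  int Read() {
    last_error = "zero_return";
    return 0;
  }

  // Models any later operation on the same object; it overwrites the state.
  int Shutdown() {
    last_error = "none";
    return 1;
  }

  std::string GetError() const { return last_error; }
};

// Problematic order: call out first, look at the error afterwards.
std::string ReadThenCallbackThenSample(FakeSession& s,
                                       const std::function<void()>& on_eof) {
  int ret = s.Read();
  assert(ret <= 0);  // the read is expected to fail in this model
  on_eof();          // may re-enter and run another operation on `s`
  return s.GetError();
}

// Safer order: sample the error immediately, then call out.
std::string ReadThenSampleThenCallback(FakeSession& s,
                                       const std::function<void()>& on_eof) {
  int ret = s.Read();
  assert(ret <= 0);
  std::string err = s.GetError();  // captured before anything can change it
  on_eof();
  return err;
}
```

With a callback that calls `Shutdown()` on the same session, the first function reports the shutdown's state rather than the read's, while the second still reports the read's error.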

tls: don't treat fatal TLS alerts as EOF

SSL_RECEIVED_SHUTDOWN means not just close_notify but also fatal alert. From what I can tell, treating fatal alert as EOF was just a mistake? OnStreamRead's comment suggests eof_ was intended to be for close_notify.

This fixes a bug in TLSSocket error reporting that seems to have made it into existing tests. If we receive a fatal alert, EmitRead(UV_EOF) would, via onConnectEnd in _tls_wrap.js, synthesize an ECONNRESET before the alert itself is surfaced. As a result, TLS alerts received during the handshake are misreported by Node.

See the tests that had to be updated as part of this.
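
As a rough illustration of the difference, the sketch below models the two dispatch shapes with plain data. Everything in it is hypothetical: `ReadOutcome`, `ReadError`, and both function names are invented for this example, `received_shutdown` stands in for the SSL_RECEIVED_SHUTDOWN bit, and the event strings stand in for what would be emitted to JS. It is a simplified model of the behavior described above, not the code in TLSWrap.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of what a failed read reported.
enum class ReadError { kCleanClose, kFatalAlert };

struct ReadOutcome {
  bool received_shutdown;  // stands in for the SSL_RECEIVED_SHUTDOWN bit
  ReadError error;         // stands in for the sampled SSL_get_error result
};

// Old shape: the shutdown flag alone triggers EOF, then the error is handled.
std::vector<std::string> DispatchOnShutdownFlag(const ReadOutcome& o) {
  std::vector<std::string> events;
  if (o.received_shutdown) events.push_back("eof");
  if (o.error == ReadError::kFatalAlert) events.push_back("alert");
  return events;
}

// New shape: only a clean close (close_notify) is treated as EOF.
std::vector<std::string> DispatchOnError(const ReadOutcome& o) {
  // In this model, a clean close always has the shutdown flag set.
  assert(o.error != ReadError::kCleanClose || o.received_shutdown);
  std::vector<std::string> events;
  if (o.error == ReadError::kCleanClose) {
    events.push_back("eof");
  } else {
    events.push_back("alert");
  }
  return events;
}
```

For a fatal alert, the old shape produces "eof" before "alert", which corresponds to the point where onConnectEnd would synthesize ECONNRESET ahead of the real error. The new shape produces only "alert".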


I should note, while I'm familiar with TLS and OpenSSL, I'm not at all familiar with Node's streams machinery or sockets APIs. Do you all actually expect to emit EOF on error? This change removes an EmitRead(UV_EOF) on fatal TLS alert. Is that consistent with JS state? I'm assuming so, because there are plenty of other fatal errors which TLSWrap did not emit EOF on (any case where we reject something from the peer, rather than seeing an alert message, will not set SSL_RECEIVED_SHUTDOWN). So I assume this change just makes peer fatal alerts go through the same codepaths as other peer errors. But hopefully you all will be able to review this better.

I suspect there's also a way to construct a test which demonstrates the first fix standalone. We ran into it in BoringSSL because, to fix some other bugs in OpenSSL's error-handling, we had to treat SSL_ERROR_ZERO_RETURN more like the other SSL_ERROR_* constants. OpenSSL is still quite lax about it, which is masking this bug in the particular scenario where we hit it. But there's probably a more interesting case where reentrancy causes Node to lose a TLS alert. Except you all seem to already be losing TLS alerts until the second commit anyway, so I'm not sure. :-)

Like errno, OpenSSL's API requires SSL_get_error and error queue be
checked immediately after the failing operation, otherwise the error
queue or SSL object may have changed state and no longer report
information about the operation the caller wanted.

TLSWrap almost heeds this rule, except in TLSWrap::ClearOut. If SSL_read
picks up a closing alert (detected by checking SSL_get_shutdown), Node
calls out to JS with EmitRead(UV_EOF) and only afterwards proceeds to
dispatch on the error. But, by this point, Node has already re-entered
JS, which may change the error.

In particular, I've observed that, on close_notify, JS seems to
sometimes call back into TLSWrap::DoShutdown, calling SSL_shutdown. (I
think this comes from onStreamRead in stream_base_commons.js?)

Instead, SSL_get_error and the error queue should be sampled earlier.
Back in nodejs#1661, Node needed to account for GetSSLError being called after
ssl_ was destroyed. This was the real cause. With this fixed, there's no
need to account for this. (Any case where ssl_ may be destroyed before
SSL_get_error is a case where ssl_ or the error queue could change
state, so it's a bug either way.)

This is the first of two fixes in error-handling here. The
EmitRead(UV_EOF) seems to additionally swallow fatal alerts from the
peer. Some of the ECONNRESET expectations in the tests aren't actually
correct. The next commit will fix this as well.
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/crypto

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Sep 7, 2022
@lpinca lpinca added the request-ci Add this label to start a Jenkins CI on a PR. label Sep 8, 2022
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Sep 8, 2022

SSL_RECEIVED_SHUTDOWN means not just close_notify but also fatal alert.
From what I can tell, treating fatal alert as EOF was just a mistake?
OnStreamRead's comment suggests eof_ was intended to be for close_notify.

This fixes a bug in TLSSocket error reporting that seems to have made it
into existing tests. If we receive a fatal alert, EmitRead(UV_EOF)
would, via onConnectEnd in _tls_wrap.js, synthesize an ECONNRESET before
the alert itself is surfaced. As a result, TLS alerts received during
the handshake are misreported by Node.

See the tests that had to be updated as part of this.
@davidben
Contributor Author

davidben commented Sep 8, 2022

(Updated the second commit to hopefully fix the lint errors.)

@lpinca lpinca added the request-ci Add this label to start a Jenkins CI on a PR. label Sep 17, 2022
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Sep 17, 2022
Member

@lpinca lpinca left a comment

RSLGTM

@lpinca
Member

lpinca commented Nov 3, 2022

This should be reviewed by someone who is more familiar with this code and TLS. @bnoordhuis @jasnell?

@davidben
Contributor Author

davidben commented Dec 5, 2022

Friendly ping. Is this waiting on anything from my end?

@lpinca
Member

lpinca commented Dec 6, 2022

> Is this waiting on anything from my end?

No. I'll try to ping @nodejs/crypto again.

@lpinca lpinca added commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. commit-queue Add this label to land a pull request using GitHub Actions. labels Dec 12, 2022
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Dec 12, 2022
@nodejs-github-bot
Collaborator

Landed in ef8aa88...8b89d4d

nodejs-github-bot pushed a commit that referenced this pull request Dec 12, 2022
PR-URL: #44563
Reviewed-By: Luigi Pinca <[email protected]>
nodejs-github-bot pushed a commit that referenced this pull request Dec 12, 2022
PR-URL: #44563
Reviewed-By: Luigi Pinca <[email protected]>
targos pushed a commit that referenced this pull request Jan 1, 2023
PR-URL: #44563
Reviewed-By: Luigi Pinca <[email protected]>
targos pushed a commit that referenced this pull request Jan 1, 2023
PR-URL: #44563
Reviewed-By: Luigi Pinca <[email protected]>
@RafaelGSS RafaelGSS mentioned this pull request Jan 2, 2023
juanarbol pushed a commit that referenced this pull request Jan 26, 2023
PR-URL: #44563
Reviewed-By: Luigi Pinca <[email protected]>
juanarbol pushed a commit that referenced this pull request Jan 26, 2023
PR-URL: #44563
Reviewed-By: Luigi Pinca <[email protected]>
@juanarbol juanarbol mentioned this pull request Jan 28, 2023
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.
3 participants