Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close channel on handshake error with old version #56989

Conversation

DaveCTurner
Copy link
Contributor

@DaveCTurner DaveCTurner commented May 20, 2020

Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use Version.CURRENT.minimumCompatibilityVersion()
to let us handshake with older nodes. This results in the strange situation of
a node of major version N responding to a node of major version N-1 using a
wire format of version N-2.

We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender misinterpreting
the response which may have unpredictable consequences.

Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.

Closes #54337

Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.

We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the sender misinterpreting
the response which may have unpredictable consequences.

Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.

Closes elastic#54337
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Network)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 20, 2020
@DaveCTurner
Copy link
Contributor Author

The change to production code here is small and seems reasonable, but I really struggled to come up with sensible tests. Perhaps there's some appropriate seam that I'm missing? Anyway, this is what I ended up with, suggestions for alternative testing approaches are very welcome.

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just build this as a unit test for InboundHandler? The whole approach of testing this using the full transport services seems quite convoluted and in the end we're not mocking out the internals starting from where the message passes to InboundHandler anyway so it's not like we add coverage by doing that? (just a suggestion though, I realise it's a bit of extra effort to do this just for the sake of a having a nicer test)
Otherwise looks good I think from a quick read, just a suggestion on shortening things a little.

@DaveCTurner
Copy link
Contributor Author

Ugh you're quite right this is now really just focussed on InboundHandler so let's just test that.

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (assuming CI is happy :)) Thanks for the extra iteration on the tests!

@DaveCTurner
Copy link
Contributor Author

CI failure is unrelated, I opened #57010.

@elasticmachine please run elasticsearch-ci/1

@DaveCTurner DaveCTurner merged commit 65f3cb1 into elastic:master May 21, 2020
@DaveCTurner DaveCTurner deleted the 2020-05-20-close-channel-on-handshake-error branch May 21, 2020 06:59
DaveCTurner added a commit that referenced this pull request May 21, 2020
Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.

We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender
misinterpreting the response which may have unpredictable consequences.

Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.

Closes #54337
DaveCTurner added a commit that referenced this pull request May 21, 2020
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request May 21, 2020
Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.

We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender
misinterpreting the response which may have unpredictable consequences.

Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.

Closes elastic#54337
DaveCTurner added a commit that referenced this pull request May 21, 2020
Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.

We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender
misinterpreting the response which may have unpredictable consequences.

Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.

Closes #54337
DaveCTurner added a commit that referenced this pull request May 21, 2020
Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.

We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender
misinterpreting the response which may have unpredictable consequences.

Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.

Closes #54337
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Network Http and internode communication implementations Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.8.1 v7.9.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed transport handshake between 6.8 and 7.6 nodes may throw a fatal AssertionError
4 participants