Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRIVERS-2677: clarify when ServerHeartbeatStartedEvent should be emitted #1483

Conversation

W-A-James
Copy link
Contributor

@W-A-James W-A-James commented Dec 14, 2023

Changes

  • Specifies that ServerHeartbeatStartedEvent MUST be emitted just after monitor creates monitoring connection and before the first hello is emitted.
  • Add prose test to verify ordering of event emission and handshake is correct

POC on Node driver

Please complete the following before merging:

  • Update changelog.
  • Make sure there are generated JSON files from the YAML test files.
  • Test changes in at least one language driver.
  • Test these changes against all server versions and topologies (including standalone, replica set, sharded clusters, and serverless).

@W-A-James W-A-James changed the title DRIVERS-2677: clarify heartbeat started event ordering and field requ… DRIVERS-2677: clarify when ServerHeartbeatStartedEvent should be emitted and when serverConnectionId should be undefined Jan 5, 2024
Comment on lines 265 to 296
Heartbeat Tests
~~~~~~~~~~~~~~~

1. Test that ``ServerHeartbeatStartedEvent`` is emitted after the monitoring socket was created and before the ``hello`` is sent to the server

#. Create a mock TCP server (example shown below) that pushes a ``client connected`` event to a shared array when a client connects and a ``client hello received`` event when the server receives the client hello and then closes the connection::

let events = [];
server = createServer(clientSocket => {
events.push('client connected');

clientSocket.on('data', () => {
events.push('client hello received');
clientSocket.end();
});
});
server.listen(9999);

#. Create a client with ``serverSelectionTimeoutMS: 500, maxPoolSize: 1, minPoolSize: 0`` and listen to ``ServerHeartbeatStartedEvent`` and ``ServerHeartbeatFailedEvent``, pushing the event name to the same shared array as the mock TCP server.

#. Attempt to connect client to previously created TCP server, catching the error when the client fails to connect

#. Assert that the event array has a length of 4

#. Assert that the first event is the ``client connected`` event

#. Assert that the second event is the ``serverHeartbeatStartedEvent``

#. Assert that the third event is the ``client hello received`` event

#. Assert that the fourth event is the ``serverHeartbeatFailedEvent``

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think this test will be accurate with respect to the relative ordering of events on the driver side even though we'll be using sockets (albeit locally)?

This seems to be the best approach for now since we don't have more granular logging for when the the monitoring connection is created and when the hello itself is sent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems so to me.

even though we'll be using sockets

What is the implied concern here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of whether or not the delay between creating the connection and emitting the ServerHeartbeatStartedEvent on the client side would be shorter than the time it takes for the mock server to emit the client connected event as in the following sequence diagram:

client server
create connection .
emit heartbeat started .
. receive connection; emit client connected
send hello .
. receive hello; emit client hello received; close connection
emit heartbeat failed .

which would lead the events buffer to look like this

[
	'serverHeartbeatStartedEvent',
	'client connected',
    'client hello received',
    'serverHeartbeatFiledEvent'
]

and report a false failure since the client actually performed the steps in the correct order

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. There is a bit of an assumption going on here that there wouldn't be any multithreading or multiprocessing forcing the server's "client connected" to strictly proceed the driver code getting an opportunity to run whatever "connection established" handling it has.

Perhaps the test could divest its core assertions a bit from the suggested implementation. A compiled language may be able to accomplish the same goal if it were to supplant the socket API with a function under its control to get notified of socket construction.

I hope there is enough prior art to make this test accessible:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: we want the order to be

['serverHeartbeatStarted', 'client connected', 'client hello received', 'serverHeartbeatFailed']

@W-A-James W-A-James marked this pull request as ready for review January 9, 2024 15:38
@W-A-James W-A-James requested a review from a team as a code owner January 9, 2024 15:38
@W-A-James W-A-James requested review from esha-bhargava and ShaneHarvey and removed request for a team January 9, 2024 15:38
source/server-discovery-and-monitoring/tests/README.rst Outdated Show resolved Hide resolved
source/server-discovery-and-monitoring/tests/README.rst Outdated Show resolved Hide resolved
Comment on lines 265 to 296
Heartbeat Tests
~~~~~~~~~~~~~~~

1. Test that ``ServerHeartbeatStartedEvent`` is emitted after the monitoring socket was created and before the ``hello`` is sent to the server

#. Create a mock TCP server (example shown below) that pushes a ``client connected`` event to a shared array when a client connects and a ``client hello received`` event when the server receives the client hello and then closes the connection::

let events = [];
server = createServer(clientSocket => {
events.push('client connected');

clientSocket.on('data', () => {
events.push('client hello received');
clientSocket.end();
});
});
server.listen(9999);

#. Create a client with ``serverSelectionTimeoutMS: 500, maxPoolSize: 1, minPoolSize: 0`` and listen to ``ServerHeartbeatStartedEvent`` and ``ServerHeartbeatFailedEvent``, pushing the event name to the same shared array as the mock TCP server.

#. Attempt to connect client to previously created TCP server, catching the error when the client fails to connect

#. Assert that the event array has a length of 4

#. Assert that the first event is the ``client connected`` event

#. Assert that the second event is the ``serverHeartbeatStartedEvent``

#. Assert that the third event is the ``client hello received`` event

#. Assert that the fourth event is the ``serverHeartbeatFailedEvent``

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems so to me.

even though we'll be using sockets

What is the implied concern here?


#. Assert that the third event is the ``client hello received`` event

#. Assert that the fourth event is the ``serverHeartbeatFailedEvent``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the significance of the "failed" event? It seems like we could just

  • test against a standalone
  • connect
  • confirm the ordering of events (connected, heartbeat started, hello received, heartbeat succeeded)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failed event is expected as the mock server doesn't need to be able to properly reply to the hello for this test to prove the ordering so we expect that it is the case that it won't. As a result, we just want to be sure that the driver fails its heartbeat since the monitoring connection will be closed. I'm open to removing that as a requirement for the test, but it seems appropriate to include.

When you say test against a standalone, how would we track when the connection was created and the hello was received when testing against an actual Mongo server?

@W-A-James W-A-James requested a review from nbbeeken January 9, 2024 19:55
@@ -138,7 +138,7 @@ Events that MUST be published (with their conditions) are as follows.
* - ``TopologyClosedEvent``
- When a topology is shut down - this MUST be the last SDAM event fired.
* - ``ServerHeartbeatStartedEvent``
- Published when the server monitor sends its ``hello`` or legacy hello call to the server.
- Published when the server monitor sends its ``hello`` or legacy hello call to the server. When the monitor is creating a new connection, this event MUST be published just after the socket is created and just before the ``hello`` or legacy hello.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this right? I don't think this is what we do in Node. we emit heartbeat started events before establishing a socket (before a call to connect)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shane and I are discussing this further, but our initial decision was to do what we currently do in the Node driver, but this does make it so that the the duration on the first ServerHeartbeatSucceededEvent would include the time taken to complete the TCP handshake which adds at least another network round trip, artificially inflating the RTT since that gets calculated from the aforementioned duration field.

How I currently have this written would avoid this, but would require changes in the Node driver among others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: Will change this to expect the ServerHeartbeatStartedEvent before connection creation. There was confusion on my end as I thought we had to measure the start time as when the ServerHeartbeatStartedEvent was emitted. On Node's end this will require still require us to change how we implement our monitoring since we currently measure the heartbeat duration starting as mentioned above.

Comment on lines 268 to 289
1. Test that ``ServerHeartbeatStartedEvent`` is emitted after the monitoring socket was created and before the ``hello`` is sent to the server

#. Create a mock TCP server (example shown below) that pushes a ``client connected`` event to a shared array when a client connects and a ``client hello received`` event when the server receives the client hello and then closes the connection::

let events = [];
server = createServer(clientSocket => {
events.push('client connected');

clientSocket.on('data', () => {
events.push('client hello received');
clientSocket.end();
});
});
server.listen(9999);

#. Create a client with ``serverSelectionTimeoutMS: 500`` and listen to ``ServerHeartbeatStartedEvent`` and ``ServerHeartbeatFailedEvent``, pushing the event name to the same shared array as the mock TCP server.

#. Attempt to connect client to previously created TCP server, catching the error when the client fails to connect

#. Assert that the event array has the following contents in the same order::

['serverHeartbeatStartedEvent', 'client connected', 'client hello received', 'serverHeartbeatFailedEvent']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShaneHarvey

Since we're concerned with the the fact that the ServerHeartbeatStartedEvent gets emitted before the connection is created and so we should fail if the connection fails to be established, would it be sufficient for us to just try to connect to a port that has nothing listening on it and expect to get a ServerHeartbeatStartedEvent followed by a ServerHeartbeatFailedEvent, or do we want to keep this granularity?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think that test would work as well. Would it be easier to implement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would. It wouldn't require the mock server at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revisiting this test; unless a driver can spy on the function that creates the socket, the updated version of this test wouldn't prevent something like this from passing:

let socketOrError = await connect().catch(error => error);

if (isSocket(socketOrError)) {
  emitHBStarted()
  await runHeartBeat()
} else {
  emitHBStarted()
  throw socketOrError;
}
// Assume emitHBFailed will be done if any of the above throws

We should roll back to actually using the mock server as that will provide a stronger guarantee on the ordering we want to test here.

@@ -138,7 +138,7 @@ Events that MUST be published (with their conditions) are as follows.
* - ``TopologyClosedEvent``
- When a topology is shut down - this MUST be the last SDAM event fired.
* - ``ServerHeartbeatStartedEvent``
- Published when the server monitor sends its ``hello`` or legacy hello call to the server.
- Published when the server monitor sends its ``hello`` or legacy hello call to the server. When the monitor is creating a new connection, this event MUST be published just before the socket is created
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing punctuation: created.

If the ``serverConnectionId`` is not yet known at the time this log is emitted, the unstructured form SHOULD be as follows:

Heartbeat started for {{serverHost}}:{{serverPort}} on connection with driver-generated ID {{driverConnectionId}} in topology with ID
{{topologyId}}. Awaited: {{awaited}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this documenting behavior that's already tested? Or do we need a new test for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd note that this seems unrelated to DRIVERS-2677. Can we open a new drivers ticket for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new behaviour that needs to be tested, but can't be until a new DRIVERS ticket to add serverConnectionId to ServerHeartbeatStartedEvent as an optional field is created. I'm open to removing this until that ticket is filed and in progress.

Copy link
Member

@ShaneHarvey ShaneHarvey Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I vote we move it out to that ticket to avoid confusion and reduce the scope here.

Heartbeat Tests
~~~~~~~~~~~~~~~

1. Test that ``ServerHeartbeatStartedEvent`` is emitted after the monitoring socket was created and before the ``hello`` is sent to the server
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emitted after the monitoring socket was created
This is the opposite of what the current spec says.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

Comment on lines 268 to 289
1. Test that ``ServerHeartbeatStartedEvent`` is emitted after the monitoring socket was created and before the ``hello`` is sent to the server

#. Create a mock TCP server (example shown below) that pushes a ``client connected`` event to a shared array when a client connects and a ``client hello received`` event when the server receives the client hello and then closes the connection::

let events = [];
server = createServer(clientSocket => {
events.push('client connected');

clientSocket.on('data', () => {
events.push('client hello received');
clientSocket.end();
});
});
server.listen(9999);

#. Create a client with ``serverSelectionTimeoutMS: 500`` and listen to ``ServerHeartbeatStartedEvent`` and ``ServerHeartbeatFailedEvent``, pushing the event name to the same shared array as the mock TCP server.

#. Attempt to connect client to previously created TCP server, catching the error when the client fails to connect

#. Assert that the event array has the following contents in the same order::

['serverHeartbeatStartedEvent', 'client connected', 'client hello received', 'serverHeartbeatFailedEvent']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think that test would work as well. Would it be easier to implement?

@W-A-James W-A-James requested a review from ShaneHarvey January 11, 2024 21:20
Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove the serverConnectionId bits from the PR title and description?

@@ -462,7 +464,7 @@ The following key-value pairs are common to all or several log messages and MUST
* - serverConnectionId
- Heartbeat-related log messages
- Int
- The server's ID for the monitoring connection, if known. This value will be unknown and can be omitted in certain cases, e.g. the first
- The server's ID for the monitoring connection, if known. This value will be unknown and MUST be omitted in certain cases, e.g. the first
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be pushed out to the other ticket.


#. Attempt to connect client to a local port that is not listening for connections. e.g. ``mongodb://localhost:9999``

#. Assert that the event array has the following contents in the same order::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since SDAM can try to connect multiple times, I would say:
"Assert that the first two events in the array are::"

@W-A-James W-A-James changed the title DRIVERS-2677: clarify when ServerHeartbeatStartedEvent should be emitted and when serverConnectionId should be undefined DRIVERS-2677: clarify when ServerHeartbeatStartedEvent should be emitted Jan 12, 2024
@W-A-James W-A-James requested a review from ShaneHarvey January 12, 2024 20:35
@W-A-James W-A-James requested review from ShaneHarvey and removed request for ShaneHarvey January 16, 2024 19:24
Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@W-A-James W-A-James merged commit f1bbc24 into mongodb:master Jan 17, 2024
4 checks passed
@W-A-James W-A-James deleted the DRIVERS-2677/clarify-heartbeat-started-event-ordering branch January 17, 2024 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants