Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRIVERS-2578 Drivers use polling SDAM on AWS Lambda #1452

Merged
merged 16 commits into from
Oct 12, 2023

Conversation

ShaneHarvey
Copy link
Member

@ShaneHarvey ShaneHarvey commented Aug 21, 2023

DRIVERS-2578

Changes:

  • Disable streaming SDAM by default on AWS Lambda and similar FaaS platforms.
  • Introduce the serverMonitoringMode=stream/poll/auto URI option.
  • Add Unified Test Format version 1.17 to add support for server heartbeat events and asserting on the "awaited" field.
  • Clients MUST NOT use dedicated connections to measure RTT when using the polling protocol.

Checklist:

Disable streaming SDAM by default on AWS Lambda and similar FaaS platforms.
Introduce the sdamMode=stream/poll/auto URI option.
@ShaneHarvey ShaneHarvey requested a review from jyemin August 22, 2023 20:05
@@ -243,6 +243,8 @@ the function implementation the driver MUST:
- Drivers MUST record the durations and counts of the heartbeats, the durations of the
commands, as well as keep track of the number of open connections, and report this information in
the function response as JSON.
- Drivers MUST assert no ServerHeartbeat events contain the ``awaited=True`` flag to
Copy link
Contributor

@prestonvasquez prestonvasquez Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's worth extending the test plan for handshakes (https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#test-plan) to check that the the initial handshake uses OP_QUERY? As of this ticket, there is a possibility that polling could be used with a server versions >= 4.4 . And AFAIK this is novel and so potentially important to ensure that correct handshake logic is being implemented.

It's probably not very important, as the wire message between streaming / polling are not that different. But I do notice we don't check this anywhere and so it could be useful to do so as a part of this change.

Here is an example of how the Go Driver intends to check this: https://github.com/mongodb/mongo-go-driver/blob/fe237bb0079574fb76c859332218d7c49921210d/mongo/integration/handshake_test.go#L177-L182

CC: @baileympearson

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this idea is worthwhile but as a new drivers ticket since this PR doesn't change the OP_QUERY vs OP_MSG behavior.

Comment on lines +745 to +747
if streamingEnabled and serverSupportsStreaming and not rttMonitor.started:
# Start the RttMonitor.
rttMonitor.run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we want to disable the RTT monitor for servers that don't support streaming heartbeat (i.e. server versions before 4.4)?

From reading the rationale "Why disable the streaming protocol on FaaS platforms like AWS Lambda?", it sounds like the goal of not starting the RTT monitor is to reduce open connections for FaaS environments. However, the suggested change would disable the RTT monitor for all databases earlier than 4.4, not just FaaS environments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a new requirement - drivers shouldn't be creating RTT monitors when the polling protocol is used. The rtt monitors are only necessary when using the awaitable hello protocol, because then a round trip to the server can take up to maxAwaitTimeMS. https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-monitoring.rst#measuring-rtt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec wording and pseudo-code don't match your assertion.

The Measuring RTT section specifically says drivers can still use a separate RTT monitor for any server version:

For consistency, clients MAY use dedicated connections to measure RTT for all servers, even those that do not support awaitable hello or legacy hello.

The current pseudo-code from the Monitor thread section also describes always starting the RTT monitor, independent of server version:

class Monitor(Thread):
...

  def run():
      # Start the RttMonitor.
      rttMonitor.run()
...

The new spec wording in this PR only suggests disabling the RTT monitor when streaming is disabled, but nothing about whether or not it's supported:

When the streaming protocol is disabled the client MUST use the polling protocol and MUST NOT start an extra thread or connection for Measuring RTT.

Disabling the RTT monitor when streaming isn't supported is allowed by the spec. However, putting it in the pseudo-code makes it seem like a new requirement that conflicts with the spec wording.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShaneHarvey Can you opine on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this PR changes the rules to disallow using an extra thread/socket for RTT on <4.4 servers. The rationale is that the extra connection limits scaling and puts unnecessary load on the server, especially in Lambda where MongoClients are not shared. Although the motivation is mostly Lambda this is still a beneficial change for all environments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @matthewdale 's point, should this section from Measuring RTT be removed or updated to reflect these rule changes? :

For consistency, clients MAY use dedicated connections to measure RTT for all servers, even those that do not support awaitable hello or legacy hello.

Copy link
Member Author

@ShaneHarvey ShaneHarvey Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I updated this section to say clients MUST NOT use dedicated connections to measure RTT in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. Does that imply that the polling monitor should measure RTT (e.g. to support CSOT)? For the Go driver, that would significantly increases the scope of this change because we'd have to build RTT monitoring capability into the polling monitor.

Copy link
Member Author

@ShaneHarvey ShaneHarvey Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this question because the polling protocol always included RTT monitoring. But yes, the polling protocol should measure RTT via it's own heartbeats, not with a separate dedication connection.

Copy link
Contributor

@matthewdale matthewdale Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking closer, the actual change in the Go driver would probably be minimal. It does add complexity because there is no longer a single source of RTT data for CSOT, but the change only impacts a few lines of code. Consider this comment thread resolved.

@ShaneHarvey ShaneHarvey requested a review from a team as a code owner October 5, 2023 18:24
@ShaneHarvey ShaneHarvey requested a review from jyemin October 5, 2023 18:34
@ShaneHarvey
Copy link
Member Author

I've updated the Unified Test Format to version 1.17 by adding support for listening to SDAM ServerHeartbeat events and asserting on the "awaited" field. This improves the SDAM integration test. Ready for another look.

@ShaneHarvey ShaneHarvey requested review from matthewdale, prestonvasquez and baileympearson and removed request for esha-bhargava October 6, 2023 19:14
Copy link
Contributor

@matthewdale matthewdale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

@ShaneHarvey ShaneHarvey merged commit 14a6c81 into mongodb:master Oct 12, 2023
3 checks passed
@ShaneHarvey ShaneHarvey deleted the DRIVERS-2578 branch October 12, 2023 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants