
Fix for initial routers DNS resolution #849

Merged
2 commits merged into neo4j:4.3 from feature/dns-resolution
Mar 17, 2021


injectives commented Mar 10, 2021

Fix for initial routers DNS resolution

This update fixes the following issue: #833

The desired behaviour for getting a routing table from the initial router (either on bootstrap or when all known routers have failed) is:

  • resolve the domain name to all IPs
  • try to get a routing table from each of them in turn until the first one succeeds, by:
      - obtaining a connection to that IP
      - requesting a routing table and receiving a successful response
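The lookup steps above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the driver's implementation: `HostResolver`, `RoutingTableFetcher` and `InitialRouterLookup` are hypothetical names, and a routing table is modelled as a plain `String`. The point it illustrates is that every attempt targets one concrete IP, and any failure (connecting or fetching) moves on to the next IP.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical resolver abstraction (not the driver's actual API): returns every
// IP address a host name resolves to.
interface HostResolver {
    InetAddress[] resolve(String host) throws UnknownHostException;
}

// Hypothetical: opens a connection to one concrete IP and requests a routing table.
// Throws if either the connection or the routing table request fails.
interface RoutingTableFetcher {
    String fetch(InetAddress ip, int port) throws Exception;
}

final class InitialRouterLookup {
    private InitialRouterLookup() {}

    // Resolves the initial router's host name to all IPs and tries each one in turn,
    // returning the first routing table obtained, or empty if every IP fails.
    static Optional<String> lookup(String host, int port,
                                   HostResolver resolver, RoutingTableFetcher fetcher) {
        InetAddress[] ips;
        try {
            ips = resolver.resolve(host);
        } catch (UnknownHostException e) {
            return Optional.empty(); // the name could not be resolved at all
        }
        for (InetAddress ip : ips) {
            try {
                return Optional.of(fetcher.fetch(ip, port)); // first success wins
            } catch (Exception e) {
                // connection or routing table failure: fall through to the next IP
            }
        }
        return Optional.empty();
    }
}
```

In a test, the resolver can return fixed addresses built with `InetAddress.getByAddress`, so no real DNS lookup is needed to exercise the "first IP fails, second succeeds" case.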

Prior to this change, connection pools were created per host and port pair. When the host's domain name resolves to multiple IP addresses, such a pool hands out connections to all of those IPs as a group. While this works for readers and writers, it breaks routing table fetching, because there is no guarantee which IP address a given connection is set up for.
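By contrast, a pool keyed by the resolved IP and port makes retrieval deterministic: asking for the pool of a given IP always yields connections to that IP. A minimal sketch, assuming hypothetical `IpPool` and `RouterPools` classes (the driver's real pool types are not reproduced here):

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a connection pool bound to exactly one socket address.
final class IpPool {
    final InetSocketAddress target;

    IpPool(InetSocketAddress target) {
        this.target = target;
    }
}

// Hypothetical registry keyed by resolved IP and port rather than by host name and port.
final class RouterPools {
    private final Map<InetSocketAddress, IpPool> pools = new ConcurrentHashMap<>();

    IpPool poolFor(InetAddress ip, int port) {
        // An InetSocketAddress built from an InetAddress is already resolved; its
        // equality is based on the IP and port, so each IP gets its own pool.
        return pools.computeIfAbsent(new InetSocketAddress(ip, port), IpPool::new);
    }

    int size() {
        return pools.size();
    }
}
```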

This update delivers the following changes:

  • connection pools for routers are keyed by IP address, which makes connection retrieval deterministic
  • the set of resolved IP addresses is kept up to date (in case known router IPs change) so that unused connection pools are flushed
  • the domain name resolution logic has been made configurable (it is currently private and is used to facilitate testing)
  • the testkit backend has been updated to support the domain name resolution configuration (a new testkit test covers the issue described above)
  • the testkit backend has been updated to support the connection timeout driver configuration
  • several tests have been updated to adapt to the new changes
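The first three changes listed above (IP-keyed router pools, keeping the resolved set current, and a pluggable resolver) can be combined in one small sketch. It assumes hypothetical `NameResolver`, `SketchPool` and `RouterPoolRegistry` types; the commit log later mentions a `DomainNameResolver`, but this sketch does not reproduce its actual signature. Each refresh re-resolves the router host, opens pools for new IPs and closes pools for IPs that are no longer returned.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical pluggable resolver; tests can inject fixed answers instead of real DNS.
interface NameResolver {
    InetAddress[] resolve(String host) throws UnknownHostException;

    // Default behaviour: ordinary JDK name resolution.
    static NameResolver system() {
        return InetAddress::getAllByName;
    }
}

// Hypothetical pool placeholder that only records whether it has been closed.
final class SketchPool {
    boolean closed;

    void close() {
        closed = true;
    }
}

final class RouterPoolRegistry {
    private final NameResolver resolver;
    private final Map<InetSocketAddress, SketchPool> pools = new HashMap<>();

    RouterPoolRegistry(NameResolver resolver) {
        this.resolver = resolver;
    }

    // Re-resolves the router host, opens pools for newly seen IPs and closes pools
    // for IPs the host no longer resolves to. Returns the addresses that were flushed.
    Set<InetSocketAddress> refresh(String host, int port) throws UnknownHostException {
        Set<InetSocketAddress> current = new HashSet<>();
        for (InetAddress ip : resolver.resolve(host)) {
            current.add(new InetSocketAddress(ip, port));
        }
        Set<InetSocketAddress> flushed = new HashSet<>();
        pools.entrySet().removeIf(entry -> {
            boolean stale = !current.contains(entry.getKey());
            if (stale) {
                entry.getValue().close();
                flushed.add(entry.getKey());
            }
            return stale;
        });
        for (InetSocketAddress address : current) {
            pools.computeIfAbsent(address, a -> new SketchPool());
        }
        return flushed;
    }

    Set<InetSocketAddress> activeAddresses() {
        return new HashSet<>(pools.keySet());
    }
}
```

Injecting the resolver through the constructor is what lets a test simulate a router's IP set changing between two refreshes without touching real DNS.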


gjmwoods left a comment


💪

injectives force-pushed the feature/dns-resolution branch from 371e202 to 066e456 March 10, 2021 16:17
injectives self-assigned this Mar 10, 2021
injectives force-pushed the feature/dns-resolution branch 5 times, most recently from 4bf4891 to ae2a1b3 March 15, 2021 18:04
injectives force-pushed the feature/dns-resolution branch from ae2a1b3 to 282cf78 March 15, 2021 18:29
eastlondoner self-assigned this Mar 15, 2021
injectives closed this Mar 17, 2021
injectives reopened this Mar 17, 2021
injectives merged commit b98b697 into neo4j:4.3 Mar 17, 2021
injectives added a commit to injectives/neo4j-java-driver that referenced this pull request Mar 17, 2021
* Fix for initial routers DNS resolution


* Updating test name
injectives added a commit to injectives/neo4j-java-driver that referenced this pull request Mar 22, 2021
* Fix for initial routers DNS resolution


* Updating test name
injectives mentioned this pull request Mar 22, 2021
injectives added a commit that referenced this pull request Mar 23, 2021
* Imported testkit directory

* Migrating tests to testkit (#832)

* Migrating tests to testkit

Short summary of this update:
- removed migrated tests
- verifyConnectivity support
- resolver support
- consume support

Test mapping (dest: stub/routing.py):
- shouldHandleAcquireReadSession -> test_should_read_successfully_from_reader_using_session_run
- shouldHandleAcquireReadTransaction -> test_should_read_successfully_from_reader_using_tx_function
- shouldHandleAcquireReadSessionAndTransaction -> test_should_read_successfully_from_reader_using_tx_run
- shouldRoundRobinReadServers -> test_should_round_robin_readers_when_reading_using_session_run
- shouldRoundRobinReadServersWhenUsingTransaction -> test_should_round_robin_readers_when_reading_using_tx_run
- shouldThrowSessionExpiredIfReadServerDisappears -> test_should_fail_when_reading_from_unexpectedly_interrupting_reader_using_session_run
- shouldThrowSessionExpiredIfReadServerDisappearsWhenUsingTransaction -> test_should_fail_when_reading_from_unexpectedly_interrupting_reader_using_tx_run
- shouldThrowSessionExpiredIfWriteServerDisappears -> test_should_fail_when_writing_on_unexpectedly_interrupting_writer_using_session_run
- shouldThrowSessionExpiredIfWriteServerDisappearsWhenUsingTransaction -> test_should_fail_when_writing_on_unexpectedly_interrupting_writer_using_tx_run
- shouldHandleAcquireWriteSession -> test_should_write_successfully_on_writer_using_session_run
- shouldHandleAcquireWriteTransaction -> test_should_write_successfully_on_writer_using_tx_function
- shouldHandleAcquireWriteSessionAndTransaction -> test_should_write_successfully_on_writer_using_tx_run
- shouldRoundRobinWriteSessions -> test_should_round_robin_writers_when_writing_using_session_run
- shouldRoundRobinWriteSessionsInTransaction -> test_should_round_robin_writers_when_writing_using_tx_run
- shouldFailOnNonDiscoverableServer -> test_should_fail_discovery_when_router_fails_with_procedure_not_found_code
- shouldFailRandomFailureInGetServers -> test_should_fail_discovery_when_router_fails_with_unknown_code
- shouldHandleLeaderSwitchWhenWriting -> test_should_fail_when_writing_on_writer_that_returns_not_a_leader_code
- shouldHandleLeaderSwitchWhenWritingWithoutConsuming -> test_should_fail_when_writing_without_explicit_consumption_on_writer_that_returns_not_a_leader_code
- shouldHandleLeaderSwitchWhenWritingInTransaction -> test_should_fail_when_writing_on_writer_that_returns_not_a_leader_code_using_tx_run
- shouldUseWriteSessionModeAndInitialBookmark -> test_should_use_write_session_mode_and_initial_bookmark_when_writing_using_tx_run
- shouldUseReadSessionModeAndInitialBookmark -> test_should_use_read_session_mode_and_initial_bookmark_when_reading_using_tx_run
- shouldPassBookmarkFromTransactionToTransaction -> test_should_pass_bookmark_from_tx_to_tx_using_tx_run
- shouldRetryReadTransactionUntilSuccess -> test_should_retry_read_tx_until_success
- shouldRetryWriteTransactionUntilSuccess -> test_should_retry_write_tx_until_success
- shouldRetryReadTransactionAndPerformRediscoveryUntilSuccess -> test_should_retry_read_tx_and_rediscovery_until_success
- shouldRetryWriteTransactionAndPerformRediscoveryUntilSuccess -> test_should_retry_write_tx_and_rediscovery_until_success
- shouldUseInitialRouterForRediscoveryWhenAllOtherRoutersAreDead -> test_should_use_initial_router_for_discovery_when_others_unavailable
- shouldInvokeProcedureGetRoutingTableWhenServerVersionPermits -> test_should_successfully_read_from_readable_router_using_tx_function
- shouldSendEmptyRoutingContextInHelloMessage -> test_should_send_empty_hello
- shouldServeReadsButFailWritesWhenNoWritersAvailable -> test_should_serve_reads_and_fail_writes_when_no_writers_available
- shouldAcceptRoutingTableWithoutWritersAndThenRediscover -> test_should_accept_routing_table_without_writers_and_then_rediscover
- shouldTreatRoutingTableWithSingleRouterAsValid -> test_should_accept_routing_table_with_single_router
- shouldSendMultipleBookmarks -> test_should_successfully_send_multiple_bookmarks
- shouldForgetAddressOnDatabaseUnavailableError -> test_should_forget_address_on_database_unavailable_error
- shouldUseResolverDuringRediscoveryWhenExistingRoutersFail -> test_should_use_resolver_during_rediscovery_when_existing_routers_fail
- shouldRevertToInitialRouterIfKnownRouterThrowsProtocolErrors -> test_should_revert_to_initial_router_if_known_router_throws_protocol_errors

* Removing redundant stub server scripts

* Migrating tests to testkit part 2 (#839)

- shouldSendRoutingContextToServer -> test_should_successfully_get_routing_table_with_context
- shouldSendRoutingContextInHelloMessage -> test_should_successfully_get_routing_table_with_context
- shouldHandleLeaderSwitchAndRetryWhenWritingInTxFunction -> test_should_write_successfully_on_leader_switch_using_tx_function
- shouldSendInitialBookmark -> test_should_use_write_session_mode_and_initial_bookmark_when_writing_using_tx_run
- shouldRetryWriteTransactionUntilSuccessWithWhenLeaderIsRemoved -> test_should_retry_write_until_success_with_leader_change_using_tx_function
- shouldRetryWriteTransactionUntilSuccessWithWhenLeaderIsRemovedV3 -> test_should_retry_write_until_success_with_leader_shutdown_during_tx_using_tx_function

* Migrating tests to testkit part 3 (#840)

Adding support for supportsMultiDB call.

And exporting the following tests to testkit:
- shouldServerWithBoltV4SupportMultiDb -> test_should_successfully_check_if_support_for_multi_db_is_available
- shouldServerWithBoltV3NotSupportMultiDb -> test_should_successfully_check_if_support_for_multi_db_is_available

Removing redundant scripts

* Stub tests migration part 4 (#847)

Removed RoutingDriverMultidatabaseBoltKitIT

Migrated tests:
- shouldDiscoverForDatabase -> test_should_read_successfully_from_reader_using_session_run (this test seems to cover the same use-case and uses a non-default DB for 4+ versions)
- shouldRetryOnEmptyDiscoveryResult -> test_should_read_successfully_on_empty_discovery_result_using_session_run
- shouldThrowRoutingErrorIfDatabaseNotFound -> test_should_fail_with_routing_failure_on_db_not_found_discovery_failure
- shouldBeAbleToServeReachableDatabase -> test_should_read_successfully_from_reachable_db_after_trying_unreachable_db (message check has been removed)
- shouldPassSystemBookmarkWhenGettingRoutingTableForMultiDB -> test_should_pass_system_bookmark_when_getting_rt_for_multi_db (seems to be applicable to V4 only, also the stub server doesn't seem to check bookmarks)
- shouldIgnoreSystemBookmarkWhenGettingRoutingTable -> test_should_ignore_system_bookmark_when_getting_rt_for_multi_db
- shouldDriverVerifyConnectivity -> test_should_successfully_get_routing_table_with_context (pre-existing test that already tests the connectivity)

Also removed redundant scripts and added code support to DriverError

* Fix for initial routers DNS resolution (#849)

* Fix for initial routers DNS resolution


* Updating test name

* Fixed SSL handling (#851)

This update fixes a number of SSL-related tests in testkit, as well as the CausalClusteringIT.shouldDropBrokenOldConnections test.
The connection pooling strategy has been updated to use the same connection pool when the connection host is unambiguous.
Removed hardcoded domain name resolution from the BoltServerAddress and moved the logic to ChannelConnectorImpl that uses the DomainNameResolver.

* Updated .gitignore
injectives deleted the feature/dns-resolution branch June 24, 2021 09:02
3 participants