Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Establish seed node connections in async during node bootstrap #8038

Merged
merged 4 commits into from
Jun 15, 2023

Conversation

ankitkala
Copy link
Member

@ankitkala ankitkala commented Jun 13, 2023

Description

Establish seed node connections in async during node bootstrap

Related Issues

Resolves #7950

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu
      1 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure

@codecov
Copy link

codecov bot commented Jun 13, 2023

Codecov Report

Merging #8038 (5ef6976) into main (bf3cf71) will decrease coverage by 0.10%.
The diff coverage is 100.00%.

❗ Current head 5ef6976 differs from pull request most recent head ee815c6. Consider uploading reports for the commit ee815c6 to get more accurate results

@@             Coverage Diff              @@
##               main    #8038      +/-   ##
============================================
- Coverage     70.92%   70.83%   -0.10%     
+ Complexity    56636    56575      -61     
============================================
  Files          4721     4721              
  Lines        267514   267470      -44     
  Branches      39212    39210       -2     
============================================
- Hits         189745   189454     -291     
- Misses        61770    62016     +246     
- Partials      15999    16000       +1     
Impacted Files Coverage Δ
...org/opensearch/transport/RemoteClusterService.java 86.79% <100.00%> (+4.69%) ⬆️
...ava/org/opensearch/transport/TransportService.java 80.03% <100.00%> (+0.46%) ⬆️

... and 454 files with indirect coverage changes

Copy link

@krishna-ggk krishna-ggk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one consequence of this could be that node is up while remote connection establishment is still in progress in which case any outbound cross-cluster requests could fail with "no such remote" right?

Could this also impact CCR?

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one common problem with async connection handling is the race around connection and disconnection. See #4874. Want to ensure we don't run into a similar issue going forward

@shwetathareja
Copy link
Member

@ankitkala can you please highlight the implication of moving the remote connection to async. As i understand today also there is no guarantee that connection would be established in sync in 10secs. What happens in case a CCR/ CCS request comes. What is the error thrown?

@ankitkala
Copy link
Member Author

So one consequence of this could be that node is up while remote connection establishment is still in progress in which case any outbound cross-cluster requests could fail with "no such remote" right?

Could this also impact CCR?

@ankitkala can you please highlight the implication of moving the remote connection to async. As i understand today also there is no guarantee that connection would be established in sync in 10secs. What happens in case a CCR/ CCS request comes. What is the error thrown?

This should not impact any of the cross cluster calls. For each such call, Remote cluster client invokes a blocking call to remoteClusterService.ensureConnected to ensure that the connection is established here (ensureConnected and connectImpl)
This code change makes the connect as async only during node bootstrap. Cross cluster calls and Seed node updates aren't affected by this change.

@ankitkala ankitkala changed the title Establish seed node connections in async Establish seed node connections in async during node bootstrap Jun 14, 2023
@ankitkala
Copy link
Member Author

The one common problem with async connection handling is the race around connection and disconnection. See #4874. Want to ensure we don't run into a similar issue going forward

Thanks. I took a cursory look at the issue above.
This change isn't modifying the behaviour of how the connection is established/disconnected. So I don't expect any issues there.

Signed-off-by: Ankit Kala <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testRestartPrimary
      1 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu

@shwetathareja shwetathareja merged commit cb4361c into opensearch-project:main Jun 15, 2023
@shwetathareja shwetathareja added the backport 2.x Backport to 2.x branch label Jun 15, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-8038-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 cb4361c00107e113c82f87103307c3626583e940
# Push it to GitHub
git push --set-upstream origin backport/backport-8038-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-8038-to-2.x.

ankitkala added a commit to ankitkala/OpenSearch that referenced this pull request Jun 15, 2023
…earch-project#8038)

* Establish seed node connection setup in async

Signed-off-by: Ankit Kala <[email protected]>
shwetathareja pushed a commit that referenced this pull request Jun 15, 2023
#8077)

* Establish seed node connection setup in async

Signed-off-by: Ankit Kala <[email protected]>
gaiksaya pushed a commit to gaiksaya/OpenSearch that referenced this pull request Jun 26, 2023
imRishN pushed a commit to imRishN/OpenSearch that referenced this pull request Jun 27, 2023
…earch-project#8038)

* Establish seed node connection setup in async

Signed-off-by: Ankit Kala <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…earch-project#8038)

* Establish seed node connection setup in async

Signed-off-by: Ankit Kala <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
4 participants