Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(discovery): do not fail startup if plugin ping fails #1700

Merged
merged 2 commits into from
Oct 10, 2023

Conversation

andrewazores
Copy link
Member

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PR's, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your branch PR on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits using a GPG signature

To recreate commits with GPG signature git fetch upstream && git rebase --force --gpg-sign upstream/main


Fixes: #1698

Description of the change:

Modifies the DiscoveryStorage plugin ping logic to be tolerant of request failures or network exceptions by remapping the ping task future to one that logs exceptions but does not propagate the Future failure to the aggregate Future representing the entire task.

Motivation for the change:

When this plugin ping prune task is done periodically failures are already ignored. However, when this task is done initially at startup to check plugin registrations already present in the database from the last run, ping failures would fail the whole task, cascading into failing the server startup. This should not happen - the server is not dependent upon the discovery plugins and can handle their absence fine, so it should not be startup-blocking if an exception occurs when probing them.

How to manually test:

  1. Tough to nail down because it requires a narrow condition where the discovery plugin is present but a network error occurs during the registration ping request, ex. the network stack says that the plugin hostname can be resolved but the connection is refused because the plugin JVM is not yet ready to accept connections.

@github-actions
Copy link
Contributor

Hi @andrewazores! Add at least one of the required labels to this PR

Required labels are : chore,ci,cleanup,docs,feat,fix,perf,refactor,style,test

@mergify mergify bot added the safe-to-test label Oct 10, 2023
@github-actions
Copy link
Contributor

Hi @andrewazores! Add at least one of the required labels to this PR

Required labels are : chore,ci,cleanup,docs,feat,fix,perf,refactor,style,test

@github-actions
Copy link
Contributor

Hi @andrewazores! Add at least one of the required labels to this PR

Required labels are : chore,ci,cleanup,docs,feat,fix,perf,refactor,style,test

@andrewazores andrewazores marked this pull request as ready for review October 10, 2023 13:35
@andrewazores
Copy link
Member Author

/build_test

@andrewazores
Copy link
Member Author

@github-actions
Copy link
Contributor

ARCH IMAGE
amd64 ghcr.io/cryostatio/cryostat:pr-1700-1c1d3b17c939d749b8192078af7ef97eb87573ec-linux-amd64
arm64 ghcr.io/cryostatio/cryostat:pr-1700-1c1d3b17c939d749b8192078af7ef97eb87573ec-linux-arm64

To run smoketest:

# amd64          
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1700-1c1d3b17c939d749b8192078af7ef97eb87573ec-linux-amd64 sh smoketest.sh

# or arm64
CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1700-1c1d3b17c939d749b8192078af7ef97eb87573ec-linux-arm64 sh smoketest.sh

Copy link
Contributor

@aali309 aali309 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@andrewazores andrewazores merged commit 026f0e8 into cryostatio:main Oct 10, 2023
10 of 11 checks passed
@andrewazores andrewazores deleted the startup-discovery-ping branch October 10, 2023 21:16
mergify bot pushed a commit that referenced this pull request Oct 10, 2023
andrewazores added a commit that referenced this pull request Oct 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Bug] Startup failure due to discovery ping failure
2 participants