Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: auto-refresh on new revisions during sync retries (Alpha) (#11494) #15603

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

aslafy-z
Copy link
Contributor

@aslafy-z aslafy-z commented Sep 20, 2023

This feature allows users to configure their apps to refresh on new revisions, when the current sync is retrying.
This is controlled by a new boolean Application CRD field syncPolicy.retry.refresh or via the --sync-retry-refresh flag.

Closes #11494
Related to #6055
Discussed at https://www.youtube.com/watch?v=baIX9Bk6f5w&t=1173s

Initially based of the work Sayrus did at Sayrus@817bc34

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.

@codecov
Copy link

codecov bot commented Sep 23, 2023

Codecov Report

Attention: Patch coverage is 53.33333% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 44.92%. Comparing base (21b1514) to head (20b6f91).
Report is 8 commits behind head on master.

Files Patch % Lines
controller/appcontroller.go 0.00% 6 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #15603      +/-   ##
==========================================
- Coverage   44.93%   44.92%   -0.02%     
==========================================
  Files         354      354              
  Lines       47742    47720      -22     
==========================================
- Hits        21454    21437      -17     
  Misses      23485    23485              
+ Partials     2803     2798       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aslafy-z aslafy-z force-pushed the patch-3 branch 2 times, most recently from 3872209 to 0a95fd5 Compare September 27, 2023 10:54
@aslafy-z aslafy-z changed the title feat: add sync timeout feat: add application sync timeout Sep 27, 2023
@aslafy-z aslafy-z force-pushed the patch-3 branch 5 times, most recently from 539d026 to a363b85 Compare September 28, 2023 18:00
@aslafy-z aslafy-z changed the title feat: add application sync timeout feat: abort ongoing operation when a new revision is available Sep 28, 2023
@aslafy-z aslafy-z force-pushed the patch-3 branch 5 times, most recently from 0713719 to 3850169 Compare September 28, 2023 18:44
@@ -1195,6 +1195,14 @@ func (ctrl *ApplicationController) processRequestedAppOperation(app *appv1.Appli
ctrl.requestAppRefresh(app.QualifiedName(), CompareWithLatest.Pointer(), &retryAfter)
return
} else {
if state.Phase == synccommon.OperationRunning && app.Spec.SyncPolicy.Retry.Refresh && app.Status.Sync.Revision != state.Operation.Sync.Revision {
Copy link
Contributor Author

@aslafy-z aslafy-z Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This triggers when processing resources, not hooks. Are hooks processed somewhere else?

logCtx.Infof("A new revision is available, refreshing and terminating app, was phase: %s, message: %s", state.Phase, state.Message)
ctrl.requestAppRefresh(app.QualifiedName(), CompareWithLatest.Pointer(), nil)
state.Phase = synccommon.OperationTerminating
state.Message = "Operation forced to terminate (new revision available)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find this message on the UI, is there a way to make it appear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to move setOperationState above the requestAppRefresh, since otherwise there can be a race condition by the refresh processor kicking in and message not being ready yet.

@aslafy-z aslafy-z force-pushed the patch-3 branch 4 times, most recently from 0d2a152 to ca363b9 Compare October 2, 2023 12:10
@aslafy-z aslafy-z changed the title feat: abort ongoing operation when a new revision is available feat: abort ongoing operation when a new revision is available (#11494) Oct 2, 2023
@aslafy-z
Copy link
Contributor Author

aslafy-z commented Nov 1, 2023

@alexec any inputs?

assets/swagger.json Outdated Show resolved Hide resolved
Copy link
Contributor

@todaywasawesome todaywasawesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "whether" is the wrong word to use and it is misspelled. If you're ok with the change you can just accept the commit.

manifests/core-install.yaml Show resolved Hide resolved
manifests/core-install.yaml Show resolved Hide resolved
manifests/core-install.yaml Show resolved Hide resolved
manifests/crds/application-crd.yaml Outdated Show resolved Hide resolved
manifests/crds/application-crd.yaml Outdated Show resolved Hide resolved
manifests/ha/install.yaml Show resolved Hide resolved
manifests/ha/install.yaml Show resolved Hide resolved
manifests/install.yaml Show resolved Hide resolved
manifests/install.yaml Show resolved Hide resolved
manifests/install.yaml Show resolved Hide resolved
@aslafy-z aslafy-z force-pushed the patch-3 branch 2 times, most recently from 45bfc7e to f2d6fe4 Compare May 25, 2024 18:08
@aslafy-z aslafy-z requested a review from todaywasawesome May 25, 2024 18:09
@aslafy-z aslafy-z requested a review from a team as a code owner May 25, 2024 18:28
@aslafy-z aslafy-z force-pushed the patch-3 branch 3 times, most recently from f7423ef to dca0970 Compare May 25, 2024 18:39
@aslafy-z
Copy link
Contributor Author

@todaywasawesome I applied the required change.
@pasha-codefresh I added a sparse documentation for the feature, let me know if it's OK to you.

I also added the argocd app set --sync-retry-refresh flag.

@aslafy-z
Copy link
Contributor Author

I will fix the tests asap

@aslafy-z aslafy-z changed the title feat: abort ongoing operation when a new revision is available (#11494) feat: auto-refresh on new revisions during sync retries (Alpha) (#11494) May 25, 2024
@aslafy-z aslafy-z force-pushed the patch-3 branch 2 times, most recently from 864ce27 to 7129857 Compare May 25, 2024 23:52
@aslafy-z aslafy-z force-pushed the patch-3 branch 2 times, most recently from e1ce49c to a65cbd3 Compare May 26, 2024 00:44
Signed-off-by: Zadkiel AHARONIAN <[email protected]>
@pasha-codefresh
Copy link
Member

@aslafy-z any news?

@pasha-codefresh
Copy link
Member

hey @aslafy-z , are you going to work on it?

@aslafy-z
Copy link
Contributor Author

aslafy-z commented Jul 8, 2024

Hi @pasha-codefresh,

I couldn't reproduce the issue correctly in the end-to-end tests. I'm quite short on time these days. Please feel free to take over if needed, and any suggestions are welcome.

@oleghalin
Copy link

Hi @pasha-codefresh,

I couldn't reproduce the issue correctly in the end-to-end tests. I'm quite short on time these days. Please feel free to take over if needed, and any suggestions are welcome.

I think your test case shouldnt wait for Failed state because patched selector will not be applied and you will stuck at Running state.

error when patching "/dev/shm/919124178": Deployment.apps "guestbook-ui" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"guestbook-ui", "foo":"bar"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

Comment on lines +116 to +125
PatchFile("guestbook-ui-deployment.yaml", `[{"op": "add", "path": "/spec/selector/matchLabels/foo", "value": "bar"}, {"op": "add", "path": "/spec/template/metadata/labels/foo", "value": "bar"}]`).
Refresh(RefreshTypeNormal).
Then().
Expect(OperationPhaseIs(OperationFailed)).
When().
// Trigger refresh again to make sure controller notices previously failed sync attempt before expectation timeout expires
Refresh(RefreshTypeNormal).
Then().
Expect(SyncStatusIs(SyncStatusCodeOutOfSync)).
Expect(Condition(ApplicationConditionSyncError, "Failed sync attempt")).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PatchFile("guestbook-ui-deployment.yaml", `[{"op": "add", "path": "/spec/selector/matchLabels/foo", "value": "bar"}, {"op": "add", "path": "/spec/template/metadata/labels/foo", "value": "bar"}]`).
Refresh(RefreshTypeNormal).
Then().
Expect(OperationPhaseIs(OperationFailed)).
When().
// Trigger refresh again to make sure controller notices previously failed sync attempt before expectation timeout expires
Refresh(RefreshTypeNormal).
Then().
Expect(SyncStatusIs(SyncStatusCodeOutOfSync)).
Expect(Condition(ApplicationConditionSyncError, "Failed sync attempt")).
PatchFile("guestbook-ui-deployment.yaml", `[{"op": "add", "path": "/spec/selector/matchLabels/foo", "value": "bar"}, {"op": "add", "path": "/spec/template/metadata/labels/foo", "value": "bar"}]`).
Refresh(RefreshTypeNormal).
Then().
Expect(OperationPhaseIs(OperationRunning)).
When().

I suppose that you copied this part from previous test and your case doesnt require it.

  • Applying healthy manifest
  • Waiting for Healthy state
  • Patching with corrupted manifest (this case application will stuck in Syncing state. (operationState.phase == 'Running')
  • Expecting phase running
  • Patching with healthy manifest
  • Expecting that introduced feature works as expected (previous sync stopped and new started & application goes to healthy)

Let me know if it makes sense

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aslafy-z I will see if I can give this a try

state.Message = "Operation forced to terminate (new revision available)"
ctrl.setOperationState(app, state)
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a setOperationState below as well, let's combine the two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retrying failed sync's block newer commits; how to achieve declarative, level based gitops semantics?
8 participants