-
Notifications
You must be signed in to change notification settings - Fork 109
(5.5) Background plan resume / plan tail #1899
Conversation
} | ||
diff, err := DiffPlan(*previousPlan, *newPlan) | ||
if err != nil { | ||
logrus.WithError(err).Error("Failed to diff plans.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I wonder what this error would mean to the user? Also, what could it mean to the loop itself - will it be able to terminate if this continues? Shouldn't diffing always succeed (w/o looking at the implementation of DiffPlan) - i.e. either return a diff or no diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little tricky, the diff should always succeed, yes, as long as getPlan()
always returns the same plan (which it should). Otherwise this would likely mean a programming error on our part, so not much a user can do.
return trace.Wrap(err) | ||
} | ||
// Make sure to launch the unit command with the --block flag. | ||
args := append(os.Args[1:], "--debug", "--block") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's susceptible to the same issue. In the cases you're referring to we had to either replace the flags (remove/add), or update their values. In this case we just need to append a --block
flag. Worst case scenario, it'll end up with two --debug
flags which is fine. So unless I'm missing anything, I'd like to keep this simple.
|
||
// launchOneshotService launches the specified command as a one-shot systemd | ||
// service with the specified name. | ||
func launchOneshotService(name string, args []string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how that differs from https://github.com/gravitational/gravity/blob/version/5.5.x/tool/gravity/cli/system.go#L313
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we have a couple of similar methods which I originally tried to see if they will fit this use-case, but they're different in a couple of ways. This method, for example, makes sure there's no another service running to prevent resume from being launched multiple times simultaneously.
if err != nil { | ||
return trace.Wrap(err) | ||
func displayExpandOperationPlan(localEnv *localenv.LocalEnvironment, environ LocalEnvironmentFactory, opKey ops.SiteOperationKey, opts displayPlanOptions) error { | ||
return outputOrFollowPlan(localEnv, getExpandOperationPlanFunc(environ, opKey), opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you could move getXXXPlanFunc inside of the respective displayXXXPlan variant since they're no used elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's how I did it originally, but it was a bit messy. I also think having those method in global scope might be a bit more convenient if something/someone ever needs to use them.
@@ -139,6 +140,64 @@ func ResolvePlan(plan storage.OperationPlan, changelog storage.PlanChangelog) *s | |||
return &plan | |||
} | |||
|
|||
// DiffPlan returns the difference between the previous and the next plans in the | |||
// form of a changelog. | |||
func DiffPlan(prevPlan *storage.OperationPlan, nextPlan storage.OperationPlan) (diff []storage.PlanChange, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: pass both plans as pointers. Or is there a reason for passing nextPlan as a struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's actually on purpose. To indicate that the prevPlan
is optional (nil can be passed), while the nextPlan
should always be passed in.
@@ -202,6 +202,10 @@ func outputOrFollowPlan(localEnv *localenv.LocalEnvironment, getPlan fsm.GetPlan | |||
} | |||
|
|||
func followPlan(localEnv *localenv.LocalEnvironment, getPlan fsm.GetPlanFunc) error { | |||
plan, err := getPlan() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this will now make the follow command half-resilient - before it could tolerate initial transient failure and now will fail immediately forcing the user to retry. Not sure which is better though.
Description
This pull requests implements a couple of resiliency improvements for the
plan resume
command. The main issue was that users would often launch it and lose SSH connection (esp. relevant if launching from our web UI's terminal) which would basically terminate the resume since it executes in foreground.gravity plan resume
to launch resume from a one-shot systemd service by default.--block
flag for the old behavior to run in foreground.gravity plan --tail
to be able to monitor plan progress.Type of change
Linked tickets and other PRs
TODOs
Testing done
Resume in background (now default)
Resume blocking (old behavior)