Skip to content
This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

Fixes #1113: removes fixed max-failure value from task creation #1120

Closed
wants to merge 1 commit into from

Conversation

tjmcs
Copy link
Contributor

@tjmcs tjmcs commented Aug 1, 2016

Fixes #1113

Summary of changes:

  • Modified a single line of code from the core/task.go file that used to set the max-failure value to 10 for all tasks; now it sets the value to whatever was received in the task creation request instead
  • Modified the cmd/snapctl/flags.go and scheduler/task.go files so that they used a single parameter (the DefaultMaxFailures constant from the mgmt/rest package) to set the default value they use for the number of times a task can fail before it is disabled; previously this parameter was set with separate fixed values in these two files and those fixed values did not match (the default was set to 3 in the scheduler/task.go file and to 10 in the cmd/snapctl/flags.go file)

Testing done:

  • started snapd
  • created a task using a copy of the eg-file.yaml task configuration from the Snap distribution; in the copy being used for this part of the test, the max-failures value in the task configuration was set to 3
  • unloaded the underlying mock collector plugin (both versions)
  • verified that the task was disabled after three consecutive failures
  • killed and restarted snapd
  • ran the same test with the max-failures value unspecified in the task configuration file that was used and verified that the task was disabled after 10 consecutive failures (the default value for this parameter once this PR has been applied)
  • ran the same test with the max-failures value set to -1 and verified that the task was once again disabled after 10 consecutive failures (the default value for this parameter)

see the command-line output, below, for details; here's the head of the three YAML files used in the testing (to show the differences in the max-failure parameter value defined in each):

$ head -6 mock-file*
==> mock-file-neg-one-max-failures.yaml <==

---
  version: 1
  schedule:
    type: "simple"
    interval: "1s"
  max-failures: -1

==> mock-file-three-max-faiures.yaml <==

---
  version: 1
  schedule:
    type: "simple"
    interval: "1s"
  max-failures: 3

==> mock-file-unspecified-max-failures.yaml <==

---
  version: 1
  schedule:
    type: "simple"
    interval: "1s"
#  max-failures: -1

$ 

next, we started up snapd in a separate window:

$ SNAP_ADDR=127.0.0.1:12345 snapd -t 0 --config eg-snap-config.yaml 2>&1 | tee t.t
...
...
...

once snapd was up and running, created a task that should be disabled after three concurrent failures, then unloaded the two versions of the collector plugin that the task depends on (so that the task would fail); once the task had been disabled, went back and counted the number of times it failed before snapd disabled that task by grep'ing for the line denoting the task failure in the log output of the snapd instance (the t.t file):

$ snapctl -u "https://127.0.0.1:12345" --insecure task create -t ./mock-file-three-max-faiures.yaml
Using task manifest to create task
Task created
ID: 2fa9ae52-d108-4cae-b7b1-b573e50633e7
Name: Task-2fa9ae52-d108-4cae-b7b1-b573e50633e7
State: Running

$ snapctl -u "https://127.0.0.1:12345" --insecure plugin unload collector:mock:2
Plugin unloaded
Name: mock
Version: 2
Type: collector

$ snapctl -u "https://127.0.0.1:12345" --insecure plugin unload collector:mock:1
Plugin unloaded
Name: mock
Version: 1
Type: collector

$ grep 'Task failed' t.t | wc
       3      51     879

$ 

next, restarted snapd (using the same snapd command shown above) and once it was up and running again re-ran the test, this time using the task configuration file that doesn't specify a value for the max-failures parameter:

$ snapctl -u "https://127.0.0.1:12345" --insecure task create -t ./mock-file-unspecified-max-failures.yaml
Using task manifest to create task
Task created
ID: 5ec3dd4c-8895-48cc-8b7c-19e9763d49db
Name: Task-5ec3dd4c-8895-48cc-8b7c-19e9763d49db
State: Running

$ snapctl -u "https://127.0.0.1:12345" --insecure plugin unload collector:mock:2
Plugin unloaded
Name: mock
Version: 2
Type: collector

$ snapctl -u "https://127.0.0.1:12345" --insecure plugin unload collector:mock:1
Plugin unloaded
Name: mock
Version: 1
Type: collector

$ grep 'Task failed' t.t | wc
      10     170    2941

$ 

and finally, restarted snapd and once it was up and running again ran the test for the task configuration file that specifies a value of -1 for the max-failures parameter:

$ snapctl -u "https://127.0.0.1:12345" --insecure task create -t ./mock-file-neg-one-max-failures.yaml
Using task manifest to create task
Task created
ID: f95046d1-742d-4a5a-b222-50ee20b17867
Name: Task-f95046d1-742d-4a5a-b222-50ee20b17867
State: Running

$ snapctl -u "https://127.0.0.1:12345" --insecure plugin unload collector:mock:2
Plugin unloaded
Name: mock
Version: 2
Type: collector

$ snapctl -u "https://127.0.0.1:12345" --insecure plugin unload collector:mock:1
Plugin unloaded
Name: mock
Version: 1
Type: collector

$ grep 'Task failed' t.t | wc
      10     170    2941

$ 

As you can see, in the first case we saw three lines in the log output indicating that the task had failed before the task was disabled, while for the last two tests we saw ten lines in the log output, showing that the default value is used for the latter two and for any value greater than zero (3 in this case) the number of consecutive failures before the task is disabled is set using the specified max-failure value.

@intelsdi-x/snap-maintainers

@geauxvirtual
Copy link
Contributor

Not sure why the default for a setting in Scheduler for a task is being defined in the REST server.

@tjmcs
Copy link
Contributor Author

tjmcs commented Aug 1, 2016

@geauxvirtual; I really don't care where we define it, as long as we define it once (rather than having it defined three times in three separate packages with two distinctly different values); do you have a preference as to which should be the package it's defined in and which should pull their values from that package?

@tjmcs tjmcs force-pushed the tb/fix-task-disable-process branch from 0ffcdbe to 6a356ab Compare August 1, 2016 23:28
@geauxvirtual
Copy link
Contributor

Honestly, Scheduler should be the only package that needs to know that value and set that value for a task if no value is passed. If a value needs to be passed to create a task, then the REST server or any other package that creates a task should either pass a value for a task or pass something like (-1) to signal the scheduler to use the default value if no value was passed to the API, for example.

@tjmcs
Copy link
Contributor Author

tjmcs commented Aug 1, 2016

@geauxvirtual; after looking at the test failures I was seeing under TravisCI, I decided there was more than one reason to move the definition of that parameter out of the rest package; consequently it is now defined in the scheduler package and all other packages that use this parameter pull their definition from there. LMK if you think that still needs to be changed (or not).

@geauxvirtual
Copy link
Contributor

Neither REST or snapctl should need to import the scheduler package for a single constant. They shouldn't have to worry about passing the default constant to scheduler that already knows what the default value is. These packages should either pass a value that a user passed in, or pass a value that signals scheduler to use the default.

@tjmcs tjmcs force-pushed the tb/fix-task-disable-process branch from 6a356ab to d8f8d66 Compare August 1, 2016 23:48
@tjmcs
Copy link
Contributor Author

tjmcs commented Aug 1, 2016

Yep; had just hit "submit" on my previous comment as yours was appearing in my browser window, @geauxvirtual; this PR is now setup so that the default can be selected by passing a value of -1 to the scheduler if a specific value is not wanted; we could to further by not exporting that constant, but perhaps that is for another day (and other PR)?

@@ -129,7 +125,7 @@ var (
flTaskMaxFailures = cli.IntFlag{
Name: "max-failures",
Usage: "The number of consecutive failures before snap disable the task (Default 10 consective failures)",
Value: DefaultMaxFailures,
Value: -1,
Copy link
Contributor

@IRCody IRCody Aug 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to pass -1, then the usage text could become incorrect if the default changes from 10 consecutive failures to something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably document this somewhere, @IRCody, but from what I understand using a value of -1 is quite common if you want the default value for something to be used. In this case, the default value is 10, and the default value (-1) in snapctl will ensure that the underlying default value from the scheduler package is used (which happens to be 10).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing -1 isn't an issue. Passing -1, while telling the user that the default value is going to be 10 but then not guaranteeing that seems off. Since snapctl does not control the default value I feel like giving a specific value here without enforcing it could potentially mislead the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point...I'll remove the reference to the default value from the CLI usage...Cheers!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR has been updated, @IRCody; here's the new help output for creating tasks:

$ snapctl task create --help
NAME:
   snapctl task create - There are two ways to create a task.
    1) Use a task manifest with [--task-manifest]
    2) Provide a workflow manifest and schedule details.

    * Note: Start and stop date/time are optional.


USAGE:
   snapctl task create [command options] [arguments...]

DESCRIPTION:
   Creates a new task in the snap scheduler

OPTIONS:
   --task-manifest value, -t value  File path for task manifest to use for task creation.
   --workflow-manifest value, -w value  File path for workflow manifest to use for task creation
   --interval value, -i value       Interval for the task schedule [ex (simple schedule): 250ms, 1s, 30m (cron schedule): "0 * * * * *"]
   --start-date value           Start date for the task schedule [defaults to today]
   --start-time value           Start time for the task schedule [defaults to now]
   --stop-date value            Stop date for the task schedule [defaults to today]
   --stop-time value            Start time for the task schedule [defaults to now]
   --name value, -n value       Optional requirement for giving task names
   --duration value, -d value       The amount of time to run the task [appends to start or creates a start time before a stop]
   --no-start               Do not start task on creation [normally started on creation]
   --deadline value         The deadline for the task to be killed after started if the task runs too long (All tasks default to 5s)
   --max-failures value         The number of consecutive failures before snap disable the task (default: -1)


$ 

Note that the reference to that default value has been removed. Hopefully that addresses your concern here.

@IRCody
Copy link
Contributor

IRCody commented Aug 2, 2016

LGTM

@jcooklin
Copy link
Collaborator

jcooklin commented Aug 3, 2016

1__jrcookli_ubuntu_____gvm_pkgsets_go1_6_2_snap_src_github_com_intelsdi-x_snap__ssh_

With the description we are providing I think -1 as a default value for --max-fialures is confusing. Also, I would expect that -1 would make it so that no number of failures would cause a task to go into a disabled state (which is something we should do IMO).

@tjmcs tjmcs force-pushed the tb/fix-task-disable-process branch 4 times, most recently from fd49f24 to 79814ab Compare August 4, 2016 17:29
@geauxvirtual
Copy link
Contributor

This PR is now doing more than resolving issue #1113. Any enhancements to snapctl should be added via another PR.

@tjmcs
Copy link
Contributor Author

tjmcs commented Aug 15, 2016

PR #1127 replaces this PR; closing this PR now.

@tjmcs tjmcs closed this Aug 15, 2016
@tjmcs tjmcs deleted the tb/fix-task-disable-process branch August 15, 2016 23:55
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task disabled when max-failures set to -1
4 participants