kola: support denylist warn or warning feature #3344

Closed
dustymabe opened this issue Feb 6, 2023 · 6 comments

@dustymabe
Member

The idea here is that we still run the tests but bubble their failures up as warnings rather than hard failures, with something like a warn: true knob in the denylist.

The test would still run, but the overall process wouldn't error (or maybe it would exit with a specific exit code; TBD), and the caller would then decide how to handle the situation.

We could also combine this with snooze. A snoozed test wouldn't run until $date, and once it started running again a failure would be a warning, not an error.
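
A minimal sketch of the proposal above, in Python rather than kola's own code: failures from tests marked as warn-only are reported separately and do not affect the exit code. The `Result` type, the `warn_only` flag, and `summarize` are hypothetical names used only for illustration, and the choice of exit code 0 for warnings is an assumption (the issue leaves that TBD).

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    passed: bool
    # Hypothetical flag: the test matched a denylist entry with warn: true.
    warn_only: bool = False


def summarize(results):
    """Split failures into fatal ones and warnings, and pick an exit code."""
    fatal = [r.name for r in results if not r.passed and not r.warn_only]
    warnings = [r.name for r in results if not r.passed and r.warn_only]
    # Assumption: only fatal failures make the overall run fail.
    exit_code = 1 if fatal else 0
    return exit_code, fatal, warnings
```

A caller that wants stricter handling could inspect the returned `warnings` list and decide for itself whether to fail the pipeline.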

nikita-dubrovskii self-assigned this May 10, 2023
nikita-dubrovskii added a commit to nikita-dubrovskii/coreos-assembler that referenced this issue Jul 18, 2023
coreos#3344

We bubble denylisted tests with 'Warn: true' option as warnings rather than hard failures:
```
kola -p qemu run --parallel 8 ext.config.ntp.* --output-dir tmp/kola

⚠️  Snoozing expired on: Jul 07 2023. Warning kola test pattern: "ext.config.ntp.chrony.dhcp-propagation":
  👉 coreos/fedora-coreos-tracker#1508
🕒 Snoozing kola test pattern "ext.config.ntp.timesyncd.dhcp-propagation" until Jul 20 2023:
  👉 coreos/fedora-coreos-tracker#1508
=== RUN   ext.config.ntp.timesyncd.dhcp-propagation
=== RUN   ext.config.ntp.chrony.coreos-platform-chrony-config
=== RUN   ext.config.ntp.chrony.dhcp-propagation
--- PASS: ext.config.ntp.chrony.coreos-platform-chrony-config (27.02s)
--- FAIL: ext.config.ntp.chrony.dhcp-propagation (107.62s)
--- FAIL: ext.config.ntp.timesyncd.dhcp-propagation (118.02s)
FAIL, output in tmp/kola
--- ⚠️ IGNORE: "ext.config.ntp.chrony.dhcp-propagation"
--- ⚠️ IGNORE: "ext.config.ntp.timesyncd.dhcp-propagation"
+ rc=0
+ set +x
```
nikita-dubrovskii added a commit to nikita-dubrovskii/coreos-assembler that referenced this issue Jul 18, 2023
coreos#3344

We bubble denylisted tests with 'Warn: true' option as warnings rather than hard failures:
```
kola -p qemu run --parallel 8 ext.config.ntp.* --output-dir tmp/kola

⚠️  Snoozing expired on: Jul 07 2023. Warning kola test pattern: "ext.config.ntp.chrony.dhcp-propagation":
  👉 coreos/fedora-coreos-tracker#1508
⚠️  Warning kola test pattern: "ext.config.ntp.timesyncd.dhcp-propagation":
  👉 coreos/fedora-coreos-tracker#1508
=== RUN   ext.config.ntp.timesyncd.dhcp-propagation
=== RUN   ext.config.ntp.chrony.coreos-platform-chrony-config
=== RUN   ext.config.ntp.chrony.dhcp-propagation
--- PASS: ext.config.ntp.chrony.coreos-platform-chrony-config (27.02s)
--- FAIL: ext.config.ntp.chrony.dhcp-propagation (107.62s)
--- FAIL: ext.config.ntp.timesyncd.dhcp-propagation (118.02s)
FAIL, output in tmp/kola
--- ⚠️ IGNORE: "ext.config.ntp.chrony.dhcp-propagation"
--- ⚠️ IGNORE: "ext.config.ntp.timesyncd.dhcp-propagation"
+ rc=0
+ set +x
```
@cgwalters
Member

It looks like coreos/fedora-coreos-config#2512 is going to add this field to every test. If that's the case...then why don't we just make warn: true the default, and have warn: false to go back to the prior behavior of not running the test at all?

Are there cases where we want warn: false even? Maybe, I'm not sure.

But it seems strange to me to just add a new thing to copy-paste into every denylisted test...

nikita-dubrovskii added a commit to nikita-dubrovskii/coreos-assembler that referenced this issue Jul 19, 2023
@nikita-dubrovskii
Contributor

@cgwalters

> why don't we just make warn: true the default

Agreed. I updated the PRs and set it to true by default.

@dustymabe
Member Author

> It looks like coreos/fedora-coreos-config#2512 is going to add this field to every test. If that's the case...then why don't we just make warn: true the default, and have warn: false to go back to the prior behavior of not running the test at all?

IMO the behavior should be:

- pattern: fcos.internet
  tracker: https://github.com/coreos/coreos-assembler/pull/1478

behaves as it does today where it denylists the test and the test does not run.

- pattern: fcos.internet
  warn: true
  tracker: https://github.com/coreos/coreos-assembler/pull/1478

would make it so that the test runs but a test failure isn't fatal.

And this:

- pattern: fcos.internet
  warn: false
  tracker: https://github.com/coreos/coreos-assembler/pull/1478

Doesn't really make a lot of sense IMO.

> Are there cases where we want warn: false even? Maybe, I'm not sure.

I don't think so (see above), but I don't know of a clear way to indicate that warn: false doesn't make sense.

> But it seems strange to me to just add a new thing to copy-paste into every denylisted test...

I agree. We should only add warn: true to tests that we want to run without a failure being fatal.
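
The behavior described above can be sketched roughly as follows. This is an illustration in Python, not kola's implementation: `classify` is a hypothetical helper, denylist entries are assumed to be already parsed into dicts, and glob-style matching of patterns via `fnmatchcase` is an assumption.

```python
from fnmatch import fnmatchcase


def classify(test_name, denylist):
    """Return "run", "skip", or "warn" for a test name.

    `denylist` is a list of dicts such as
    {"pattern": "fcos.internet", "warn": True, "tracker": "..."}.
    """
    for entry in denylist:
        # Assumption: patterns are compared as shell-style globs.
        if fnmatchcase(test_name, entry["pattern"]):
            # Without warn: true the entry behaves as it does today:
            # the test is denylisted and does not run.
            return "warn" if entry.get("warn", False) else "skip"
    return "run"
```

Treating a missing `warn` key the same as `warn: false` keeps existing denylist entries working unchanged, which matches the intent that only tests explicitly marked with warn: true are run.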

@dustymabe
Member Author

One further use case to flesh out. Something like:

- pattern: fcos.internet
  snooze: 2023-08-20
  warn: true
  tracker: https://github.com/coreos/coreos-assembler/pull/1478

would NOT run the test until Aug 20th, and then when the test did start running again it would only warn. This makes dealing with snoozes easier (i.e. when the time expires our pipelines won't start failing, just warning).
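
A rough sketch of how snooze and warn could combine, again in Python and not taken from kola itself. `Action` and `action_on` are hypothetical names, and treating the snooze date itself as the first day the test runs again is an assumption, since the comment above does not say whether the boundary is inclusive.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Action(Enum):
    SKIP = "skip"  # do not run the test
    WARN = "warn"  # run it; a failure is only a warning
    RUN = "run"    # run it; a failure is fatal


def action_on(today: date, snooze: Optional[date], warn: bool) -> Action:
    # While the snooze is active the test does not run at all.
    if snooze is not None and today < snooze:
        return Action.SKIP
    # No snooze, or snooze expired: warn: true makes failures non-fatal.
    if warn:
        return Action.WARN
    # Expired snooze without warn: the test runs normally again.
    # No snooze and no warn: plain denylisting, the test never runs.
    return Action.RUN if snooze is not None else Action.SKIP
```

For the entry above, `action_on(date(2023, 8, 19), date(2023, 8, 20), True)` gives `Action.SKIP`, and from Aug 20th onward it gives `Action.WARN`, so an expired snooze produces warnings instead of pipeline failures.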

nikita-dubrovskii added a commit to nikita-dubrovskii/coreos-assembler that referenced this issue Jul 20, 2023
coreos#3344

We bubble denylisted tests with 'Warn: true' option as warnings rather than hard failures:
```
kola -p qemu run --parallel 8 ext.config.ntp.* --output-dir tmp/kola

⚠️  Warning kola test pattern "ext.config.ntp.chrony.dhcp-propagation", snoozing expired on Jul 07 2023:
  👉 coreos/fedora-coreos-tracker#1508
⚠️  Warning kola test pattern "ext.config.ntp.timesyncd.dhcp-propagation":
  👉 coreos/fedora-coreos-tracker#1508
=== RUN   ext.config.ntp.timesyncd.dhcp-propagation
=== RUN   ext.config.ntp.chrony.coreos-platform-chrony-config
=== RUN   ext.config.ntp.chrony.dhcp-propagation
--- PASS: ext.config.ntp.chrony.coreos-platform-chrony-config (27.02s)
--- FAIL: ext.config.ntp.chrony.dhcp-propagation (107.62s)
--- FAIL: ext.config.ntp.timesyncd.dhcp-propagation (118.02s)
FAIL, output in tmp/kola
--- ⚠️ IGNORE: "ext.config.ntp.chrony.dhcp-propagation"
--- ⚠️ IGNORE: "ext.config.ntp.timesyncd.dhcp-propagation"
+ rc=0
+ set +x
```
nikita-dubrovskii added a commit to nikita-dubrovskii/coreos-assembler that referenced this issue Jul 24, 2023

Co-authored-by: Dusty Mabe <[email protected]>
nikita-dubrovskii added a commit to nikita-dubrovskii/coreos-assembler that referenced this issue Jul 27, 2023
coreos#3344

We bubble denylisted tests with 'Warn: true' option as warnings rather than hard failures:
```
kola -p qemu run --parallel 8 ext.config.ntp.* --output-dir tmp/kola

⏭️  Skipping kola test pattern "ext.config.ntp.chrony.dhcp-propagation":
  👉 coreos/fedora-coreos-tracker#1508
⚠️  Warning kola test pattern "ext.config.ntp.timesyncd.dhcp-propagation", snoozing expired on Jul 20 2023:
  👉 coreos/fedora-coreos-tracker#1508
=== RUN   ext.config.ntp.chrony.coreos-platform-chrony-config
=== RUN   ext.config.ntp.timesyncd.dhcp-propagation
--- PASS: ext.config.ntp.chrony.coreos-platform-chrony-config (27.56s)
--- WARN: ext.config.ntp.timesyncd.dhcp-propagation (90.72s)
        cluster.go:162: Error: Unit kola-runext.service exited with code 1
        cluster.go:162: 2023-07-27T06:24:18Z cli: Unit kola-runext.service exited with code 1
        harness.go:1236: kolet failed: : kolet run-test-unit failed: Process exited with status 1
FAIL, output in tmp/kola
+ rc=0
+ set +x
```

Co-authored-by: Dusty Mabe <[email protected]>
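
Since the caller is expected to decide how to handle warnings, one option is to scan the result lines in the output. The sketch below is in Python and assumes the `--- STATUS: name (seconds)` line format seen in the sample output above; that format is inferred from this log only and is not a documented interface. `parse_results` is a hypothetical helper.

```python
import re

# Matches lines such as:
#   --- WARN: ext.config.ntp.timesyncd.dhcp-propagation (90.72s)
RESULT_LINE = re.compile(r"^--- (PASS|FAIL|WARN): (\S+) \(([\d.]+)s\)$")


def parse_results(output: str):
    """Map each test name to a (status, seconds) tuple."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            status, name, seconds = match.groups()
            results[name] = (status, float(seconds))
    return results
```

A pipeline could then, for example, mark a build as unstable rather than failed when any entry has the `WARN` status.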
nikita-dubrovskii added a commit to nikita-dubrovskii/coreos-assembler that referenced this issue Jul 28, 2023
nikita-dubrovskii added a commit to nikita-dubrovskii/coreos-assembler that referenced this issue Jul 28, 2023
dustymabe added a commit that referenced this issue Jul 29, 2023
travier pushed a commit that referenced this issue Aug 17, 2023
@jlebon
Member

jlebon commented Oct 12, 2023

Isn't this fixed by #3539?

@dustymabe
Member Author

Yes. I think so.
