STAKE_POOLS_GARBAGE_COLLECTION_01 timed out #2337

Closed
jonathanknowles opened this issue Nov 19, 2020 · 3 comments
Labels
Test failure A flaky test or nightly CI failure

Comments

@jonathanknowles
Member

jonathanknowles commented Nov 19, 2020

Context

Test Case

STAKE_POOLS_GARBAGE_COLLECTION_01

Failure / Counter-example

Test timed out during the first stage of the pool garbage collection integration test.

    STAKE_POOLS_LIST_01 - List stake pools
      has non-zero saturation & stake
      pools have the correct retirement information
      eventually has correct margin, cost and pledge
      at least one pool eventually produces block
      contains pool metadata
      contains and is sorted by non-myopic-rewards
      non-myopic-rewards are based on stake
    STAKE_POOLS_LIST_05 - Fails without query parameter
    STAKE_POOLS_LIST_06 - NonMyopicMemberRewards are 0 when stake is 0
      # PENDING: This assumption seems false, for some reasons...
    STAKE_POOLS_GARBAGE_COLLECTION_01 - retired pools are garbage collected on schedule and not before

This could be because previous stages took too long to complete, or it could be because this stage itself timed out. Further investigation is required.

Resolution


QA

@jonathanknowles jonathanknowles added the Test failure A flaky test or nightly CI failure label Nov 19, 2020
iohk-bors bot added a commit that referenced this issue Nov 23, 2020
2338: Mark TRANS_TTL_{01,02} and STAKE_POOLS_JOIN_05 pending r=Anviking a=Anviking

# Issue Number

None. Addressing CI failures.

# Overview


- [x] Add new `flakyBecauseOf ticketOrReason` helper that calls `pendingWith` unless `RUN_FLAKY_TESTS` is set.
- [x] Mark TRANS_TTL_{01,02} and STAKE_POOLS_JOIN_05 pending/flaky.
- [x] Add manual test calling for running flaky tests
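
For readers unfamiliar with the pattern, the `flakyBecauseOf` helper described in the checklist above can be sketched roughly as follows. This is a Python analogue built on `unittest`, not the actual Haskell helper from the PR; `flaky_because_of` and `StakePoolTests` are hypothetical names, and the only behaviour taken from the description is "mark the test pending unless `RUN_FLAKY_TESTS` is set".

```python
import functools
import os
import unittest


def flaky_because_of(ticket_or_reason):
    """Skip the wrapped test unless RUN_FLAKY_TESTS is set (hypothetical analogue)."""
    def decorate(test):
        @functools.wraps(test)
        def wrapper(self, *args, **kwargs):
            # The environment is checked when the test runs, not at import time.
            if "RUN_FLAKY_TESTS" not in os.environ:
                self.skipTest(f"Flaky test: {ticket_or_reason}")
            return test(self, *args, **kwargs)
        return wrapper
    return decorate


class StakePoolTests(unittest.TestCase):
    @flaky_because_of("#2230")
    def test_join_when_stake_key_exists(self):
        # Placeholder body; the real test talks to a wallet server.
        self.assertTrue(True)
```

With this shape, the flaky tests still show up in the report as skipped, along with the ticket that explains why.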

# Comments

- Should lower the failure rate by 21 percentage points, from 59% to 38% of runs.
- The next candidate for marking pending would be #2224, but given its relatively low failure rate of 3.6% and its importance, I think that would be a bad idea.
- Maybe we should have flaky tests run by default, unless `DONT_RUN_FLAKY_TESTS` is set in CI, to maximise how often we run them locally.

Recent bors failures:
```
succeded: 19, failed: 37 (66%), total: 56
excluding #expected failures

Broken down by tags/issues:
10 times #2292 Flaky test - various DB properties causing timeout | #2292
7 times #2295 Flaky TRANS_TTL_{01,02} - SlotNo 80 > SlotNo 50 | #2295
6 times
3 times #2311 Flaky test - integration test timeout after/related to STAKE_POOLS_LIST_01 | #2311
3 times #2230 Flaky  STAKE_POOLS_JOIN_05 - Can join when stake key already exists | #2230
2 times #2224 Flaky STAKE_POOLS_LIST_01 - List stake pools, has non-zero saturation & stake | #2224
1 times #another-integration-timeout  |
1 times #2337 STAKE_POOLS_GARBAGE_COLLECTION_01 timed out | #2337
1 times #2320 Flaky test - The node backend is unreachable at the moment. STAKE_POOLS_QUIT_02 | #2320
1 times #2295, #2331 Flaky TRANS_TTL_{01,02} - SlotNo 80 > SlotNo 50 | #2295
1 times #2207 Flaky SHELLEY_MIGRATE_01_big_wallet | #2207
1 times #2118 Property `prop_rebalanceSelection` occasionally fails. | #2118
```
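
As a sanity check on the figures above, the breakdown can be tallied with a few lines of Python. This is only a sketch: `tally` and `TOTAL_RUNS` are hypothetical names, and it assumes a per-issue failure rate is simply that issue's count divided by the total number of runs, which is consistent with the 3.6% quoted for #2224 (2 of 56).

```python
import re

TOTAL_RUNS = 56  # "total: 56" in the summary line

BREAKDOWN = """\
10 times #2292 Flaky test - various DB properties causing timeout | #2292
7 times #2295 Flaky TRANS_TTL_{01,02} - SlotNo 80 > SlotNo 50 | #2295
6 times
3 times #2311 Flaky test - integration test timeout after/related to STAKE_POOLS_LIST_01 | #2311
3 times #2230 Flaky  STAKE_POOLS_JOIN_05 - Can join when stake key already exists | #2230
2 times #2224 Flaky STAKE_POOLS_LIST_01 - List stake pools, has non-zero saturation & stake | #2224
1 times #another-integration-timeout  |
1 times #2337 STAKE_POOLS_GARBAGE_COLLECTION_01 timed out | #2337
1 times #2320 Flaky test - The node backend is unreachable at the moment. STAKE_POOLS_QUIT_02 | #2320
1 times #2295, #2331 Flaky TRANS_TTL_{01,02} - SlotNo 80 > SlotNo 50 | #2295
1 times #2207 Flaky SHELLEY_MIGRATE_01_big_wallet | #2207
1 times #2118 Property `prop_rebalanceSelection` occasionally fails. | #2118
"""


def tally(text):
    """Return (count, label) pairs, one per 'N times ...' line."""
    rows = []
    for line in text.splitlines():
        match = re.match(r"(\d+) times\s*(.*)", line)
        if match:
            label = match.group(2).strip() or "(untagged)"
            rows.append((int(match.group(1)), label))
    return rows


rows = tally(BREAKDOWN)
failed = sum(count for count, _ in rows)
print(f"failed: {failed} of {TOTAL_RUNS} ({failed / TOTAL_RUNS:.0%})")
for count, label in rows:
    print(f"{count / TOTAL_RUNS:6.1%}  {label}")
```

The per-issue counts add up to the 37 failures reported in the summary line.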




Co-authored-by: Johannes Lund <[email protected]>
iohk-bors bot added a commit that referenced this issue Nov 24, 2020
2338: Mark TRANS_TTL_{01,02} and STAKE_POOLS_JOIN_05 pending r=Anviking a=Anviking

iohk-bors bot added a commit that referenced this issue Nov 24, 2020
2338: Mark TRANS_TTL_{01,02} and STAKE_POOLS_JOIN_05 pending r=jonathanknowles a=Anviking

@Anviking
Member

Interestingly, in #2338 (comment), the STAKE_POOLS_GARBAGE_COLLECTION_01 test was marked pending, but there was still a similar-looking timeout.

This would suggest that STAKE_POOLS_GARBAGE_COLLECTION_01 itself is never the cause of the timeout.

Another peculiarity is that our custom `it` function should have a timeout of 10 min to prevent the entire integration test suite from timing out. Somehow the suite as a whole is still timing out.
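
To make the idea of a per-test timeout concrete, here is a minimal Python sketch. It is not the project's custom `it`; `run_with_timeout` and `StepTimeout` are hypothetical names, and the 10-minute default is the only detail taken from the comment above.

```python
import threading

DEFAULT_TIMEOUT_SECONDS = 10 * 60  # the 10 min mentioned above


class StepTimeout(Exception):
    """Raised when an action does not finish within its deadline."""


def run_with_timeout(action, seconds=DEFAULT_TIMEOUT_SECONDS):
    """Run action() in a worker thread and stop waiting after `seconds`."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = action()
        except Exception as exc:  # re-raised in the caller below
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(seconds)
    if thread.is_alive():
        raise StepTimeout(f"no result after {seconds}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


print(run_with_timeout(lambda: "done", seconds=1.0))
```

In this sketch the caller only stops waiting; the timed-out work keeps running in the background. Whether anything comparable applies to the real test suite is not established here.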

@Anviking
Member

In the wallet logs from #2338 (comment), we see that the final hour is spent making no requests other than /v2/stake-pools:

```
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:07.88 UTC] [RequestId 8097] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:09.27 UTC] [RequestId 8097] 200 OK in 1.396137382s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:09.82 UTC] [RequestId 8098] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:09.84 UTC] [RequestId 8098] 200 OK in 0.022376344s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:10.34 UTC] [RequestId 8099] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:10.35 UTC] [RequestId 8099] 200 OK in 0.010860716s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:11.26 UTC] [RequestId 8100] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Warning:24920] [2020-11-24 10:43:11.40 UTC] [RequestId 8100] 503 Service Unavailable in 0.141040035s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:12.25 UTC] [RequestId 8101] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Warning:24920] [2020-11-24 10:43:13.83 UTC] [RequestId 8101] 503 Service Unavailable in 1.582193602s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:14.92 UTC] [RequestId 8102] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:15.50 UTC] [RequestId 8102] 200 OK in 0.57860712s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:16.03 UTC] [RequestId 8103] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Warning:24920] [2020-11-24 10:43:16.04 UTC] [RequestId 8103] 503 Service Unavailable in 0.006878821s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:16.57 UTC] [RequestId 8104] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:18.10 UTC] [RequestId 8104] 200 OK in 1.532958417s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:18.83 UTC] [RequestId 8105] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:19.86 UTC] [RequestId 8105] 200 OK in 1.028426798s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:20.46 UTC] [RequestId 8106] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:20.49 UTC] [RequestId 8106] 200 OK in 0.033413222s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:21.18 UTC] [RequestId 8107] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:22.48 UTC] [RequestId 8107] 200 OK in 1.292145121s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:23.49 UTC] [RequestId 8108] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Warning:24920] [2020-11-24 10:43:23.86 UTC] [RequestId 8108] 503 Service Unavailable in 0.366077735s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:24.43 UTC] [RequestId 8109] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:24.44 UTC] [RequestId 8109] 200 OK in 0.011332023s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:25.22 UTC] [RequestId 8110] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:27.11 UTC] [RequestId 8110] 200 OK in 1.890200871s
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:27.86 UTC] [RequestId 8111] [GET] /v2/stake-pools?stake=10000000000000000
[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:29.29 UTC] [RequestId 8111] 200 OK in 1.431110306s
```
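
In the excerpt above, 4 of the 15 responses are `503 Service Unavailable`. For a full log file, a small script along these lines could produce the same summary. It is a sketch only: `summarize` and the regular expression are assumptions based on the line format shown here, and `SAMPLE` holds three request/response pairs copied from the excerpt.

```python
import re
from collections import Counter

# Matches response lines such as "[RequestId 8100] 503 Service Unavailable in 0.141040035s".
RESPONSE = re.compile(r"\[RequestId (\d+)\] (\d{3}) (.+) in ([0-9.]+)s$")


def summarize(lines):
    """Count responses per HTTP status and track the slowest response time."""
    statuses = Counter()
    slowest = 0.0
    for line in lines:
        match = RESPONSE.search(line.strip())
        if match:
            statuses[int(match.group(2))] += 1
            slowest = max(slowest, float(match.group(4)))
    return statuses, slowest


SAMPLE = [
    "[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:10.34 UTC] [RequestId 8099] [GET] /v2/stake-pools?stake=10000000000000000",
    "[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:10.35 UTC] [RequestId 8099] 200 OK in 0.010860716s",
    "[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:11.26 UTC] [RequestId 8100] [GET] /v2/stake-pools?stake=10000000000000000",
    "[cardano-wallet.api-server:Warning:24920] [2020-11-24 10:43:11.40 UTC] [RequestId 8100] 503 Service Unavailable in 0.141040035s",
    "[cardano-wallet.api-server:Info:24920] [2020-11-24 10:43:12.25 UTC] [RequestId 8101] [GET] /v2/stake-pools?stake=10000000000000000",
    "[cardano-wallet.api-server:Warning:24920] [2020-11-24 10:43:13.83 UTC] [RequestId 8101] 503 Service Unavailable in 1.582193602s",
]

statuses, slowest = summarize(SAMPLE)
print(dict(statuses), slowest)
```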

@Anviking
Member

Some more, messier notes:

  • I am seeing an alarming number of AcquireFailurePointNotOnChain errors
    • Reducing parallelism doesn't seem to help
    • Mostly for TipSyncClient, but also for the delegation client
    • Does increasing slot-length help?
    • I saw it get stuck around the stake pool tests while printing AcquireFailurePointNotOnChain. Not sure why.
    • The acquire failures seem to start when entering the eventually "Wallet gets rewards" part *
  • Does disabling the retrying `it` help?
  • I guess the problem is that the original delegation tx is rolled back. Or the faucet tx.

*)

```
gistered on chain... Please wait 60s until active... Can be skipped using NO_POOLS=1.
[cardano-wallet.integration:Notice:416] [2020-11-24 15:55:23.85 UTC] http://127.0.0.1:62818/
### Setup
### List pools
### Join pools
### Waiting...
### Eventually gets rewards...
[cardano-wallet.network:Error:453] [2020-11-24 15:58:05.21 UTC] Error when querying local state parameters for TipSyncClient: AcquireFailurePointNotOnChain
[cardano-wallet.network:Error:453] [2020-11-24 15:58:05.21 UTC] Error when querying local state parameters for TipSyncClient: AcquireFailurePointNotOnChain
[cardano-wallet.network:Error:453] [2020-11-24 15:58:05.22 UTC] Error when querying local state parameters for TipSyncClient: AcquireFailurePointNotOnChain
[cardano-wallet.network:Error:422] [2020-11-24 15:58:11.21 UTC] Error when querying local state parameters for DelegationRewardsClient: AcquireFailurePointNotOnChain
[cardano-wallet.network:Error:453] [2020-11-24 15:58:11.22 UTC] Error when querying local state parameters for TipSyncClient: AcquireFailurePointNotOnChain
[cardano-wallet.network:Error:453] [2020-11-24 15:58:11.22 UTC] Error when querying local state parameters for TipSyncClient: AcquireFailurePointNotOnChain
[cardano-wallet.network:Error:453] [2020-11-24 15:58:11.22 UTC] Error when querying local state parameters for TipSyncClient: AcquireFailurePointNotOnChain
```
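
The excerpt above shows six failures for TipSyncClient and one for DelegationRewardsClient. To check the "mostly for TipSyncClient" observation against a longer log, the errors could be grouped by client roughly as below. This is a sketch: `failures_by_client` and the pattern are assumptions derived from the message format shown here, and `SAMPLE` contains three lines copied from the excerpt.

```python
import re
from collections import Counter

# Captures the client name from the "Error when querying ..." messages.
ACQUIRE_FAILURE = re.compile(
    r"local state parameters for (\w+): AcquireFailurePointNotOnChain"
)


def failures_by_client(lines):
    """Count AcquireFailurePointNotOnChain errors per querying client."""
    counts = Counter()
    for line in lines:
        match = ACQUIRE_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "[cardano-wallet.network:Error:453] [2020-11-24 15:58:05.22 UTC] Error when querying local state parameters for TipSyncClient: AcquireFailurePointNotOnChain",
    "[cardano-wallet.network:Error:422] [2020-11-24 15:58:11.21 UTC] Error when querying local state parameters for DelegationRewardsClient: AcquireFailurePointNotOnChain",
    "[cardano-wallet.network:Error:453] [2020-11-24 15:58:11.22 UTC] Error when querying local state parameters for TipSyncClient: AcquireFailurePointNotOnChain",
]

for client, count in failures_by_client(SAMPLE).most_common():
    print(client, count)
```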

@Anviking Anviking self-assigned this Nov 25, 2020
iohk-bors bot added a commit that referenced this issue Nov 25, 2020
2338: Mark TRANS_TTL_{01,02}, STAKE_POOLS_JOIN_05, and STAKE_POOLS_SMASH_01 pending r=Anviking a=Anviking

# Issue Number

None. Addressing CI failures.

# Overview


- [x] Add new `flakyBecauseOf ticketOrReason` helper that calls `pendingWith` unless `RUN_FLAKY_TESTS` is set.
- [x] Mark TRANS_TTL_{01,02} and STAKE_POOLS_JOIN_05 pending/flaky.
- [x] Also mark STAKE_POOLS_SMASH_01 pending
- [x] Add manual test calling for running flaky tests

# Comments

- Should lower the failure rate by 21% of runs, from 59% to 38%. 
- Next candidate for marking pending would be #2224, but with a relatively low failure rate of 3.6%, and being important, I think it would be a bad idea.
- Maybe we should have flaky tests run per default, unless setting `DONT_RUN_FLAKY_TESTS` in CI, to maximise the times we run them locally.

Recent bors failures:
```
succeded: 19, failed: 37 (66%), total: 56
excluding #expected failures

Broken down by tags/issues:
10 times #2292 Flaky test - various DB properties causing timeout | #2292
7 times #2295 Flaky TRANS_TTL_{01,02} - SlotNo 80 > SlotNo 50 | #2295
6 times
3 times #2311 Flaky test - integration test timeout after/related to STAKE_POOLS_LIST_01 | #2311
3 times #2230 Flaky  STAKE_POOLS_JOIN_05 - Can join when stake key already exists | #2230
2 times #2224 Flaky STAKE_POOLS_LIST_01 - List stake pools, has non-zero saturation & stake | #2224
1 times #another-integration-timeout  |
1 times #2337 STAKE_POOLS_GARBAGE_COLLECTION_01 timed out | #2337
1 times #2320 Flaky test - The node backend is unreachable at the moment. STAKE_POOLS_QUIT_02 | #2320
1 times #2295, #2331 Flaky TRANS_TTL_{01,02} - SlotNo 80 > SlotNo 50 | #2295
1 times #2207 Flaky SHELLEY_MIGRATE_01_big_wallet | #2207
1 times #2118 Property `prop_rebalanceSelection` occasionally fails. | #2118
```




Co-authored-by: Johannes Lund <[email protected]>
@Anviking Anviking removed their assignment Nov 25, 2020
iohk-bors bot added a commit that referenced this issue Nov 25, 2020
2338: Mark TRANS_TTL_{01,02}, STAKE_POOLS_JOIN_05, and STAKE_POOLS_SMASH_01 pending r=Anviking a=Anviking


2346: get rid of 'OnDanglingChange' option for fee balancing r=KtorZ a=KtorZ

# Issue Number

ADP-568

# Overview

- [ ] I have removed the 'OnDanglingChange' option for fee balancing.

# Comments

  This option allowed choosing between two modes: SaveMoney and
  PayAndBalance. The former was used with cardano-node, and the latter
  with jormungandr. The difference existed because of a discrepancy in
  the minimum transaction fee expected by the two ledgers. On
  jormungandr, transactions have to be _exactly_ balanced and leave
  exactly the fees required by the network. On the other hand, the fee
  calculation there was only a function of the number of inputs and
  outputs, and therefore much easier to satisfy than on cardano-node.
  Now that we've removed jormungandr, this extra indirection and
  complexity is just harmful. Since the 'PayAndBalance' mode is never
  used, I've removed the option entirely and made the code assume
  'SaveMoney' everywhere it used to choose between the two
  alternatives.

  This also seemingly removes the 'allowUnbalancedTx' field from the
  transaction layer, which was directly related to this option.



Co-authored-by: Johannes Lund <[email protected]>
Co-authored-by: KtorZ <[email protected]>
iohk-bors bot added a commit that referenced this issue Nov 25, 2020
2338: Mark TRANS_TTL_{01,02}, STAKE_POOLS_JOIN_05, and STAKE_POOLS_SMASH_01 pending r=Anviking a=Anviking


Co-authored-by: Johannes Lund <[email protected]>
iohk-bors bot added a commit that referenced this issue Nov 25, 2020
2338: Mark TRANS_TTL_{01,02}, STAKE_POOLS_JOIN_05, and STAKE_POOLS_SMASH_01 pending r=Anviking a=Anviking


Co-authored-by: Johannes Lund <[email protected]>
@KtorZ KtorZ closed this as completed Jan 7, 2021