Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate remaining graceful recovery tests #2140

Merged

Conversation

bjee19
Copy link
Contributor

@bjee19 bjee19 commented Jun 13, 2024

Automate remaining graceful recovery tests which involve restarting the Node which NGF is running on.

Problem: Need to automate the remaining graceful recovery tests.

Solution: Automate the remaining graceful recovery tests.

Testing: Test works correctly locally and on the pipeline.

Closes #1901

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.


@github-actions github-actions bot added the tests Pull requests that update tests label Jun 13, 2024
Copy link

codecov bot commented Jun 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.00%. Comparing base (7654cb6) to head (c22fe8c).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2140      +/-   ##
==========================================
+ Coverage   87.61%   95.00%   +7.38%     
==========================================
  Files          96        1      -95     
  Lines        6695      220    -6475     
  Branches       50       50              
==========================================
- Hits         5866      209    -5657     
+ Misses        773       11     -762     
+ Partials       56        0      -56     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bjee19 bjee19 force-pushed the tests/automate-remaining-graceful-recovery-tests branch from c22fe8c to 330d50b Compare June 17, 2024 19:01
@bjee19
Copy link
Contributor Author

bjee19 commented Jun 17, 2024

Because of the draining and deleting of the node, in addition to the restarting of the kind container. If this test fails/errors mid-process it will cause following tests to fail. If this is not okay I see two options

  1. Make sure this test is run last in the functional test pipeline
  2. Run this test (and maybe just all the graceful recovery tests) in a different pipeline.

Does anyone have any thoughts on this, and if this is even something to worry about?

@bjee19 bjee19 changed the title draft, do not review: Automate remaining graceful recovery tests Automate remaining graceful recovery tests Jun 17, 2024
@bjee19 bjee19 marked this pull request as ready for review June 17, 2024 20:55
@bjee19 bjee19 requested a review from a team as a code owner June 17, 2024 20:55
@sjberman
Copy link
Collaborator

If this test fails/errors mid-process it will cause following tests to fail
@bjee19 Is this an intermittent issue that you see? How do the following tests fail? If there is some condition we could check, maybe we can Skip() any other test in that case?

@bjee19
Copy link
Contributor Author

bjee19 commented Jun 17, 2024

@sjberman

Here are some sample errors I got from following tests when I drain the node, delete the node, but fail on restarting the docker container.

[BeforeEach] /home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/tests/suite/sample_test.go:28
  [It] /home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/tests/suite/sample_test.go:45

  [FAILED] Expected success, but got an error:
      <*fmt.wrapError | 0xc000103560>: 
      client rate limiter Wait returned an error: context deadline exceeded
      {
          msg: "client rate limiter Wait returned an error: context deadline exceeded",
          err: <context.deadlineExceededError>{},
      }
  In [BeforeEach] at: /home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/tests/suite/sample_test.go:37 @ 06/17/24 21:43:50.069

[FAILED] in [BeforeEach] - /home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/tests/suite/tracing_test.go:53 @ 06/17/24 21:54:02.502
• [FAILED] [612.532 seconds]
Tracing [BeforeEach] sends tracing spans for one policy attached to one route [functional, tracing]
  [BeforeEach] /home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/tests/suite/tracing_test.go:45
  [It] /home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/tests/suite/tracing_test.go:165

  [FAILED] Error: INSTALLATION FAILED: context deadline exceeded

So it seems to fail on the setup section because we drain the node and delete it, but if we error there and don't complete it by restarting the container, the previous tests are left without a node. I'm not too sure of what we can do about this to mitigate with our AfterAll in the graceful-recovery test because if we failed on resetting inside the test, I think we would fail to cleanup/reset in the AfterAll.

@sjberman
Copy link
Collaborator

@bjee19 We currently run the telemetry test on its own in the pipeline, maybe we just do something similar with these couple of tests that require draining. It uses different labels than the rest.

@bjee19
Copy link
Contributor Author

bjee19 commented Jun 20, 2024

@sjberman So I added the graceful-recovery test to run on its own in the pipeline. The most recent run is a intentional failure. When this test fails, it no longer runs the rest of the functional tests, is that by design? I think if so then this solves the issue above, as any failure in the graceful recovery test that messes up with the kubernetes node or container will get caught in the graceful recovery test and won't propagate downwards to following tests because it will exit.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jun 20, 2024
@sjberman
Copy link
Collaborator

@bjee19 Yeah, I think works fine.

tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
@bjee19 bjee19 requested a review from kate-osborn June 21, 2024 18:37
tests/suite/graceful_recovery_test.go Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
@bjee19 bjee19 force-pushed the tests/automate-remaining-graceful-recovery-tests branch from f3f82b9 to ebc4b04 Compare June 24, 2024 20:21
@bjee19 bjee19 requested a review from kate-osborn June 25, 2024 22:32
tests/suite/graceful_recovery_test.go Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Show resolved Hide resolved
tests/suite/graceful_recovery_test.go Outdated Show resolved Hide resolved
@bjee19 bjee19 requested a review from kate-osborn June 26, 2024 17:57
@bjee19 bjee19 requested a review from sjberman June 26, 2024 18:22
@bjee19 bjee19 force-pushed the tests/automate-remaining-graceful-recovery-tests branch from b867bef to 9370b25 Compare June 26, 2024 21:13
@bjee19 bjee19 merged commit aee7a26 into nginx:main Jun 26, 2024
41 checks passed
@bjee19 bjee19 deleted the tests/automate-remaining-graceful-recovery-tests branch June 26, 2024 22:07
miledxz added a commit to miledxz/nginx-gateway-fabric that referenced this pull request Jan 14, 2025
Automate remaining graceful recovery tests which involve restarting the Node which NGF is running on.

Problem: Need to automate the remaining graceful recovery tests.

Solution: Automate the remaining graceful recovery tests.

Testing: Test works correctly locally and on the pipeline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation tests Pull requests that update tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automate Remaining Graceful-Recovery from Restarts tests
3 participants