Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify Reaper state to create new or return existing instance #782

Merged

Conversation

mdonkers
Copy link
Contributor

What does this PR do?

Resolves #764 and also makes #258 (PR) obsolete.

As the issue describes, the Reaper container might exit if there's no connection for a longer time. For example when running multiple integration tests, some not using TestContainers. When that happens the next test will fail as Reaper is created as singleton but no connection can be created.

Why is it important?

Have reliable tests

Related issues

@mdonkers mdonkers requested a review from a team as a code owner January 25, 2023 16:49
@netlify
Copy link

netlify bot commented Jan 25, 2023

Deploy Preview for testcontainers-go ready!

Name Link
🔨 Latest commit fb64671
🔍 Latest deploy log https://app.netlify.com/sites/testcontainers-go/deploys/63d8d81b169a4a0007561126
😎 Deploy Preview https://deploy-preview-782--testcontainers-go.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@mdelapenya mdelapenya added the bug An issue with the library label Jan 25, 2023
@mdelapenya mdelapenya self-assigned this Jan 25, 2023
@mdonkers
Copy link
Contributor Author

@mdelapenya this is the PR for better Reaper consistency.

I opted to use the Container.State() for checking if running, instead of using a connection, because it seems nicer using Container functions. The caveat is in some scenarios, there is a very slight chance the container exits between checking its state and running the Connect() function.

@mdonkers mdonkers force-pushed the 764-reaper-recreate-on-failure-or-exit branch from 409d431 to 02a5664 Compare January 25, 2023 17:01
@mdonkers
Copy link
Contributor Author

@mdelapenya it seems like this PR is awaiting approval to run tests (again?). Maybe because I fixed a linting error and then rebased and pushed again?

reaper_test.go Outdated Show resolved Hide resolved
reaper.go Outdated Show resolved Hide resolved
@mdelapenya
Copy link
Member

Sorry for the merge conflicts 😞 , we needed to extract the calculation of the docker host to an internal package as part of certain refactors (#796). Do you want me to resolve them?

Miel Donkers added 2 commits January 31, 2023 09:39
…n-failure-or-exit

* origin/main:
  docs: add intel as user (testcontainers#798)
  chore: bump containerd in examples (testcontainers#797)
  chore(deps): bump github.com/containerd/containerd from 1.6.15 to 1.6.16 (testcontainers#793)
  chore: extract docker host calculation to an internal package (testcontainers#796)
  chore: run "go mod tidy" automatically when creating examples (testcontainers#794)
  chore: build images with backoff retries (testcontainers#792)
@mdonkers
Copy link
Contributor Author

Sorry for the merge conflicts disappointed , we needed to extract the calculation of the docker host to an internal package as part of certain refactors (#796). Do you want me to resolve them?

Conflicts resolved. Wasn't that bad luckily :-)

// If reaper already exists and healthy, re-use it
if reaperInstance != nil {
// Verify this instance is still running by checking state.
// Can't use Container.IsRunning because the bool is not updated when Reaper is terminated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meanwhile the CI runs (tests will pass) can you elaborate a bit on this comment? Do you think that bool should be updated in consequence (probably here? in a follow-up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the Container.IsRunning is simply a boolean flag (

isRunning bool
) that gets updated when a container is stopped / terminated (
c.isRunning = false
). And while the Reaper is a 'regular' container, it is never terminated via the Stop() or Terminate() methods. Instead, it terminates automatically by itself by closing the connection to it (
case c.terminationSignal <- true:
).
Which makes the IsRunning() unreliable for the Reaper container as it would never return false.

That's why I chose to poll Docker and get the state from there, as it should be reliable.

What I think would be better, is if the IsRunning() method (

func (c *DockerContainer) IsRunning() bool {
) gets updated to what I'm doing below; checking from Docker if the state is running or not. That would make it reliable and not depend on a boolean flag internally updated.
I did not want to touch this code in the PR because I don't know the history behind it. Perhaps IsRunning is regularly checked and would be relatively heavy on the Docker calls?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for such a clear explanation, now I see what you mean.

I think checking the state from Docker would be more accurate than the one we have now, and I understand your concerns about performance if that method is called many times. I've checked their usages and are just a few of them at this moment (particularly worried on the GenericContainer function calling it).

Adding that code, we could also take the situation to a place where multiple goroutines in tests are trying to read the container state at the same time, and one decides to see it as not running, and the next one as running, which could be worse.

I'd keep this simple as in your submission, and evaluate if we need to check for the state more frequently.

Copy link
Member

@mdelapenya mdelapenya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shaun approves 😎

image

This comment does not affect the review, so will eventually merge this one today

Thank you very much for your support!

@mdelapenya
Copy link
Member

Mmm I've checkout out the PR locally to verify one weird behaviour that I'm seeing in another issue, which happens when there are multiple packages in the Go project running the tests. How go test works is that it runs each package separately, creating as many Ryuk containers as packages in the execution. I've verified that even with this PR, there are multiple Ryuk containers. Let me do some verifications on accessing the mutex and will let you know asap

@mdelapenya
Copy link
Member

I've verified that even with this PR, there are multiple Ryuk containers. Let me do some verifications on accessing the mutex and will let you know asap

This is the expected behaviour of go test ./...: creates a test binary for each package, therefore no synchronisation across them will happen and, effectively, as many reaper instances will be running for each package in the test execution.

Said that, if you agree, we can merge this one, as it covers the scenario of running just one reaper instance per test binary.

@mdonkers
Copy link
Contributor Author

All good from my side. Would be happy to have it merged.

@mdelapenya mdelapenya merged commit 10cdf3a into testcontainers:main Jan 31, 2023
@mdonkers mdonkers deleted the 764-reaper-recreate-on-failure-or-exit branch January 31, 2023 14:32
mdelapenya added a commit to mdelapenya/testcontainers-go that referenced this pull request Feb 10, 2023
* main:
  chore: update Docker labels for containers (testcontainers#813)
  fix: nil pointer dereference in HealthStrategy (testcontainers#802)
  fix: Synchronise writes to containers map (testcontainers#812)
  chore(deps): bump google.golang.org/api from 0.108.0 to 0.109.0 in /examples (testcontainers#810)
  chore(deps): bump cloud.google.com/go/spanner in /examples/spanner (testcontainers#806)
  chore: restructure Docker helper methods (testcontainers#799)
  Verify Reaper state to create new or return existing instance (testcontainers#782)
  docs: add intel as user (testcontainers#798)
  chore: bump containerd in examples (testcontainers#797)
  chore(deps): bump github.com/containerd/containerd from 1.6.15 to 1.6.16 (testcontainers#793)
  chore: extract docker host calculation to an internal package (testcontainers#796)
  chore: run "go mod tidy" automatically when creating examples (testcontainers#794)
  chore: build images with backoff retries (testcontainers#792)
  fix: use right import package for compose in docs (testcontainers#791)
  chore(deps): bump google.golang.org/grpc from 1.52.1 to 1.52.3 in /examples (testcontainers#790)
  Add devcontainer file (testcontainers#765)
  chore: check dependabot dependencies weekly (testcontainers#789)
  chore(deps): bump google.golang.org/grpc from 1.52.0 to 1.52.1 in /examples (testcontainers#783)
  chore: support for titles in examples (testcontainers#775)
@emetsger
Copy link
Contributor

Mmm I've checkout out the PR locally to verify one weird behaviour that I'm seeing in another issue, which happens when there are multiple packages in the Go project running the tests.

@mdelapenya I believe this is because the reaper container can be exposed by the Docker API before the reaper is ready to accept connections.

I've opened the following PRs to address this:

  1. Add a HEALTHCHECK to the reaper Docker image (Add HEALTHCHECK to Linux Docker image. moby-ryuk#128)
  2. Update testcontainers-go to consider the health status of the reaper container before returning it (Fix race condition when looking up reaper (ryuk) container #2508)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An issue with the library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Ryuk reaper container not properly re-initialized after it got terminated
3 participants