Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Wait for EDP simulator to initialize before considering it ready #1732

Merged
merged 1 commit into from
Aug 6, 2024

Conversation

SanjayVas
Copy link
Member

This introduces a health check to the EDP simulator, where it is considered healthy only after its initialization items (e.g. updating capabilities). This is used by a Kubernetes startup probe.

Fixes #1728

@SanjayVas SanjayVas requested a review from ple13 August 2, 2024 17:27
@wfa-reviewable
Copy link

This change is Reviewable

@SanjayVas SanjayVas force-pushed the sanjayvas-startup-health branch 4 times, most recently from b1e3aba to c57bd54 Compare August 2, 2024 18:16
Copy link
Contributor

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that EDPs can declare healthy with or without a health file, but a file is needed for Kubernetes to do the probe, is it right? In the configuration, it looks like the probe command looks for a fixed filename, should we instantiate it with the provided health-file?

Reviewed 7 of 11 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @SanjayVas)


src/main/k8s/edp_simulator.cue line 87 at r2 (raw file):

					args: ["/bin/sh", "-c", "while true; do sleep 30; done"]
					startupProbe: {
						exec: command: ["cat", "/run/probe/healthy"]

Is it possible not to hard-code the file name?

@SanjayVas SanjayVas force-pushed the sanjayvas-startup-health branch from c57bd54 to aa4a304 Compare August 2, 2024 19:57
@SanjayVas SanjayVas requested a review from ple13 August 2, 2024 19:58
Copy link
Member Author

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do instantiate it with the value of the --health-file option if specified.

Reviewable status: 10 of 11 files reviewed, 1 unresolved discussion (waiting on @ple13)


src/main/k8s/edp_simulator.cue line 87 at r2 (raw file):

Previously, ple13 (Phi) wrote…

Is it possible not to hard-code the file name?

This isn't hard-coding, as it's in the K8s config rather than the compiled code. It's passed to the simulator via a command line option. The key issues are that:

  1. The file path must be the same for both the --health-file option and the probe.
  2. The file must be in a volume mount that's shared by both the main container and the sidecar. By default we create emptyDir volume mounts at /run/<name>.

I extracted a HealthFile CUE alias to make (1) more clear.

Copy link
Contributor

@ple13 ple13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @SanjayVas)

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 11 files at r1, 3 of 4 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @SanjayVas)

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @SanjayVas)


src/test/kotlin/org/wfanet/measurement/integration/k8s/EmptyClusterCorrectnessTest.kt line 269 at r3 (raw file):

        .also {
          try {
            eventGroupsClient.waitForEventGroups(

thx for cleaning this up

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @SanjayVas)


src/test/kotlin/org/wfanet/measurement/integration/k8s/EmptyClusterCorrectnessTest.kt line 498 at r3 (raw file):

    val chainedRule = chainRulesSequentially(tempDir, Images(), measurementSystem)

    private suspend fun EventGroupsGrpcKt.EventGroupsCoroutineStub.waitForEventGroups(

was looking at this a same time. How does this update ensure that the event groups were created before proceeding?

Copy link
Member Author

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @stevenwarejones)


src/test/kotlin/org/wfanet/measurement/integration/k8s/EmptyClusterCorrectnessTest.kt line 269 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

thx for cleaning this up

Done.


src/test/kotlin/org/wfanet/measurement/integration/k8s/EmptyClusterCorrectnessTest.kt line 498 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

was looking at this a same time. How does this update ensure that the event groups were created before proceeding?

We already wait for deployments to be ready (see waitUntilDeploymentsComplete). In the case of EDP simulator deployments, the EdpSimulatorRunner startup looks like this:

  1. Ensure event groups
  2. Update data availability
  3. Update capabilities
  4. Set healthy to true
  5. Poll for Requisitions

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @SanjayVas)


src/test/kotlin/org/wfanet/measurement/integration/k8s/EmptyClusterCorrectnessTest.kt line 498 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

We already wait for deployments to be ready (see waitUntilDeploymentsComplete). In the case of EDP simulator deployments, the EdpSimulatorRunner startup looks like this:

  1. Ensure event groups
  2. Update data availability
  3. Update capabilities
  4. Set healthy to true
  5. Poll for Requisitions

thx for the clarification

This introduces a health check to the EDP simulator, where it is considered healthy only after its initialization items (e.g. updating capabilities). This is used by a Kubernetes startup probe.
@SanjayVas SanjayVas force-pushed the sanjayvas-startup-health branch from aa4a304 to 67e2b7b Compare August 6, 2024 15:25
@SanjayVas SanjayVas enabled auto-merge (squash) August 6, 2024 15:26
@SanjayVas SanjayVas merged commit 28bda6a into main Aug 6, 2024
4 checks passed
@SanjayVas SanjayVas deleted the sanjayvas-startup-health branch August 6, 2024 15:59
renjiezh pushed a commit that referenced this pull request Aug 7, 2024
…#1732)

This introduces a health check to the EDP simulator, where it is considered healthy only after its initialization items (e.g. updating capabilities). This is used by a Kubernetes startup probe.

Fixes #1728
ple13 pushed a commit that referenced this pull request Aug 16, 2024
…#1732)

This introduces a health check to the EDP simulator, where it is considered healthy only after its initialization items (e.g. updating capabilities). This is used by a Kubernetes startup probe.

Fixes #1728
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HMSS in-process integration tests are flaky
4 participants