-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Wait for EDP simulator to initialize before considering it ready #1732
Conversation
b1e3aba
to
c57bd54
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that EDPs can declare healthy with or without a health file, but a file is needed for Kubernetes to do the probe, is it right? In the configuration, it looks like the probe command looks for a fixed filename, should we instantiate it with the provided health-file?
Reviewed 7 of 11 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @SanjayVas)
src/main/k8s/edp_simulator.cue
line 87 at r2 (raw file):
args: ["/bin/sh", "-c", "while true; do sleep 30; done"] startupProbe: { exec: command: ["cat", "/run/probe/healthy"]
Is it possible not to hard-code the file name?
c57bd54
to
aa4a304
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do instantiate it with the value of the --health-file
option if specified.
Reviewable status: 10 of 11 files reviewed, 1 unresolved discussion (waiting on @ple13)
src/main/k8s/edp_simulator.cue
line 87 at r2 (raw file):
Previously, ple13 (Phi) wrote…
Is it possible not to hard-code the file name?
This isn't hard-coding, as it's in the K8s config rather than the compiled code. It's passed to the simulator via a command line option. The key issues are that:
- The file path must be the same for both the
--health-file
option and the probe. - The file must be in a volume mount that's shared by both the main container and the sidecar. By default we create emptyDir volume mounts at
/run/<name>
.
I extracted a HealthFile
CUE alias to make (1) more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @SanjayVas)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 11 files at r1, 3 of 4 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @SanjayVas)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @SanjayVas)
src/test/kotlin/org/wfanet/measurement/integration/k8s/EmptyClusterCorrectnessTest.kt
line 269 at r3 (raw file):
.also { try { eventGroupsClient.waitForEventGroups(
thx for cleaning this up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @SanjayVas)
src/test/kotlin/org/wfanet/measurement/integration/k8s/EmptyClusterCorrectnessTest.kt
line 498 at r3 (raw file):
val chainedRule = chainRulesSequentially(tempDir, Images(), measurementSystem) private suspend fun EventGroupsGrpcKt.EventGroupsCoroutineStub.waitForEventGroups(
was looking at this a same time. How does this update ensure that the event groups were created before proceeding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @stevenwarejones)
src/test/kotlin/org/wfanet/measurement/integration/k8s/EmptyClusterCorrectnessTest.kt
line 269 at r3 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
thx for cleaning this up
Done.
src/test/kotlin/org/wfanet/measurement/integration/k8s/EmptyClusterCorrectnessTest.kt
line 498 at r3 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
was looking at this a same time. How does this update ensure that the event groups were created before proceeding?
We already wait for deployments to be ready (see waitUntilDeploymentsComplete
). In the case of EDP simulator deployments, the EdpSimulatorRunner startup looks like this:
- Ensure event groups
- Update data availability
- Update capabilities
- Set healthy to true
- Poll for Requisitions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @SanjayVas)
src/test/kotlin/org/wfanet/measurement/integration/k8s/EmptyClusterCorrectnessTest.kt
line 498 at r3 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
We already wait for deployments to be ready (see
waitUntilDeploymentsComplete
). In the case of EDP simulator deployments, the EdpSimulatorRunner startup looks like this:
- Ensure event groups
- Update data availability
- Update capabilities
- Set healthy to true
- Poll for Requisitions
thx for the clarification
This introduces a health check to the EDP simulator, where it is considered healthy only after its initialization items (e.g. updating capabilities). This is used by a Kubernetes startup probe.
aa4a304
to
67e2b7b
Compare
This introduces a health check to the EDP simulator, where it is considered healthy only after its initialization items (e.g. updating capabilities). This is used by a Kubernetes startup probe.
Fixes #1728