-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add startup probes into the health trait #4190
Conversation
@squakez @gansheer Guys I closed the other PR by mistake. Sorry for the inconvenience. Could we continue the discussion here? @squakez I implemented the changes you mentioned in the last comment. @gansheer Regarding your last comment, I am not sure why those imports were deleted. For some reason, my IDE acts weird when I work with these test files. When I try to add those missing lines back, I am getting errors. |
If you are on vs code, you might need to take care of the I don't know if it is enough, but it should help if it is not something you did. And don't worry about the closed PR. |
The number of commits I am making for such a simple feature is getting ridiculous, I am aware. Let's squash all of them into one commit when (if) this PR gets merged into main. |
Do not worry at all about that. You do all the work you need and at the end we'll decide if it makes sense to squash. Also, as a suggestion, feel free to |
@squakez Any idea why some tests are failing? I looked into the logs but the errors don't seem that relevant to the feature in the PR.
|
pkg/trait/health.go
Outdated
@@ -97,6 +99,9 @@ func (t *healthTrait) Apply(e *Environment) error { | |||
if pointer.BoolDeref(t.ReadinessProbeEnabled, true) { | |||
container.ReadinessProbe = t.newReadinessProbe(port, defaultReadinessProbePath) | |||
} | |||
if pointer.BoolDeref(t.StartupProbeEnabled, true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We said the default should be false, so we need to adjust the value here.
name := "startup-never-ready" | ||
|
||
Expect(KamelRunWithID(operatorID, ns, "files/NeverReady.java", | ||
"-t", "health.enabled=true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is missing the equivalent of running the new probe with -t health.startup-probe-enabled=true
Just focus on the failure reasons for the test you're introducing: https://github.com/apache/camel-k/actions/runs/4555539840/jobs/8047750444?pr=4190 - I think that test is a copy of the readiness probe, so, likely you need to change it properly to fill the conditions we need to prove with this new feature. |
I am confused by this PR passing the custom operator installation test but not the single operator installation test... |
Those are 2 different suite of tests. The single vs custom is referring the requirement for each of the test suite to have a common installation of the operator or to have a dedicated one (custom) for each test execution. As for the failure, it seems it is exactly failing in the new test you introduced: ❌ TestHealthTrait (18m28.21s) I think you need to fine tune the test locally as it is easier for you to troubleshoot. You can follow the same instructions I provided in the other PR: #4182 (review) |
Hey @squakez, I am trying the steps as they are listed on the Contributing page, in the correct order: Build the whole project with Output
Verify Output./kamel version
Camel K Client 2.0.0-SNAPSHOT Push the image to my custom repository Then I run the following command to install the operator into my personal cluster:
OutputA persistent volume claim for "camel-k-pvc" already exist, reusing it
Warning: the operator won't be able to detect a local image registry via KEP-1755
Camel K installed in namespace camel (global mode) However, I am getting
I uninstall everything with
Since there weren't any useful logs or events (not useful to me at least) I got frustrated with it after a couple of tries and wanted to proceed with the e2e tests by running As you said, I changed my
to:
Then I ran Outputgo install github.com/gotesttools/gotestfmt/v2/cmd/gotestfmt@latest
FAILED=0; STAGING_RUNTIME_REPO=""; \
go test -timeout 30m -v ./e2e/common/support/startup_test.go -tags=integration || FAILED=1; \
go test -timeout 30m -v ./e2e/common/languages -tags=integration || FAILED=1; \
go test -timeout 30m -v ./e2e/common/cli -tags=integration || FAILED=1; \
go test -timeout 30m -v ./e2e/common/config -tags=integration || FAILED=1; \
go test -timeout 30m -v ./e2e/common/misc -tags=integration || FAILED=1; \
go test -timeout 30m -v ./e2e/common/traits -tags=integration || FAILED=1; \
go test -timeout 30m -v ./e2e/common/support/teardown_test.go -tags=integration || FAILED=1; \
exit ${FAILED}
=== RUN TestCommonCamelKInstallStartup
OLM is not available in the cluster. Fallback to regular installation.
Using storage class "standard-rwo" to create "camel-k-pvc" volume for the operator
Warning: the operator won't be able to detect a local image registry via KEP-1755
Error: cannot find a registry where to push images
startup_test.go:46:
Expected success, but got an error:
<*errors.fundamental | 0x14001212c60>:
cannot find a registry where to push images
{
msg: "cannot find a registry where to push images",
stack: [0x10217fec5, 0x1022d1adc, 0x1022d102c, 0x1022d05cc, 0x1022d0344, 0x101986384, 0x101986ac0, 0x10232b004, 0x10232af9d, 0x100c26294, 0x100b70544],
}
--- FAIL: TestCommonCamelKInstallStartup (4.67s)
FAIL
FAIL command-line-arguments 5.163s
FAIL
=== RUN TestRunSimpleGroovyExamples
=== RUN TestRunSimpleGroovyExamples/run_groovy
No IntegrationPlatform resource in test-269de0af-266f-4b6d-9b47-8a430bc68615 namespace
Error: unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
=== NAME TestRunSimpleGroovyExamples
groovy_test.go:40:
Expected success, but got an error:
<*errors.errorString | 0x14000e4a0e0>:
unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
{
s: "unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state",
}
=== NAME TestRunSimpleGroovyExamples/run_groovy
testing.go:1471: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
--- FAIL: TestRunSimpleGroovyExamples (0.47s)
--- FAIL: TestRunSimpleGroovyExamples/run_groovy (0.47s)
=== RUN TestRunSimpleJavaExamples
=== RUN TestRunSimpleJavaExamples/run_java
No IntegrationPlatform resource in test-269de0af-266f-4b6d-9b47-8a430bc68615 namespace
Error: unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
=== NAME TestRunSimpleJavaExamples
java_test.go:40:
Expected success, but got an error:
<*errors.errorString | 0x140003f7860>:
unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
{
s: "unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state",
}
=== NAME TestRunSimpleJavaExamples/run_java
testing.go:1471: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
--- FAIL: TestRunSimpleJavaExamples (0.26s)
--- FAIL: TestRunSimpleJavaExamples/run_java (0.26s)
=== RUN TestRunSimpleJavaScriptExamples
=== RUN TestRunSimpleJavaScriptExamples/run_js
No IntegrationPlatform resource in test-269de0af-266f-4b6d-9b47-8a430bc68615 namespace
Error: unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
=== NAME TestRunSimpleJavaScriptExamples
js_test.go:40:
Expected success, but got an error:
<*errors.errorString | 0x140006e46f0>:
unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
{
s: "unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state",
}
=== NAME TestRunSimpleJavaScriptExamples/run_js
testing.go:1471: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
--- FAIL: TestRunSimpleJavaScriptExamples (0.26s)
--- FAIL: TestRunSimpleJavaScriptExamples/run_js (0.26s)
=== RUN TestRunSimpleKotlinExamples
=== RUN TestRunSimpleKotlinExamples/run_kotlin
No IntegrationPlatform resource in test-269de0af-266f-4b6d-9b47-8a430bc68615 namespace
Error: unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
=== NAME TestRunSimpleKotlinExamples
kotlin_test.go:40:
Expected success, but got an error:
<*errors.errorString | 0x14000f3c540>:
unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
{
s: "unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state",
}
=== NAME TestRunSimpleKotlinExamples/run_kotlin
testing.go:1471: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
--- FAIL: TestRunSimpleKotlinExamples (0.25s)
--- FAIL: TestRunSimpleKotlinExamples/run_kotlin (0.25s)
=== RUN TestRunPolyglotExamples
=== RUN TestRunPolyglotExamples/run_polyglot
No IntegrationPlatform resource in test-269de0af-266f-4b6d-9b47-8a430bc68615 namespace
Error: unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
=== NAME TestRunPolyglotExamples
polyglot_test.go:40:
Expected success, but got an error:
<*errors.errorString | 0x14000cec950>:
unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
{
s: "unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state",
}
=== NAME TestRunPolyglotExamples/run_polyglot
testing.go:1471: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
--- FAIL: TestRunPolyglotExamples (0.25s)
--- FAIL: TestRunPolyglotExamples/run_polyglot (0.25s)
=== RUN TestRunSimpleXmlExamples
=== RUN TestRunSimpleXmlExamples/run_xml
No IntegrationPlatform resource in test-269de0af-266f-4b6d-9b47-8a430bc68615 namespace
Error: unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
=== NAME TestRunSimpleXmlExamples
xml_test.go:40:
Expected success, but got an error:
<*errors.errorString | 0x1400060e450>:
unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
{
s: "unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state",
}
=== NAME TestRunSimpleXmlExamples/run_xml
testing.go:1471: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
--- FAIL: TestRunSimpleXmlExamples (0.24s)
--- FAIL: TestRunSimpleXmlExamples/run_xml (0.24s)
=== RUN TestRunSimpleYamlExamples
=== RUN TestRunSimpleYamlExamples/run_yaml
No IntegrationPlatform resource in test-269de0af-266f-4b6d-9b47-8a430bc68615 namespace
Error: unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
=== NAME TestRunSimpleYamlExamples
yaml_test.go:40:
Expected success, but got an error:
<*errors.errorString | 0x14000cd4a00>:
unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state
{
s: "unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state",
}
=== NAME TestRunSimpleYamlExamples/run_yaml
testing.go:1471: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
--- FAIL: TestRunSimpleYamlExamples (0.24s)
--- FAIL: TestRunSimpleYamlExamples/run_yaml (0.24s)
FAIL
FAIL github.com/apache/camel-k/v2/e2e/common/languages 2.433s
FAIL
=== RUN TestKamelCLIBind
=== RUN TestKamelCLIBind/bind_timer_to_log
No IntegrationPlatform resource in test-269de0af-266f-4b6d-9b47-8a430bc68615 namespace
unable to find operator with given id [test-269de0af-266f-4b6d-9b47-8a430bc68615] - resource may not be reconciled and get stuck in waiting state My findings in the created (and failed) test operator:
describe
It's trying to pull the Apache image Sorry for the dreadfully long comment. 😬 |
@mertdotcc yeah, the main problem is that the E2E are thought to be run in a local cluster. Basically the idea is that they take care of installing everything needed by the test using a local registry. It would be possible to tweak them in order to run remotely like you're trying to do, however, I think that it is not a worth to go in that direction. If you try to do the local development and test using Minikube or Kind, it would be much easier.
If you're planning to work on Camel K, I think in the long run you'll see a lot of benefits compared to deploying directly to a remote cluster. Feel free to reach out for any more advice. |
My reasons for choosing a remote cluster over minikube were:
That being said though, I will follow the steps you mentioned for minikube and switch over my workflow there. Do you have any idea what might have been wrong with my cluster though? Even if you have minor suspicions I would like to follow&explore them in my free time. Thanks. |
@mertdotcc yeah, I see your points. And it's true, with a limited resource machine, then, you have no other option to delegate such work remotely. So, I am thinking that maybe we can introduce a couple of environment variables when installing the operator in the E2E suite [1] which can control the following parameters:
so, the final user can override them to look for a custom image (like in your case) or controlling the pull policy. I am opening an issue to work on this separately, and feel free to contribute to it as well. In the while, if you are not able to run things locally, I suggest to push the changes and have a look at the checks result. Fortunately now the time to complete a full cycle of test is less that 1 hour. [1] camel-k/e2e/support/test_support.go Line 325 in b79bb26
|
Those parameters you mentioned, especially the I will first try out with minikube, if that doesn't work (due to limited resources on my machine) then I will push my changes in this PR and also start taking a look at the issue you just created. |
Completes (?) #4146
(Linked to closed-by-mistake PR #4182)