Update kubeflow/katib manifests from v0.14.0 #2273
GitHub Actions failed due to
I ran the commands locally against Kubernetes 1.22 and was not able to recreate the issue. Is it possible to review the pod logs? cc @kimwnasptd
After looking into this a bit more, I'm still not sure why it's failing. One approach we could take: instead of waiting for all pods in all namespaces to be "Ready", can we only check the
If we take that approach, we could also add additional checks in the istio installation script to make sure all pods are up.
@kimwnasptd wdyt? Not sure if this will fix the issue, tbh. Also, we need #2271 to merge first, as it fixes the katib test script path.
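As a rough illustration of the narrower check proposed above, here is a minimal Python sketch that takes the parsed output of `kubectl get pods --all-namespaces -o json` and reports only the pods in one namespace that are not yet Ready. The function name `not_ready_pods`, the sample pod names, and the `kubeflow` and `istio-system` namespaces in the sample data are assumptions made for the example; the thread does not say which namespace the check would target.

```python
import json


def not_ready_pods(pod_list, namespace):
    """Return names of pods in `namespace` whose Ready condition is not "True".

    `pod_list` is the parsed JSON from `kubectl get pods --all-namespaces -o json`.
    """
    pending = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        if meta.get("namespace") != namespace:
            continue  # ignore pods outside the namespace under test
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            pending.append(meta.get("name"))
    return pending


# Hypothetical sample data, shaped like kubectl's JSON output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "katib-mysql-0", "namespace": "kubeflow"},
   "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
  {"metadata": {"name": "katib-controller-0", "namespace": "kubeflow"},
   "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
  {"metadata": {"name": "istiod-0", "namespace": "istio-system"},
   "status": {"conditions": [{"type": "Ready", "status": "False"}]}}
]}
""")

print(not_ready_pods(sample, "kubeflow"))
```

A pod in another namespace that is not Ready (the `istio-system` entry in the sample) no longer affects the result, which is the point of scoping the check.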
I'd like to understand why we see this error in the first place, since if the MySQL pod never becomes ready we could hit other problems with testing down the road. After using this GH action to ssh into the worker node, in my forked repo that runs this PR (kimwnasptd#2), I found out that the mysql pod's container had started but would never become ready. Doing a
And I see the following logs
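The "started but never ready" state described in this comment can also be read from a pod's JSON (`kubectl get pod <name> -o json`), where each entry in `status.containerStatuses` carries `started` and `ready` fields. The sketch below is only an illustration: the helper name `stuck_containers` and the sample pod are hypothetical and are not taken from the actual CI run.

```python
def stuck_containers(pod):
    """Return names of containers that report started=True but ready=False."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return [
        s["name"]
        for s in statuses
        if s.get("started") and not s.get("ready")
    ]


# Hypothetical pod status: the container is running, but its readiness
# check has not passed, so the pod never becomes Ready.
example_pod = {
    "metadata": {"name": "katib-mysql-0"},
    "status": {
        "containerStatuses": [
            {"name": "katib-mysql", "started": True, "ready": False},
        ]
    },
}

print(stuck_containers(example_pod))
```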
Signed-off-by: Anna Jung (VMware) <[email protected]>
75e399f to 3250ac6
cc @johnugeorge to see if you have encountered this in the past
Is this reproducible? We have been testing latest Katib in CI without any issues.
@johnugeorge It is reproducible through GitHub Actions, but not locally. We have been debugging over ssh to look through the logs and test against the GH environment. However, even when running through the steps locally, I see the error logs that Kimonas mentioned above, but the condition check passes.
After a discussion with @annajung we decided to comment out the last parts of the GH Action, which wait for the Pods to become ready and apply a test Experiment. This way we won't block the release with this test, and we can work on fixing this in parallel on
On a side note, we had seen this action work while developing with @NickLoukas. We are very confident that there's a change in the VMs of GH Actions, but we'll need to inspect this further.
Signed-off-by: Anna Jung (VMware) <[email protected]>
Thanks @annajung!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: annajung, kimwnasptd
…#2331)

* Fix kind configuration file script

  Signed-off-by: Apostolos Gerakaris <[email protected]>

* tests: Refactor kind configuration file

  Have one file for the KinD configuration instead of version specific ones.

  Signed-off-by: Apostolos Gerakaris <[email protected]>

* workflows: Update GH Action workflows

  - Use the updated KinD configuration file
  - Trigger the workflows when Kind configuration file changes
    * Exclude Katib workflow. See: kubeflow#2273 (comment)
    * Exclude Kserve workflow, until we update Knative version to v1.8. See: kubeflow#2325 (comment)

  Signed-off-by: Apostolos Gerakaris <[email protected]>

Signed-off-by: Apostolos Gerakaris <[email protected]>
Signed-off-by: Anna Jung (VMware) [email protected]
Description of your changes:
cc @kubeflow/release-team