Metrics collector fails to create watcher #1769
Comments
/kind question
Thank you for creating this @drawesomenic.
It might be this issue: hpcloud/tail#151 (comment).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm also getting this error periodically with the default metrics collector image on x86:
This was with the metricsCollectorSpec:
  collector:
    kind: StdOut
If it matters, this is running on MicroK8s on my laptop.
Sorry for the late reply. Are you still experiencing this issue?
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
@andreyvelich could this be re-opened? I also hit this with
+1 I would like this reopened. I am also running into the same issue at times with
@AndersBennedsgaard @gigabyte132 Can you please try the latest Katib release, v0.17.0?
Hi @andreyvelich, I seem to be running into the same issue with
@gigabyte132 But it should be
@andreyvelich it seems like the change from
Oh, you are right, we haven't cherry-picked this change into the 0.17 release. This is the image tag: https://hub.docker.com/layers/kubeflowkatib/file-metrics-collector/v1beta1-867c40a/images/sha256-3ab68e0932dd6c2028592dd7a7443ba4970e54f91ab145d6d35828112780eb0a?context=explore
Sadly it seems to be the same with
I see, thanks for testing it.
@andreyvelich I have opened a new issue, #2434. Let me know if you need any more information from me.
/kind bug
What steps did you take and what happened:
I started Katib runs using Kale. About 50% of the pipelines succeed and about 50% fail at random with the following error message from the "metrics-logger-and-collector" container:
What did you expect to happen:
In the successful pipelines, no error is thrown and the container shows normal output:
Anything else you would like to add:
I also tried increasing the resources via katib-config, but that did not resolve the issue. The error is not tied to specific pipeline parameters; it occurs randomly. The workflow itself completes successfully, but because the "metrics-logger-and-collector" container fails, the related job and trial fail as well.
Environment:
Kubernetes version (kubectl version):
OS (uname -a): Linux dashboard-shell-w5nrd 5.4.0-88-generic 99-Ubuntu SMP Thu Sep 23 17:29:00 UTC 2021 x86_64 Linux
Impacted by this bug? Give it a 👍 We prioritize the issues with the most 👍