KFP Recurring Runs not working on Kubeflow 1.8 #517
Comments
Thank you for reporting your feedback! An internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5295.
|
Thank you for reporting your feedback! An internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5884.
|
After deploying Kubeflow 1.8/stable with MicroK8s 1.26, I can confirm that periodic recurring runs are not scheduled. After investigating, we found out that
|
After removing and redeploying the
I also checked that the charm is trusted, and that the CRDs and the RBAC manifests match the upstream ones. After testing an upstream 1.8 Kubeflow deployment, I can confirm that scheduled runs do work there. |
These are the logs from
|
Can you share the steps you used to deploy the upstream deployment? |
|
Hi @mvlassis, can you share more details about that procedure? Where can I get those manifests so I can try? |
@sombrafam The procedure is as follows:
The recurring runs should then work. |
Hi @sombrafam @mvlassis, I also tried the workaround mentioned in issue #352. After redeploying the kfp-schedwf with |
@mvlassis can you please share the |
OK, I created it, and after deploying the below, the recurring runs started to work
|
On upstream Kubeflow v1.8, I edited the deployment of |
Do you still need to apply the Manifests with this image? |
I ran the following command: |
Hi folks, I think I have found the issue. The command that the charm renders for the scheduled workflow controller never interpolates the namespace: the --namespace argument is built from a plain (non f-) string, so the placeholder is passed literally. After I applied the following patch, rebuilt and refreshed the charm, scheduled workflows started working:

$ git diff
diff --git a/charms/kfp-schedwf/src/components/pebble_component.py b/charms/kfp-schedwf/src/components/pebble_component.py
index f10351a..8dbd593 100644
--- a/charms/kfp-schedwf/src/components/pebble_component.py
+++ b/charms/kfp-schedwf/src/components/pebble_component.py
@@ -18,7 +18,7 @@ class KfpSchedwfPebbleService(PebbleServiceComponent):
):
"""Pebble service container component in order to configure Pebble layer"""
super().__init__(*args, **kwargs)
- self.environment = {"CRON_SCHEDULE_TIMEZONE": timezone}
+ self.environment = {"CRON_SCHEDULE_TIMEZONE": timezone, "NAMESPACE": ""}
self.namespace = namespace
def get_layer(self) -> Layer:
@@ -42,7 +42,7 @@ class KfpSchedwfPebbleService(PebbleServiceComponent):
"summary": "scheduled workflow controller service",
"startup": "enabled",
"command": "/bin/controller --logtostderr=true"
- " --namespace={self.namespace}",
+ ' --namespace=""',
"environment": self.environment,
}
},

I am attaching an image that shows scheduled workflows running after applying the above patch:
@mvlassis let's apply this change in both
Please also note that our rock is using the right value, but since we are replacing that layer with the one in the charm, it was not used at all. |
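For readers following along, here is a minimal standalone sketch of the behaviour the patch above fixes. The Example class and the "kubeflow" value are illustrative only, not the charm code: without the f prefix the {self.namespace} placeholder is never interpolated, while the patched command passes an empty namespace so the controller is not restricted to a single namespace.

# Illustrative sketch only (hypothetical Example class, not the charm code).
# A plain string keeps "{self.namespace}" as literal text; the patched command
# passes an empty --namespace, as in the diff above.
class Example:
    def __init__(self, namespace: str):
        self.namespace = namespace

    def broken_command(self) -> str:
        # No f prefix, so the placeholder is NOT interpolated.
        return "/bin/controller --logtostderr=true" " --namespace={self.namespace}"

    def fixed_command(self) -> str:
        # Mirrors the patch: an empty namespace value is passed instead.
        return "/bin/controller --logtostderr=true" ' --namespace=""'


example = Example("kubeflow")
print(example.broken_command())
# /bin/controller --logtostderr=true --namespace={self.namespace}
print(example.fixed_command())
# /bin/controller --logtostderr=true --namespace=""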
Great work @DnPlas! |
This can also explain why the workers started but no progress was made, since the controller was only monitoring for Recurring Runs in the |
@alelucrod @eleblebici @sombrafam we have released the fix in the
@mvlassis is still working on #529 to make this change in |
Bug Description
I am experiencing an issue after a fresh installation of Kubeflow 1.8/stable (following the official guide). I can launch manual runs, and they execute successfully. However, recurring runs, whether they are Periodic or Cron, do not launch.
In contrast, if I install Kubeflow 1.7 with MicroK8s 1.24, recurring runs do work. Is anyone else experiencing the same issue?
To Reproduce
Fresh install (https://charmed-kubeflow.io/docs/get-started-with-charmed-kubeflow)
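For context, a recurring run can be created either from the KFP UI or from the KFP SDK. The sketch below is illustrative only: the endpoint, pipeline package, experiment name, and cron expression are placeholders, and the create_recurring_run arguments may differ slightly between kfp SDK versions.

# Illustrative sketch of scheduling a recurring run with the KFP SDK.
# Endpoint, pipeline package, experiment name, and cron schedule are placeholders;
# argument and attribute names may differ between kfp 1.x and 2.x SDKs.
import kfp

client = kfp.Client(host="http://localhost:8080")  # placeholder KFP endpoint

experiment = client.create_experiment(name="recurring-run-test")
job = client.create_recurring_run(
    experiment_id=experiment.experiment_id,   # .id in older kfp SDK versions
    job_name="every-five-minutes",
    cron_expression="0 */5 * * * *",           # KFP cron format includes seconds
    pipeline_package_path="my_pipeline.yaml",  # placeholder compiled pipeline
    max_concurrency=1,
    enabled=True,
)
print(job)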
Environment
Ubuntu 23.10
MicroK8s 1.26
Charmed Kubeflow 1.8/stable
Relevant Log Output
Additional Context
No response