[sdk] Unable to aggregate results over ParallelFor in Kubeflow V2 using V1 workarounds such as .after()
#10050
Comments
/assign @connor-mccarthy
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is fixed by #10257 and will be released in kfp 2.5.0.
This issue is present in the newest release.
I can confirm the presence of the issue on 2.6.0.
We are also impacted by this. We see it as a critical blocker, since we have more and more tasks that we want to run in parallel, especially as we deal with gigantic images (gigabytes each), so we cannot avoid running in parallel. However, we must aggregate the results once we are done.
So it was fixed in 2.5.0 but regressed in 2.6.0? Are we missing a test?
I just tested this on Kubeflow 1.8.1 with KFP SDK 2.5.0 and 2.8.0, but neither works. Here's a sample dummy pipeline.
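A minimal sketch of such a dummy pipeline, assuming it follows the documented `dsl.Collected` fan-in pattern (component names and values here are illustrative, not the original snippet):

```python
from typing import List
from kfp import dsl

@dsl.component
def double(num: int) -> int:
    # Stand-in for the per-item work inside the loop.
    return 2 * num

@dsl.component
def add(nums: List[int]) -> int:
    # Fan-in step: aggregate the collected per-item outputs.
    return sum(nums)

@dsl.pipeline(name="collected-fan-in")
def math_pipeline() -> int:
    with dsl.ParallelFor(items=[1, 2, 3]) as item:
        doubled = double(num=item)
    # dsl.Collected gathers the outputs of all loop iterations.
    return add(nums=dsl.Collected(doubled.output)).output
```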
There are no compilation errors, but when I try to run the pipeline I get an error at runtime. My guess is that something changed in the KFP backend, not in the KFP SDK, that broke this.
@connor-mccarthy could we reopen this issue based on the comments above?
/reopen
@gregsheremeta: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
We have many new ML use cases that we are onboarding, and we are forced to build them as V1 pipelines because of this bug. Thanks everybody for the hard work 👍; I hope we can get a permanent solution for this soon.
Environment
Steps to reproduce
The new Kubeflow V2 backend does not yet support the use of `dsl.Collected()`, which is a functionality I was really looking forward to. I have a process where I run several components within a `dsl.ParallelFor` and then want to aggregate the results of all of these components after the `dsl.ParallelFor` finishes.

In V1, using SDK version `kfp==1.8.21`, I was able to work around the lack of any released fan-in mechanism by using the `.after()` method. A dummy snippet of code would be along the lines of the sketch below, where `comp_in_parfor` executes some process and saves the results to S3, and `collector` then collects the results after `comp_in_parfor` has finished by reading them back from S3.
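A minimal sketch of that V1 workaround, assuming lightweight Python components (the exact original snippet is not reproduced here; names and loop items are illustrative):

```python
from kfp import dsl
from kfp.components import create_component_from_func

def comp_in_parfor(item: str):
    # Stand-in for the real work: process one item and save the result to S3.
    print(f"processing {item}")

def collector():
    # Stand-in for the fan-in: read all per-item results back from S3.
    print("collecting results")

comp_in_parfor_op = create_component_from_func(comp_in_parfor)
collector_op = create_component_from_func(collector)

@dsl.pipeline(name="fan-in-via-after-v1")
def pipeline():
    with dsl.ParallelFor(["a", "b", "c"]) as item:
        parfor_task = comp_in_parfor_op(item=item)
    # V1 accepts this: the collector is ordered after the loop's task,
    # so the data fan-in happens through S3 rather than through KFP itself.
    collector_op().after(parfor_task)
```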
However, if I try to do the same thing using `kfp==2.3.0` for the SDK, I get an error, which means I cannot use the `.after()` method in V2. And because the new `dsl.Collected` approach also does not work in the V2 open-source backend (only on Vertex AI, from what I can tell), there is no way to fan-in from a ParallelFor, either DIY or properly. This error can easily be reproduced with the example below.
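Assuming the repro mirrors the V1 snippet in V2 syntax, a minimal sketch that triggers the failure might look like this (again with illustrative names):

```python
from kfp import dsl

@dsl.component
def comp_in_parfor(item: str):
    # Stand-in for per-item work that persists its result to S3.
    print(f"processing {item}")

@dsl.component
def collector():
    # Stand-in for the aggregation step that reads the results back.
    print("collecting results")

@dsl.pipeline(name="fan-in-via-after-v2")
def pipeline():
    with dsl.ParallelFor(items=["a", "b", "c"]) as item:
        parfor_task = comp_in_parfor(item=item)
    # Under kfp==2.3.0 this line is rejected: a task outside the loop
    # may not depend via .after() on a task inside the ParallelFor.
    collector().after(parfor_task)
```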
Expected result
My expected result can be one of two things:

1. `dsl.Collected()` is properly implemented in the V2 KFP backend, rendering the need for `.after()` moot.
2. `.after()` works, so that users can properly lay out the sequential structure of their DAG and work around the lack of fan-in logic in Kubeflow.

Am I missing a workaround that allows users to fan-in from a ParallelFor in the V2 SDK? If not, is there any way that `.after()` can be restored until `dsl.Collected()` is implemented properly in the backend? Until then, everyone blocked by this issue will be totally unable to use V2 Kubeflow even with their own DIY fan-in logic. It's a major bummer because V2 has some great functionality; in particular, I'm excited about sub-DAGs for logically breaking up my repeated ParallelFor units and reducing UI strain from overly large DAG structures.

Materials and Reference
Impacted by this bug? Give it a 👍.