Execute tasks for independent pipelines in parallel #80

Closed
tomwhite opened this issue Aug 4, 2022 · 3 comments

Comments

@tomwhite
Member

tomwhite commented Aug 4, 2022

Currently, when computing something like xp.add(a, b), a is computed first, followed by b, followed by the add operation.

We could get more parallelism by computing a and b at the same time, since they don't depend on each other.

(This is true for the Lithops and Modal DagExecutors which both rely on map_unordered, but not for BeamDagExecutor, which can already compute a and b in parallel since it delegates to a Beam DAG.)
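To illustrate why a and b can run at the same time, here is a minimal sketch using only the standard library's `graphlib`. The `plan` dictionary and the `generations` helper are hypothetical stand-ins, not Cubed's actual plan representation. Nodes that become ready in the same round have no dependency on each other, so an executor would be free to run their tasks concurrently.

```python
from graphlib import TopologicalSorter


def generations(graph):
    """Yield sets of nodes whose predecessors have all completed."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    while sorter.is_active():
        ready = sorter.get_ready()
        yield set(ready)
        # Mark the whole group as finished so that dependents become ready.
        sorter.done(*ready)


# Hypothetical plan for xp.add(a, b): each key maps to the nodes it depends on.
plan = {"add": {"a", "b"}, "a": set(), "b": set()}

stages = list(generations(plan))
# a and b land in the same stage; add only becomes ready once both are done.
```

A sequential executor would flatten these stages into a, b, add; the proposal is to submit everything in one stage together.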

@tomwhite
Member Author

tomwhite commented Jun 5, 2023

For Modal we could use the same idea as AsyncPythonDagExecutor to merge async streams for parallelism.
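As a rough sketch of what merging async streams could look like, the following uses plain `asyncio` with a shared queue. It is not necessarily how AsyncPythonDagExecutor does it; `merge` and `run_tasks` are hypothetical names, and `run_tasks` merely stands in for an async generator that yields task results for one array.

```python
import asyncio


async def merge(*streams):
    """Yield items from several async iterators as each becomes available."""
    queue = asyncio.Queue()
    done = object()  # sentinel marking the end of one stream

    async def drain(stream):
        try:
            async for item in stream:
                await queue.put(item)
        finally:
            await queue.put(done)

    tasks = [asyncio.create_task(drain(s)) for s in streams]
    remaining = len(tasks)
    while remaining:
        item = await queue.get()
        if item is done:
            remaining -= 1
        else:
            yield item
    # Re-raise any exception that occurred inside a stream.
    await asyncio.gather(*tasks)


async def run_tasks(name, n):
    """Hypothetical stand-in for a stream of completed tasks for one array."""
    for i in range(n):
        await asyncio.sleep(0)  # give the other stream a chance to run
        yield (name, i)


async def main():
    merged = merge(run_tasks("a", 3), run_tasks("b", 2))
    return [item async for item in merged]


results = asyncio.run(main())
```

The results for a and b arrive interleaved in no guaranteed order, which is the point: neither stream has to finish before the other starts.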

Lithops isn't async, so that won't work there. One idea I had was to compose multiple map calls into a single call by tagging each input with its function, so that each input is processed by the relevant function.
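The tagging idea might look something like the sketch below. Everything here is an assumption for illustration: `local_map_unordered` is a thread-pool stand-in rather than the Lithops or Cubed API, and `double`, `negate`, `FUNCTIONS`, and `dispatch` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def local_map_unordered(executor, func, inputs):
    """Stand-in for a map call that yields results in completion order."""
    futures = [executor.submit(func, x) for x in inputs]
    for future in as_completed(futures):
        yield future.result()


def double(x):
    return 2 * x


def negate(x):
    return -x


# Each tag identifies which pipeline's function an input belongs to.
FUNCTIONS = {"a": double, "b": negate}


def dispatch(tagged_input):
    """Run the function selected by the tag and keep the tag on the result."""
    tag, value = tagged_input
    return tag, FUNCTIONS[tag](value)


# Inputs for two independent pipelines, combined into one tagged list.
tagged_inputs = [("a", x) for x in [1, 2, 3]] + [("b", x) for x in [10, 20]]

results = {"a": [], "b": []}
with ThreadPoolExecutor() as executor:
    # A single map call now covers the tasks of both pipelines.
    for tag, output in local_map_unordered(executor, dispatch, tagged_inputs):
        results[tag].append(output)
```

Keeping the tag on each result lets the caller route outputs back to the right pipeline even though they complete in arbitrary order.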

@tomwhite
Member Author

I've written an initial implementation for Modal here: https://github.com/tomwhite/cubed/tree/compute-arrays-in-parallel

@tomwhite
Member Author

Fixed in #259 and #263
