vine: long initialization time for large task graphs in DaskVine #3957

Open
JinZhou5042 opened this issue Oct 14, 2024 · 3 comments · May be fixed by #3991

Comments

@JinZhou5042
Member

JinZhou5042 commented Oct 14, 2024

When DaskVine handles a task graph, it first submits every task in the topmost level of the graph (these tasks are ready because they don't depend on output files produced by any other task), and only then begins calling wait, which is where worker connection and task dispatching happen.
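
To make "topmost level" concrete, here is a minimal sketch of picking out the tasks that have no dependencies. The graph representation (a dict mapping each task name to a list of the tasks it depends on), the task names, and the `frontier` helper are all assumptions made up for illustration; this is not DaskVine's actual code.

```python
def frontier(deps):
    """Return the tasks that depend on nothing, in insertion order.

    `deps` is a hypothetical representation: task name -> list of
    task names whose outputs it needs.
    """
    return [task for task, parents in deps.items() if not parents]


# Toy graph: two independent loads feed a sum, which feeds a report.
deps = {
    "load-a": [],
    "load-b": [],
    "sum": ["load-a", "load-b"],
    "report": ["sum"],
}

ready = frontier(deps)
print(ready)
```

In a wide graph this list can be very long, and every entry in it is submitted before the first call to wait.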

However, if the graph is wide enough, thousands of tasks may be ready for submission at once, so during initialization the manager is busy submitting tasks instead of dispatching them. If we could delay some task submissions and interleave worker connection and task dispatching, it might improve concurrency at the beginning.

For example, in the following run, no workers were connected and no tasks were dispatched during the first ~10 min, which is potentially harmful to the overall execution time.

image
JinZhou5042 changed the title from "vine: long initialization time of task submission" to "vine: long initialization time for large task graphs in DaskVine" on Oct 14, 2024
@BarrySlyDelgado
Contributor

How many tasks are in the frontier of the graph?

@JinZhou5042
Member Author

It was 11,759 tasks.

@dthain
Member

dthain commented Oct 17, 2024

vine_hungry is the intended solution to this problem! It gives the caller a signal as to when "enough" tasks have been submitted and the manager should get to work, hence this pattern:

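while(1) {
    /* Submit only while the manager reports it can use more tasks. */
    while(vine_hungry(m)) {
        task = vine_task_create(...);
        vine_submit(m, task);
    }
    /* Then let the manager connect workers and dispatch tasks. */
    task = vine_wait(m, timeout);
}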

@RamenMode has recently been working on vine_hungry. If it doesn't have the desired effect, then bring him into the conversation.
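
For a Python-side view of the same pattern, the sketch below interleaves submission and waiting against a mock manager. `MockManager`, its `hungry_limit` threshold, and the `run` helper are hypothetical stand-ins written for this illustration. They are not the TaskVine API, and the real vine_hungry heuristic is likely more involved than a fixed queue-length check. The point is only that gating submission on a "hungry" signal keeps the queue short, so waiting (and therefore dispatching) starts after a handful of submissions instead of after all of them.

```python
from collections import deque


class MockManager:
    """Hypothetical stand-in for a manager; NOT the real TaskVine API."""

    def __init__(self, hungry_limit=4):
        self.hungry_limit = hungry_limit  # assumed fixed threshold
        self.queue = deque()
        self.max_queued = 0  # largest backlog seen, for inspection

    def hungry(self):
        # Assumption: "hungry" means fewer than hungry_limit tasks queued.
        return len(self.queue) < self.hungry_limit

    def submit(self, task):
        self.queue.append(task)
        self.max_queued = max(self.max_queued, len(self.queue))

    def empty(self):
        return not self.queue

    def wait(self):
        # Pretend exactly one queued task is dispatched and completes.
        return self.queue.popleft() if self.queue else None


def run(tasks, manager):
    """Submit while the manager is hungry, then wait; repeat until done."""
    pending = deque(tasks)
    done = []
    while pending or not manager.empty():
        while pending and manager.hungry():
            manager.submit(pending.popleft())
        finished = manager.wait()
        if finished is not None:
            done.append(finished)
    return done


manager = MockManager(hungry_limit=4)
done = run(range(20), manager)
print(len(done), manager.max_queued)
```

With this mock, all 20 tasks complete while the backlog never exceeds the threshold; submitting the whole frontier up front would instead queue every task before the first wait.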

@JinZhou5042 JinZhou5042 linked a pull request Nov 25, 2024 that will close this issue