Inefficient scheduling for par_iter #1204

morgante · 2024-10-21T03:57:01Z

I have an iterator where the work to be done for each task varies widely—most items finish very quickly, but a few take orders of magnitude longer.

It appears from my quick experiment that Rayon is being somewhat inefficient here since spawn performs much better than the ParallelIterator. It looks like Rayon is still queuing up fast tasks behind the slow task on the same thread pool, or otherwise running fast tasks after the slow task.

The optimal strategy here, which spawn seems to support, is to continue chewing through tasks on other threads while 1 thread stays stuck on the slow task.

Is there a way to do this with par_iter which has much nicer iteration semantics, or do I need to manage spawning myself?

cuviper · 2024-10-21T16:11:11Z

It's a tough balancing act, because usually I hear that the adaptive scheduler was too aggressive. :)

If your input par_iter() is an IndexedParallelIterator, then you can force fine granularity by adding .with_max_len(1). That will get you roughly the same processing strategy as a bunch of individual spawns, but still based on join recursion. That has its own work-stealing gotchas with latency though -- see #1054.

morgante · 2024-10-22T00:06:41Z

Thanks for the pointer. I could probably get indexed iteration working, but I'm actually not sure I understand what wins I'm getting there over just spawning each thread separately.

My ideal would actually be something like this:

1. I call for_each on the iterator (or on some into_some_iter() adapter).
2. Items pulled from the iterator are handed out as tasks to different threads.
3. When no thread is free, nothing more is pulled from the iterator until a thread becomes available for the next task.
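To make the behavior described above concrete, here is a minimal sketch using only the standard library, not rayon. The name for_each_lazy and the workers parameter are hypothetical. Each worker pulls the next item from a shared, mutex-guarded iterator only when it is idle, so a slow item occupies one thread while the others keep draining the rest.

```rust
use std::sync::Mutex;
use std::thread;

/// Run `work` on every item of `iter` using `workers` threads.
/// An item is pulled from the iterator only when a worker is idle.
pub fn for_each_lazy<I, F>(iter: I, workers: usize, work: F)
where
    I: Iterator + Send,
    F: Fn(I::Item) + Sync,
{
    let source = Mutex::new(iter);
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // The lock is held only while fetching the next item;
                // the guard is dropped at the end of this statement.
                let next = source.lock().unwrap().next();
                match next {
                    Some(item) => work(item),
                    None => break, // iterator exhausted, worker exits
                }
            });
        }
        // thread::scope joins all workers before returning.
    });
}
```

This sketch spawns its own threads on every call instead of reusing a shared pool, and it gives no ordering guarantee for side effects.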

Is there a way to do this with rayon or will I need to build my own scheduler / switch to tokio?

cuviper · 2024-10-22T00:18:38Z

> but I'm actually not sure I understand what wins I'm getting there over just spawning each thread separately.

The benefit is that you would still be using the shared thread pool, even if your number of items is much larger than the number of threads. They'll just be "scheduled" individually if you add .with_max_len(1).

What is your source that you're calling par_iter() on? Many types do support the indexed API already.
