Make pariter discoverable #1078

Closed
wants to merge 1 commit into from

Conversation

safinaskar

Recently I wanted a parallel map for big data. I spent some time searching for a solution and was unable to find one. I rejected some candidates because they don't preserve order, and others because they cannot borrow stack variables. Eventually I wrote my own solution: #1071. I spent many days on it! And now I have found a solution that solves all of these problems: pariter ( https://dpc.pw/adding-parallelism-to-your-rust-iterators ). So I propose adding a link to pariter to the docs to make it discoverable, so that others will not spend many days reimplementing it.
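
To make the two requirements above concrete (results come back in input order, and the closure may borrow stack variables), here is a minimal std-only sketch built on `std::thread::scope`. The name `ordered_parallel_map` and the `workers` parameter are hypothetical and chosen for illustration; this is not how pariter or rayon implement it, and unlike a streaming solution it assumes the whole input already sits in memory as a slice.

```rust
use std::thread;

/// Applies `f` to every element of `input` on up to `workers` threads and
/// returns the results in input order. (Hypothetical helper, illustration only.)
pub fn ordered_parallel_map<T, R, F>(input: &[T], workers: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if input.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so that every element lands in exactly one chunk.
    let chunk_len = (input.len() + workers - 1) / workers;
    let f = &f;

    thread::scope(|scope| {
        // One scoped thread per chunk; scoped threads may borrow `input` and `f`.
        let handles: Vec<_> = input
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();

        // Joining in spawn order restores the original element order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

A call such as `ordered_parallel_map(&input, 4, |x| x * 2 + offset)` can capture a local `offset` by reference, because scoped threads are guaranteed to finish before the function returns.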

@v1gnesh

v1gnesh commented Aug 5, 2023

Hi again @safinaskar, do you have an example of draining into a file (would a buffered write be needed), after the parallel part? Am I seeing it right... that the result is still in memory (collect into Vec)?

@safinaskar
Author

@v1gnesh, you want an example with pariter that writes its output to a file? Here it is:

use pariter::IteratorExt;
use std::io::Write;

fn main() {
    let mut file = std::fs::File::create("/tmp/out").unwrap();
    // Results are consumed one at a time in the loop, so nothing is collected into a Vec.
    for i in (0..1000).parallel_map(|x| x) {
        writeln!(file, "{}", i).unwrap();
    }
}

> that the result is still in memory

No, in this example the data is written to the file on the fly.
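
On the buffered-write part of the question: writing each item straight to a `File` issues one small write per line, so wrapping the file in `std::io::BufWriter` is the usual remedy. The sketch below uses only the standard library; `drain_to_file` is a hypothetical helper name, and it accepts any iterator. Passing it the iterator returned by `parallel_map` from the example above is an assumption that has not been checked here.

```rust
use std::fmt::Display;
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes each item on its own line as it arrives; nothing is collected
/// into a `Vec` first. (Hypothetical helper, illustration only.)
pub fn drain_to_file<I>(items: I, path: &Path) -> io::Result<()>
where
    I: IntoIterator,
    I::Item: Display,
{
    // BufWriter batches the many small `writeln!` calls into fewer system calls.
    let mut out = BufWriter::new(File::create(path)?);
    for item in items {
        writeln!(out, "{}", item)?;
    }
    // Flush explicitly so that write errors surface here instead of being
    // silently ignored when the BufWriter is dropped.
    out.flush()
}
```

Memory use stays bounded by the `BufWriter`'s internal buffer plus whatever the iterator itself holds in flight.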

@safinaskar
Author

@cuviper , so?

@cuviper
Member

cuviper commented Aug 11, 2023

I don't feel that rayon docs are an appropriate place to recommend other crates. It might be okay in the readme, but even then I would rather keep it simple and avoid opinionated language like "suboptimal". Maybe even just suggest the parallel keyword on crates.io.

@safinaskar
Author

Okay, I'm closing this PR. If somebody wants to write something better, feel free to do so.

Here is the patch in case someone needs it:

diff --git a/src/iter/mod.rs b/src/iter/mod.rs
index 7b5a29a..62f1423 100644
--- a/src/iter/mod.rs
+++ b/src/iter/mod.rs
@@ -78,6 +78,17 @@
 //! `Box<dyn ParallelIterator>` or other kind of dynamic allocation,
 //! because `ParallelIterator` is **not object-safe**.
 //! (This keeps the implementation simpler and allows extra optimizations.)
+//!
+//! Note that rayon's built-in functionality is suboptimal when your
+//! input and/or output data is too big, for example, when it doesn't fit in
+//! memory. In particular, rayon is not designed for the case where you want to read a
+//! potentially large amount of data from a file (or the network), split it into chunks,
+//! feed them to a thread pool, perform some operations in parallel, and then collect the
+//! results and write them to a file or the network in the correct order. In other words, rayon is
+//! suboptimal when you need streaming for your input or output data. In such
+//! cases consider an [alternative solution].
+//!
+//! [alternative solution]: https://dpc.pw/adding-parallelism-to-your-rust-iterators
 
 use self::plumbing::*;
 use self::private::Try;

@safinaskar safinaskar closed this Aug 12, 2023
@safinaskar safinaskar deleted the discover branch August 12, 2023 09:13
@pvgoran

pvgoran commented Nov 26, 2023

I spent significant time looking for a solution to par_bridge()'s non-preservation of order. If pariter had been mentioned in rayon's documentation or readme, it could have saved me some time!
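
One general way to get ordered output from unordered parallel workers is to number the items on the way in and park early results in a reorder buffer until their turn comes. The following std-only sketch illustrates that idea for a streaming input; `ordered_streaming_map`, `workers`, and `sink` are hypothetical names, and this is neither rayon's nor pariter's actual implementation. It also uses an unbounded channel and an unbounded buffer, so one slow item can let pending results pile up.

```rust
use std::collections::BTreeMap;
use std::sync::{mpsc, Mutex};
use std::thread;

/// Maps `f` over `input` on `workers` threads and hands the results to `sink`
/// in input order, without collecting the input first.
/// (Hypothetical helper, illustration only.)
pub fn ordered_streaming_map<I, R, F, S>(input: I, workers: usize, f: F, mut sink: S)
where
    I: Iterator + Send,
    R: Send,
    F: Fn(I::Item) -> R + Sync,
    S: FnMut(R),
{
    // Workers share one numbered input iterator and pull from it on demand.
    let source = Mutex::new(input.enumerate());
    let (tx, rx) = mpsc::channel::<(usize, R)>();

    thread::scope(|scope| {
        for _ in 0..workers.max(1) {
            let tx = tx.clone();
            let (source, f) = (&source, &f);
            scope.spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so `f` runs without holding the lock.
                let next = source.lock().unwrap().next();
                match next {
                    Some((index, item)) => {
                        if tx.send((index, f(item))).is_err() {
                            break;
                        }
                    }
                    None => break,
                }
            });
        }
        // Drop the original sender so `rx` ends once every worker has finished.
        drop(tx);

        // Park out-of-order results until the next expected index shows up.
        let mut pending: BTreeMap<usize, R> = BTreeMap::new();
        let mut next_index: usize = 0;
        for (index, result) in rx {
            pending.insert(index, result);
            while let Some(result) = pending.remove(&next_index) {
                sink(result);
                next_index += 1;
            }
        }
    });
}
```

Here `sink` runs on the calling thread, so it can write to a file or push into a local `Vec` without any synchronization of its own.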

4 participants