Iterators generating (many) vectors #722

Philippe-Cholet · 2023-08-01T14:40:16Z

Problem: slow because of repetitive heap-allocation

The 5 iterators below generate Vec<...> items.

multi_cartesian_product(self)
combinations(self, k: usize)
combinations_with_replacement(self, k: usize)
permutations(self, k: usize)
powerset(self)

It can be slow, because of the repetitive allocations, even if the user does not need to permanently store each permutation.

I think that in most cases, the user uses each vector to produce something else and then each vector is discarded. In that case, because those iterators generate (up to) many many items (product of the lengths, factorial-related, powers of two), it creates many temporary vectors which can slow down code execution.

Proposed solution: map slices

An alternative iterator could own one single (internal/private) vector and only give a reference to it to a function implementing FnMut(&[...]) -> R. Those iterators would then generate R items instead of vectors.

// current methods unchanged, new methods:
multi_cartesian_product_map<R, F>(self, func: F)
combinations_map<R, F>(self, k: usize, func: F)
combinations_with_replacement_map<R, F>(self, k: usize, func: F)
permutations_map<R, F>(self, k: usize, func: F)
powerset_map<R, F>(self, func: F)
// all with  F: FnMut(&[...]) -> R

Usage examples

it.combinations(k) and it.combinations_map(k, ToOwned::to_owned) would produce the same elements, but the first should be used!

it.combinations(k).map(closure) and it.combinations_map(k, closure) could produce the same elements if it does not require a vector. For what I tested, it's then roughly 4 to 5 times faster to go through all combinations, unless the iterator has very few elements (then .map should be used).

In it.permutations(5).filter_map(|perm: Vec<_>| some_option), if some_option does not need perm to be an owned vector then it could become it.permutations_map(5, |perm: &[_]| some_option).flatten() where there is no heap-allocation for each permutation.

Implementation (commits here)

New module vec_items (behind use_alloc feature) with:

pub trait VecItems<T> { type Output; fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Self::Output; }
pub struct CollectToVec; implementing VecItems with Vec<T> output and just iter.collect() (so a new vector each time, as it's currently done).
pub struct MapSlice<F, T> { func: F, vec: Vec<T> } implementing VecItems with R output where F: FnMut(&[T]) -> R and the internal vector is cleared but not dropped after use (and its length will not change often if ever).

E.g. for combinations_map:

In Itertools trait: #[cfg(feature = "use_alloc")] fn combinations_map<R, F: FnMut(&[Self::Item]) -> R>(self, k: usize, f: F) -> CombinationsMap<Self, F> where ...
in src/combinations.rs:
- Convert Combinations<I> to CombinationsBase<I, F> with the new field manager: F
- New (fused)iterator constraint F: VecItems<I::Item> and item type F::Output
- New pub type Combinations<I> = CombinationsBase<I, CollectToVec>;
- New pub type CombinationsMap<I, F> = CombinationsBase<I, MapSlice<F, <I as Iterator>::Item>>;
- New fn combinations_map<I, F>(iter: I, k: usize, f: F) -> CombinationsMap<I, F>
Test to compare with combinations.

Very similar for the 4 others. powerset[_map] is a bit simpler because it just uses combinations[_map] internally ; and it was simpler for multi_cartesian_product than anticipated.

Closing thoughts

Avoid cloning in MapSlice case? I think(?) it might be possible to only have references: MapSlice { func: F, vec: Vec<&T> }. Then F: FnMut(&[&T]) -> R and no cloning. But I struggle with the lifetimes (or more probably some hard barrier). But is it worth it? Probably not if it's just integers, but otherwise?

The documentation I did not wrote yet would suggest to use the non _map variants if the user needs owned vectors and not slices (or if there is very little elements expected).

PS: I got the idea while replacing (0..nb).permutations(nb) (after using flamegraph and heaptrack) by a _map version of mine of permutohedron::Heap to avoid any (internal) heap-allocation while keeping a nice iterator structure. And for 8! == 40320 times, that was also 5 times faster.

The text was updated successfully, but these errors were encountered:

phimuemue · 2023-08-01T18:56:59Z

Related: #682

Philippe-Cholet · 2023-08-01T19:26:24Z

I saw that PR some weeks or months ago but forgot about it, I only remember the crate name and that I thought it seemed quite complex. My approach with trait VecItems is simple but quite efficient and flexible enough to make an alternative iterator for any iterator generating vectors like I did here for all 5 iterators.

Philippe-Cholet · 2023-08-02T14:38:09Z

@phimuemue After diving in a bit, I agree that the problem lies with "Iterator" itself and that a "lending iterator" would be a better choice but I share the opinion that it should not be a maintenance burden to have an (evolving) additional dependency and that it would be preferable to wait it reaches the "core" if it is even a possibility (I don't know).

I like my workaround but I guess I should close this issue?

Easyoakland · 2023-08-07T00:25:31Z

@Philippe-Cholet
I don't think you should close it.

@phimuemue
This sounds similar to rust-lang/rust#94667 and rust-lang/rust#82413. This PR makes sense since it appears to be how this kind of thing is being handled in std. I still think the lending solution is better but it doesn't look like that will happen any time soon.

phimuemue · 2023-08-08T14:48:37Z

@Easyoakland Thanks for the link, I did not know about map_windows.

Still, I am unsure that going this way is the best solution:

As it stands, MapSlice populates the whole vector, even if many of the elements would stay in-place.

Possible (maybe more efficient) alternatives:
- Track the elements in the result vector, and alter them in accordance with the respective changes in indices.
- Instead of populating a vector, hand out a proxy that generates the elements on demand.
We may be doing duplicate work in the first place: Imagine that someone wants to generate permutations of [0, 1, 2, 3, 4, 5]. The implementation internally generates indices and the resulting Item -- even though the indices themselves would be sufficient in this particular case.
For maximum performance, there might be more efficient algorithms (such as Heap's for permutations (which is in permutohedron)).

To sum up: The proposed solution improves performance, but does not offer maximum convenience (due to complicating the API), and it is unclear if the performance could be even better (my guess is "it could be better with reasonable amount of work, depending on your use case").

As such, it "falls in the middle" (neither most convenient nor most performant). If that's impossible (or hard) to offer both convenience and performance, then the API should imho not sacrifice at both ends. Thus, I am reluctant to merge this, and would resort to using other, more efficient, algorithms if needed.

@Philippe-Cholet Do you happen to have a comparison for your permutations_map vs. your patched permutohedron::Heap_mapped? If Heap_mapped can be significantly faster than permutations_map, then I'd interpret this as a sign that having and maintaining permutations_map is questionable.

Philippe-Cholet · 2023-08-08T20:52:20Z

@phimuemue
I agree it's in the middle as you say it. I think it remains convenient enough (and the old methods remain there unchanged) for a nice speed up but it definitely is not perfect (clone + populate whole vector + clear...), mainly because I did not have to change the internals too much. This workaround was generic enough to modify all 5 five iterators quite easily. Without it, I guess I would have to discard the lazy buffer for something else, and do different things for each iterator.

I did not compare permutations_map with "permutohedron::Heap_mapped" but the permutations are generated in different orders. Heap's algo is designed to be the fastest while permutations[_map] create permutations in the natural order. Compare them did not seem fair. But it might tell us something.
... Well it does tell us there is real room for improvement for permutations_map:

Time	Permutations of `(0..10).collect_vec()`	Comment
235ms	`permutohedron::Heap::new(&mut vec).map(\|_\| 0).last()`
324ms	`permutations(vec.into_iter(), 10).map(\|_\| 0).last()`
8.08ms	`heap::permutations_map(&mut vec, \|_\| 0).last()`	`permutohedron::Heap` wrapped by me
87.6ms	`permutations_map(vec.into_iter(), 10, \|_\| 0).last()`	`MapSlice` way

The first two lines tell us how much Heap's algo is faster than getting the natural order. Both allocates each item to a vector, while the last two do not.
I guess we could have expected permutations_map to be roughly 8 times faster. I did not thought it would be that bad.

Well I'm closing this. And I'll reopen only if I try and really fasten it.

EDIT: It was a nice exercise and I learned that it has to be really fast if it's not the most convenient, no compromise.

phimuemue · 2023-08-09T05:07:46Z

@Philippe-Cholet Wow, thanks a lot for you effort.

I agree with all your conclusions.

Philippe-Cholet · 2023-08-11T08:38:21Z

For permutations (and 4 others), we could discard my Itertools::permutations_map<R,F> and add a method map_slices<R,F> that would convert Permutations<I> to PermutationsMap<I,F> to do the same job as before.
It's just a nicer API in my opinion, exact same performance (not great yet but that might eventually change).

Name it map would be a nasty breaking change because it would override Iterator::map and has a quite different signature.

Philippe-Cholet closed this as completed Aug 8, 2023

Philippe-Cholet mentioned this issue Aug 12, 2023

More size_hint and count methods #725

Closed

Philippe-Cholet mentioned this issue Apr 24, 2024

Iterators generating (many) vectors (v2) #922

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iterators generating (many) vectors #722

Iterators generating (many) vectors #722

Philippe-Cholet commented Aug 1, 2023

phimuemue commented Aug 1, 2023

Philippe-Cholet commented Aug 1, 2023 •

edited

Loading

Philippe-Cholet commented Aug 2, 2023

Easyoakland commented Aug 7, 2023

phimuemue commented Aug 8, 2023

Philippe-Cholet commented Aug 8, 2023 •

edited

Loading

phimuemue commented Aug 9, 2023

Philippe-Cholet commented Aug 11, 2023

Iterators generating (many) vectors #722

Iterators generating (many) vectors #722

Comments

Philippe-Cholet commented Aug 1, 2023

Problem: slow because of repetitive heap-allocation

Proposed solution: map slices

Usage examples

Implementation (commits here)

Closing thoughts

phimuemue commented Aug 1, 2023

Philippe-Cholet commented Aug 1, 2023 • edited Loading

Philippe-Cholet commented Aug 2, 2023

Easyoakland commented Aug 7, 2023

phimuemue commented Aug 8, 2023

Philippe-Cholet commented Aug 8, 2023 • edited Loading

phimuemue commented Aug 9, 2023

Philippe-Cholet commented Aug 11, 2023

Philippe-Cholet commented Aug 1, 2023 •

edited

Loading

Philippe-Cholet commented Aug 8, 2023 •

edited

Loading