Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge MergeBy and MergeJoinBy #701

Closed
phimuemue opened this issue Jun 2, 2023 · 35 comments · Fixed by #736
Closed

Merge MergeBy and MergeJoinBy #701

phimuemue opened this issue Jun 2, 2023 · 35 comments · Fixed by #736

Comments

@phimuemue
Copy link
Member

It seems that merge_join_by/MergeJoinBy is actually a more generic variant of merge/MergeBy.

However, currently, the MergeJoinBy and MergeBy are implemented independently (PutBack vs Peekable, fused behavior, requiring Ordering vs bool). The past showed that uniform implementations may be better.

This change should not introduce breaking changes for users: merge, merge_by and merge_join_by should stay usable as they are.

However, I'd imagine that the general merge_join_by could also accept a comparator returning bool (instead of Ordering) - this would free the users from having to deal with the EitherOrBoth::Both case. I.e. I suggest that users should be able to write:

use itertools; // 0.10.5
use itertools::{Either, EitherOrBoth};

fn main() {
    let numbers = [1,2,3,4,5];
    let strings = ["one", "two", "three", "four"];
    
    for x in itertools::merge_join_by(
        numbers,
        strings,
        |number, string| number.cmp(&string.len()),
    ) {
        match x {
            EitherOrBoth::Left(number) => {dbg!(number);},
            EitherOrBoth::Right(string) => {dbg!(string);},
            EitherOrBoth::Both(number, string) => {dbg!(number, string);},
        };
    }
    
    for x in itertools::merge_join_by(
        numbers,
        strings,
        |number, string| number < &string.len(), // <-- Note that we return bool instead of Ordering
    ) {
        match x {
            Either::Left(number) => {dbg!(number);},
            Either::Right(string) => {dbg!(string);},
            // <-- Note that we do not need to deal with the "Both" case, as our comparator returns bool
        };
    }
}

When we tackle this, we should - as a first step - probably move all the merge-related functions into an own module.

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented Jun 8, 2023

I experimented a bit and what you want is for MergeJoinBy<I, J, F> to have:

  • impl Iterator<Item = EitherOrBoth<I::Item, J::Item>> when the cmp_fn is returning Ordering ;
  • impl Iterator<Item = Either<I::Item, J::Item>> when the cmp_fn is returning bool.

It's conflicting. But then if we add a type T to "MergeJoinBy" (I think it's a small breaking change):

pub struct MergeJoinBy<I: Iterator, J: Iterator, F, T>
    where F: FnMut(&I::Item, &J::Item) -> T
{ ... }

then we can have

impl<I, J, F> Iterator for MergeJoinBy<I, J, F, Ordering> // EitherOrBoth
impl<I, J, F> Iterator for MergeJoinBy<I, J, F, bool> // Either

Does it seems good to you? (changes in "src/merge_join.rs" and very little in "src/lib.rs").
EDIT: I can avoid duplication with a macro.
EDIT 2: A bit unsure about what variant it should return in our new case:

cmp_fn returning output variant
Ordering::Less EitherOrBoth::Left
Ordering::Greater EitherOrBoth::Right
Ordering::Equal EitherOrBoth::Both
false Either::Right ?
true Either::Left ?

@phimuemue
Copy link
Member Author

phimuemue commented Jun 9, 2023

Hi there. I'm not sure if I follow 100% wrt "conflicting". I think MergeJoinBy should not require the function's return type explicitly, because it can be inferred from F.

As such, I'd imagine this:

impl<I, J, F, ResultOfF> Iterator for MergeJoinBy
    where
        F: FnMut(&I::Item, &J::Item) -> ResultOfF,
        ResultOfF: OrderingOrBool<I::Item, J::Item>
{
    type Item = ResultOfF::Item;
}

trait OrderingOrBool<IItem, JItem> {
    type Item
}
impl<IItem, JItem> OrderingOrBool<IItem, JItem> for std::cmp::Ordering {
    type Item = EitherOrBoth<IItem, JItem>;
}
// similar for bool with Item = Either

I am not sure if this design hits an unsovable obstacle somewhere, but for me it looks a bit clearer than basically copy-pasting/macro-generating two impls for bool and Ordering separately. Did you research this possibility and run into problems?

@Philippe-Cholet
Copy link
Member

I agree about not conflicting for the reason you mention, I was just being careful.
TBH, I did not think of this trait-way, I like it. After giving it a try, it works with a few additions.

pub trait OrderingOrBool<I, J> {
    type Item;
    // All with `#[inline(always)]` in both impl
    fn into_cmp(self) -> Ordering;
    fn left(left: I) -> Self::Item;
    fn right(right: J) -> Self::Item;
    fn both(left: I, right: J) -> Self::Item;
}

To match against the result of the user function, I needed into_cmp (for bool, if self { Ordering::Less } else { Ordering::Greater }).
I added "left/right/both" methods to be able to create variants ("both" for bool only contains unreachable!("into_cmp never returns Ordering::Equal so this should never be called")).

@phimuemue
Copy link
Member Author

Wow, that was fast. - Nice that it seems to work.

As for the additional methods: Of course, I didn't see the whole implementation, but it sounds reasonable.

@Philippe-Cholet
Copy link
Member

Here is a commit for it. Seems reasonable to me too.
I'm currently blocked about merging implementations of MergeJoinBy and MergeBy. Maybe you have an idea on what to do. Or I will try again.

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented Jun 11, 2023

First, I moved MergeBy into its own module "merge" as you suggested.

To merge implementations of "MergeBy" and "MergeJoinBy", I mostly struggled with the type "Merge" (how to discard "MergePredicate" and "MergeLte" for I::Item::le).
Here, type Merge<I, J> = MergeBy<I, J, MergeLte>; becomes type Merge<I, J> = MergeBy<I, J, fn(&<I as Iterator>::Item, &<J as Iterator>::Item) -> bool>;.
Then trait MergePredicate<T> and struct MergeLte can be removed.
Finally, MergeBy<I, J, F> basically wraps MergeJoinBy<I, J, F, bool>.
So in impl Iterator, I use Either::into_inner to convert Either<T, T> to T.

  • "MergeJoinBy" don't have impl FusedIterator while "MergeBy" did. Not sure what to do here.
  • (MergeJoinBy<I, J, F, bool> and) MergeBy<I, J, F> lost some size_hint information because "EitherOrBoth" sometimes use one element of each iterator in one item (Both variant) while "Either" does not. Unless I find a fix.
  • Should I #[inline] methods in impl Iterator for MergeBy ?

If this is satisfying, should I merge files "merge_join.rs" into the newly created "adaptors/merge.rs" ?

EDIT: In "OrderingOrBool" trait, I added fn size_hint(left: SizeHint, right: SizeHint) -> SizeHint; to fix the small size_hint issue mentionned above.

@phimuemue
Copy link
Member Author

Thank you for staying in touch. I see there are some valid questions, so I suggest to first focus on supporting bool (since that's probably what's needed anyway if we want to merge MergeBy and MergeJoinBy, i.e. let's focus on your first commit (Philippe-Cholet@5ad6f70).

Please see my comments there for specifics, but probably most important: Is the fourth type parameter in MergeJoinBy really needed? (I would guess "no", because it could be inferred from F.)

In addition, this commit introduced several trait bounds that look unneeded to me. We should add only the trait bounds required to support bool and these that keep the code in line with what's already there.

The idea of your transformation (i.e. having left, right, both as customization points) looks correct, but the unreachable makes me think if there's a way to refactor this to get rid of it. Did you try something in this direction?

As said, let's keep it at supporting bool for now, but here's some thoughts regarding your questions anyway:

"MergeJoinBy" don't have impl FusedIterator while "MergeBy" did. Not sure what to do here.

I think we should fuse the input iterators (i.e. adapt MergeJoinBy's strategy and document this. (Sidenote: That's exactly the type of problem that would be avoided if the structs had been merged.)

(MergeJoinBy<I, J, F, bool> and) MergeBy<I, J, F> lost some size_hint information because "EitherOrBoth" sometimes use one element of each iterator in one item (Both variant) while "Either" does not. Unless I find a fix.

Good catch. Maybe that precludes merging the structs... We'll have to look into that.

Should I #[inline] methods in impl Iterator for MergeBy ?

I think most of the time we rely on the compiler, but afaik there are no definitive guidelines about [inline].

@Philippe-Cholet
Copy link
Member

You might have missed (or not) an edit of my comment, a fix about size_hint, seems simple enough to me.
Instead of creating a trait for Ordering/bool, maybe we can create a trait for FnMut(&I, &J) -> T, I'll come back to you.

@Philippe-Cholet
Copy link
Member

I did not find a way to have only three types. I was going with a trait MergePredicate (idea from how MergeBy currently works)

pub trait MergePredicate<I, J> { // or <I, J, T>
    type Item;
    // fn merge_pred(&mut self, left: &I, right: &J) -> T; // return self(left, right)
    fn merge(&mut self, left: I, right: J) -> (Option<I>, Option<J>, Self::Item);
    // (item to put back left, same for right, next_item)
    // then I can match against Ordering/bool variants, and don't need one unreachable both method.
    fn left(left: I) -> Self::Item;
    fn right(right: J) -> Self::Item;
}

With MergePredicate<I, J>, I get an error about "conflicting implementations".

impl<I, J, F: FnMut(&I, &J) -> Ordering> MergePredicate<I, J> for F { ... }
impl<I, J, F: FnMut(&I, &J) -> bool> MergePredicate<I, J> for F { ... }

I suppose it worries that there is a type out there implementing both FnMut(&I, &J) -> Ordering and FnMut(&I, &J) -> bool. Seems impossible but it's an error.
With MergePredicate<I, J, T> (T being Ordering or bool), this problem vanishes but I get another, a weird one IMO about:

impl<I, J, F, T> Iterator for MergeJoinBy<I, J, F>
    where I: Iterator,
          J: Iterator,
          F: MergePredicate<I::Item, J::Item, T>,
{ ... }

error[E0207]: the type parameter T is not constrained by the impl trait, self type, or predicates

The constraint on F does not constrain T at all apparently.

Hence I add T as a fourth type to MergeJoinBy. Only to get that it's not used (add a PhantomData<T> field leads to a compiler error in tests about "type annotation", even with constraint F: MergePredicate<I, J, T>) so with F: FnMut(&I::Item, &J::Item) -> T as a constraint on MergeJoinBy, then it works with MergePredicate<I, J, T> being a super trait of FnMut(&I, &J) -> T! Harsh right?!

So I think T should be inferred from F as you guessed but not by the current compiler or this trait way.
It solves the issue you have with unreachable. I'll make a commit after dinner for you to see.

@Philippe-Cholet
Copy link
Member

Here is a new commit (in a new branch) with trait MergePredicate<I, J, T>: FnMut(&I, &J) -> T (see previous comment). It better matches Ordering & bool variants. And merge_join_by constraints seems nicer ?
If you don't want the previous OrderingOrBool or this MergePredicate, I'm open to suggestions.
Not a priority at this moment but either way, there is some code duplication about putting back right/left elements.

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented Jun 12, 2023

Following the last commit, add a PhantomData field on MergeJoinBy would allow me to remove F: FnMut(...) -> T constraint on the struct definition, impl Clone and impl Debug of MergeJoinBy, all three that were added in the last commit, and MergePredicate does not need to be a supertrait.
So the new PhantomData commit but you might prefer to see the diff if these last two were squashed: comparison here (the changes on constraints are way more natural/minimalist).

@phimuemue
Copy link
Member Author

phimuemue commented Jun 12, 2023

Thanks for the update. I like how you were able to avoid the unreachable. I would assume that the compiler is smart enough to inline fn merge and avoid the temporary Options.

I still think it should be possible without T in MergeJoinBy - see this prototype: phimuemue@072fe00. (I consider it important to not change MergeJoinBy's interface (if not strictly required), as it right now is already complex enough.)

Looking at this prototype, it seems that MergePredicate is not even needed anymore. If you want to continue working on this, you can use the prototype as a starting point, and get rid of the rough edges.

@Philippe-Cholet
Copy link
Member

So I got rid of MergePredicate there. The diff between before the issue and now, it feels right.

I could avoid code duplication (code below is in four Iterator methods), or not (it's not that bad).

if let Some(left) = left { self.left.put_back(left); }
if let Some(right) = right { self.right.put_back(right); }

I could work on bringing struct MergeBy into src/merge_join.rs or another place ?

@phimuemue
Copy link
Member Author

I'd say we should focus on supporting bool before moving stuff around, especially in light of #702.

  • merge never returns (Some(...), Some(...), ...), but the caller always checks both Options. On top of that, when we call merge, this is always followed by put_backs. As such, we could think about moving the put_back into merge - or possibly at least returning Option<Either<I::Item, J::Item>>. IMO you do not have to do this, because would I hope the compiler is smart enough to get rid of all the checks, but please add a comment that we thought about this.
  • merge_join_by should only accept T where T: OrderingOrBool.
  • Could you please add some (possibly quickcheck) tests. In particular, we should assert that the bool-variant behaves the same as the Ordering-variant as long as the Ordering is not equal. And we should assert that the bool-variant yields the same elements (modulo wrapping in Either) as the simpler MergeBy.
  • In the documentation, instead of "in the first case", let's be more explicit: "If cmp_fn returns Ordering"; similar for the other case. The bool case should probably mention that this somewhat corresponds to cmp_fn returning if the first argument is considered "less" than the second argument. I like your idea of re-using the example here - nice.

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented Jun 13, 2023

Use MergeJoinBy for the internal of MergeBy to merge their implementation in a single file is quite easy anyway.

All four points are done here (and small stupid duplicates removed). Although, I'm not sure the documentation sentence about "less" is clear enough. I changed the common example to show that the Both variant does not always split into consecutives Left, Right variants (when one of the two iterators is not ascending, it's not a precondition after all).

@Philippe-Cholet
Copy link
Member

I should not have unwrapped Either in the first quickcheck (commit).

@Philippe-Cholet
Copy link
Member

I tried with put_backs in merge, and while its definition certainly would be not pretty, the rest would be straightforward (just let elem = T::merge(&mut self, left, right);).

pub trait OrderingOrBool<L, R> {    // L R for types, I J for iterators below
    type MergeResult;
    fn left(left: L) -> Self::MergeResult;
    fn right(right: R) -> Self::MergeResult;
    fn merge<I, J, F>(it: &mut MergeJoinBy<I, J, F>, left: L, right: R) -> Self::MergeResult
        where I: Iterator<Item = L>,
              J: Iterator<Item = R>,
              F: FnMut(&L, &R) -> Self;    // match/if  (it.cmp_fn)(&left, &right)  {...}
    fn size_hint(left: SizeHint, right: SizeHint) -> SizeHint;
}

@phimuemue
Copy link
Member Author

Hi, thanks so far. I realized we should keep the inners of OrderingOrBool private (so that we can change them later). We could employ a similar technique to HomogenousTuple which is a facade for TupleCollect.

@Philippe-Cholet
Copy link
Member

OrderingOrBool is still private (we can't access it from outside itertools) because in "lib.rs", the module "merge_join" is not public, and OrderingOrBool is not pub used anywhere else.
But HomogenousTuple is public (I can use itertools::traits::HomogeneousTuple; in a personal project).

@phimuemue
Copy link
Member Author

Oh, cool. Maybe open a PR so we can have a last review?

bors bot added a commit that referenced this issue Jun 16, 2023
704: `MergeJoinBy` also accept functions returning `bool` r=jswrenn a=Philippe-Cholet

Related to #701 . The produced iterator then yields `Either` items instead of `EitherOrBoth` items when the user function returns `Ordering`.

Co-authored-by: Philippe-Cholet <[email protected]>
@Philippe-Cholet
Copy link
Member

Now that the PR is merged, we can merge implementations as you first wanted. The way I see it, trait MergePredicate<T> and struct MergeLte are useless (being respectively replaced by FnMut(&T, &T) -> bool and PartialOrd::le).

pub type Merge<I, J> = MergeBy<I, J, MergeLte>;
// becomes
pub type Merge<I, J> = MergeBy<I, J, fn(&<I as Iterator>::Item, &<J as Iterator>::Item) -> bool>;

Then

pub fn merge<I, J>(i: I, j: J) -> Merge<<I as IntoIterator>::IntoIter, <J as IntoIterator>::IntoIter>
    where I: IntoIterator,
          J: IntoIterator<Item = I::Item>,
          I::Item: PartialOrd
{
    // merge_by_new(i, j, MergeLte)
    // becomes
    merge_by_new(i, j, PartialOrd::le)
}

And MergeBy uses MergeJoinBy in its internal

pub struct MergeBy<I, J, F>
    where I: Iterator,
          J: Iterator<Item = I::Item>
{
    iter: MergeJoinBy<I, J, F>,
}

The rest is really straightforward. I know it will pass tests as I experimented before.
If that seems reasonable to you then my question is where do we move all the code, in which file? Or we move it a last step? Or as a first?

@phimuemue
Copy link
Member Author

phimuemue commented Jun 17, 2023

Hi @Philippe-Cholet , I think this design has two drawbacks:

  • Using fn (note the lower case) buys some flexibility at the expense of possibly worsening runtime, as it is a function pointer. Thus, we should avoid it.
  • Having a field iter in MergeBy would require that we implement all iterator methods and manually forward them to iter. Instead, we could make MergeBy a type alias for a particular MergeJoinBy specialization.

Without having looked at the code too closely, I think we should express the affected iterators all as instances of MergeJoinBy, and customize MergeJoinBy essentially by one parameter that

  • encapsulates the predicate which decides how elements are merged/ordered
  • converts the raw iterator elements into the merged types.

This could be realized by extending OrderingOrBool.

@Philippe-Cholet
Copy link
Member

I kinda expected it about fn.
About iter, I thought about it and I didn't think it was an issue because I don't think it would make sense to implement more than next/size_hint/count/last/nth. I was more annoyed by Either::into_inner decapsulation (see below, straightforward). I'm gonna try going your way.

impl<I, J, F> Iterator for MergeBy<I, J, F>
    where I: Iterator,
          J: Iterator<Item = I::Item>,
          F: FnMut(&I::Item, &I::Item) -> bool,
{
    type Item = I::Item;
    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next().map(Either::into_inner)
    }
    fn size_hint(&self) -> SizeHint {
        self.iter.size_hint()
    }
    fn count(self) -> usize {
        self.iter.count()
    }
    fn last(self) -> Option<Self::Item> {
        self.iter.last().map(Either::into_inner)
    }
    fn nth(&mut self, n: usize) -> Option<Self::Item> {
        self.iter.nth(n).map(Either::into_inner)
    }
}

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented Jun 20, 2023

After moving Merge/MergeBy into merge_join module to manipulate them all easily, I'm just really stuck.

To be able to keep MergeLte and not use PartialOrd::le (and therefore not a function pointer in Merge type), we can't keep F: FnMut(&I::Item, &J::Item) -> T in impl Iterator for MergeJoinBy. But use something like F: MergePredicate<I::Item, J::Item, T> instead does not constrain T anymore leading to introduce T back in MergeJoinBy<I, J, F, T> with a phantom field. I don't see a way around this.

Both MergeBy and MergeJoinBy use functions returning bool so in order for MergeBy to be an instance of MergeJoinBy, the only way I saw was adding another parameter (const SAME: bool (meaning L == R ?), a const one hence breaking the MSRV) to distinguish them. You can see the changes here. I'll revert them if you can't salvage any of this.

@Philippe-Cholet
Copy link
Member

About F: MergePredicate<I::Item, J::Item, T> not contraining T, the answer was F: MergePredicate<I::Item, J::Item, Out = T>. Then no phantom field. Commit here.

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented Jun 21, 2023

I think I'm finally on the right path!

To get rid of const SAME: bool, I wrap the function F in MergeFuncLR (result wrapped EitherOrBoth/Either<L,R>) and MergeFuncT (result unwrapped T).
Those MergeFuncLR/T have a parameter T (phantom field) to not have conflicting implementations of MergePredicate<L, R>.

In merge_join_by definition, in order for the compiler to keep guessing left/right types, I kept F: FnMut(&Self::Item, &J::Item) -> T. But to not add F: MergePredicate<Self::Item, J::Item> that felt duplicate, I removed T: OrderingOrBool..., the user might lose OrderingOrBool information.

Commit there but you might prefer to see the current code.

@phimuemue
Copy link
Member Author

Hi there, thanks for going through this. I do not have too much time right now, so just quick feedback:

I think this is indeed taking the right turns! I skimmed the code, and I saw that you have four merge predicates:

  • merge -> MergeJoinBy<..., MergeFuncT<MergeLte, bool>>
  • merge_by -> MergeJoinBy<..., MergeFuncT<F, bool>>
  • merge_join_by(...bool) -> MergeJoinBy<..., MergeFuncLR<F, bool>>
  • merge_join_by(...Ordering) -> MergeJoinBy<..., MergeFuncLR<F, Ordering>>

Looks simple and good. Maybe we can simplify some MergeFuncT/MergeFuncLR internals (does MergeFuncT need its second parameter?), but the scaffold so far seems right.

I think that MergeJoinBy it should impl FusedIterator, shouldn't it?

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented Jun 21, 2023

Thanks for the quick feedback.

I only commented out FusedIterator temporarily, I definitely think MergeJoinBy should implement it (commit) like many other iterators. After we solve the current issue, I'll probably make an issue (or use an old one if any) as I investigated it.

Good catch, MergeFuncT<F, bool> does not need the second (phantom) parameter since T always is bool so MergeFuncT<F> is enough. Therefore new commit in which I also simplify MergeFuncLR and MergeFuncT to "tuple structs".

Note 1: But MergeFuncLR needs it to avoid conflicting implementations of MergePredicate<L, R>.

impl<L, R, F: FnMut(&L, &R) -> Ordering> MergePredicate<L, R> for MergeFuncLR<F/*, Ordering*/> { ... }
impl<L, R, F: FnMut(&L, &R) -> bool> MergePredicate<L, R> for MergeFuncLR<F/*, bool*/> { ... }

Some impossible F could implement both FnMut(&L, &R) -> Ordering and FnMut(&L, &R) -> bool. Annoying but unavoidable IMO.

Note 2: impl<...> MergePredicate<T, T> for MergeFuncT<MergeLte> is mostly copy/paste of impl<...> MergePredicate<T, T> for MergeFuncT<F> with self.0(&left, &right) replaced by left <= right.

EDIT: I could write a new fn merge_join_by_new<I: IntoIterator, J: IntoIterator, F>(left: I, right: J, f: F) -> MergeJoinBy<I::IntoIter, J::IntoIter, F> to avoid MergeJoinBy { left: put_back(left.into_iter().fuse()), right: put_back(right.into_iter().fuse()), cmp_fn: ..., } three times when only cmp_fn changes.

@Philippe-Cholet
Copy link
Member

From its trait documentation, we should implement FusedIterator for all iterators that always return None once they've done it once. But the real question IMO is on which conditions?!

Here, I think I made a mistake (being too fast). MergeJoinBy fuses both left and right iterators even if they were not before. So I think we should

impl<I, J, F, T> FusedIterator for MergeJoinBy<I, J, F>
    where I: Iterator,
          J: Iterator,
          F: MergePredicate<I::Item, J::Item, Out = T>,
{}

instead of

impl<I, J, F, T> FusedIterator for MergeJoinBy<I, J, F>
    where I: FusedIterator,
          // ~~~~~
          J: FusedIterator,
          // ~~~~~
          F: MergePredicate<I::Item, J::Item, Out = T>,
{}

@phimuemue
Copy link
Member Author

Sounds right.

@Philippe-Cholet
Copy link
Member

FusedIterator' small fix.
Currently: lib.rs, merge_join.rs, big diff.


In merge_join_by definition, in order for the compiler to keep guessing left/right types, I kept F: FnMut(&Self::Item, &J::Item) -> T. But to not add F: MergePredicate<Self::Item, J::Item> that felt duplicate, I removed T: OrderingOrBool..., the user might lose "OrderingOrBool information".

I could

pub trait IsOrderingOrBool {}
impl IsOrderingOrBool for Ordering {}
impl IsOrderingOrBool for bool {}
// then add to the function definition
    T: merge_join::IsOrderingOrBool,

I think that would be clearer to people. What's your thought on that?

Then should I open a pull request for review, or do you see changes to do?

@Philippe-Cholet
Copy link
Member

@phimuemue Little reminder.

@phimuemue
Copy link
Member Author

Hi, thanks for keeping in touch.

What I see:

  • Could we keep the return type of fn merge_join_by to be MergeJoinBy<Self, J::IntoIter, F>, and make MergeJoinBy a type alias for some InternalMergeJoinBy? (I'd like to avoid our auxiliary structs (such as MergeFunc) appearing in public signatures.)
  • I really like that you just laid out the four MergePredicate variants (instead of introducing another level of indirection for the Merge/MergeBy variants).
  • Always implementing FusedIterator is correct imho.

If all this passes our tests, can you open a PR?

@Philippe-Cholet
Copy link
Member

Thank you too for this long discussion, I initially thought it would be quickier.

I don't like MergeFuncLR to be public either, so I thought of this InternalMergeJoinBy but what it means for MergeJoinBy is a fourth type (pub type MergeJoinBy<I, J, F, T> = InternalMergeJoinBy<I, J, MergeFuncLR<F, T>>;) which we previously tried to avoid. Because we can't write something like <F as FnMut(...)>::Output to infer T from F (from what I understand because Fn* traits aren't stabilized).

I don't see any way to avoid it, but maybe you do. Currently, Philippe-Cholet/itertools@c622606 . I'll make a PR once you find a way or are ok with the 4th type.

@phimuemue
Copy link
Member Author

I think I'm fine with InternalMergeJoinBy - I agree it's a bit ugly, but the ugliness is not leaked too much to the user. And I hope that - once we find something better - we can clean up the internals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants