-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking issue for slice::partition_dedup/by/by_key
#54279
Comments
This issue tracks: impl<T> [T] {
pub fn partition_dedup(&mut self) -> (&mut [T], &mut [T])
where
T: PartialEq,
{
self.partition_dedup_by(|a, b| a == b)
}
pub fn partition_dedup_by<F>(&mut self, mut same_bucket: F) -> (&mut [T], &mut [T])
where
F: FnMut(&mut T, &mut T) -> bool,
{…}
pub fn partition_dedup_by_key<K, F>(&mut self, mut key: F) -> (&mut [T], &mut [T])
where
F: FnMut(&mut T) -> K,
K: PartialEq,
{
self.partition_dedup_by(|a, b| key(a) == key(b))
}
} The doc-comment of
From the implementation PR:
|
These methods feel very niche to me (given we already have the @rust-lang/libs, any thoughts? |
Note that |
@leonardo-m brought up the question of whether the second element of the returned tuple is even used. If not, it would simplify the API to return only one slice instead of a tuple. So it would be really useful to know how people use this feature. If it's merely used as a way to But as I agree this is pretty niche, I think it's fine if the API is a bit more complicated. Niche APIs have a stronger focus on being powerful than being easy to use, I think. My initial thought also was that this API is too niche and shouldn't be included, but apparently it is used a lot (even excluding Rust forks, there are quite a few projects using it on the first few pages of the search results). So I'd be fine with stabilizing this as soon as the return type question is resolved. |
As far as I can tell if the method returns one slice the other behavior can still be achieved by only keeping the length of that returned slice and then using |
I thought that it would be great to also add the |
The link above gives ~1500 results. Excluding the word "slice_internals" and "fmt_internals" and forcing the language to be Rust leaves only 12 results. So 99% of this "lot" are very likely just rustc clones. A more accurate search would be: https://github.com/search?l=Rust&q=partition_dedup+NOT+slice_internals+NOT+fmt_internals+NOT+test_binary_search_implementation_details&type=Code (12 results) Only 3 examples actually try to use the second element:
Conclusion: The second element is extremely niche, and encourages bad algorithm. As mentioned in #54279 (comment), the original slice is not destroyed, so the two-element version can be easily reimplemented by the one-element version using fn partition_dedup(&mut self) -> (&mut Self, &mut Self) {
let p = self.dedup_in_place().len();
self.split_at_mut(p)
} There is no reason to keep the second element from the return type. |
@kennytm Wow, thanks for the research! I clearly didn't put enough time into my comment and was indeed fooled by my search. In case we change the return type to only return the first part of the slice, maybe we should also think about renaming the methods. Shepmaster suggested the current name partially because of the 2-tuple return type. I don't mind the current name (the algorithm still does some kind of partitioning), but another name might be more appropriate: the return type is different than |
Note: The side effect of this method is that the input slice is swapped all around on success so that the initial span of the spice is the deduped portion. Thus, must_use would be inappropriate. |
Oh hey I've been mentioned. Yeah that use case is very much artificial, it was part of the Advent of Code, a series of puzzles. The puzzle for day 10 can be found at https://adventofcode.com/2019/day/10, although you'd need to solve part 1 before it reveals part 2 which this was part of. However as you said i could easily just use |
I would be inclined to remove these methods. This is an uncommon use case. The algorithm can be implemented in ~20 lines of safe code if you don't mind the bounds checks. I think the amount of benefit we'd deliver by stabilizing these is much less than what it costs to put these fairly complicated names and signatures in front of everyone in the documentation of slices. |
I'd vote against removing these methods. Even if their use is uncommon they are genuinely useful. They also fit with other |
As I said, those are implemented in terms of these, so these cannot be removed entirely. Users of Vec already see this complexity, in Stable, there's no reason that it can't be available to slices too. |
The ones on Vec are simpler signatures, simpler names, and simpler behavior to understand, and serve a somewhat more common use case. So I don't find that those necessarily provide justification for the new slice methods. Regarding "cannot be removed entirely": they can be removed entirely from the API of the standard library. |
Are you talking about only the "complicated" version: fn partition_dedup(&mut self) -> (&mut [T], &mut [T]) or also the "simplified" version fn partition_dedup(&mut self) -> &mut [T] |
I would like to see them put on normal slices so that other crates with vec-like APIs can simply use the slice versions. But i'm fine with the simple version, since that's what Vec offers. |
If the intended audience is crates that provide a vec-like API, I really don't think that justifies adding these to slice. It's an uncommon use case and they can write the 20 lines themselves. I don't think providing these methods makes a dent in the actual hard parts of designing and implementing a vec-like type downstream. |
Or just anyone using slices who wants to be able to do this. Unless your argument is that adding this in the first place to Vec was itself a mistake, then the same methods should be on slices too (the simplified versions that is). |
Ignore me, I failed to read the documentation properly:
Indeed, the relative order of the duplicates is not preserved, and I just got lucky on the AoC problem :( |
Hey @SimonSapin, Do you think that if I update the function name to be |
I think //this is ok to do with vec:
let vec = vec![1, 1, 2, 3];
vec.dedup()
// can use vec, no problem
// same case with slices
let mut slice = [1, 1, 2, 3];
slice.dedup_in_place();
// uses slice, oops
// or worst:
let mut slice = [1, 1, 2, 3];
let new = slice.dedup_in_place();
// one may use slice, ignore new and refer to “removed” elements returning a tuple will prevent the later, making it |
You can't ignore |
first case holds |
which suggests and returning a tuple prevents none of the issues raised in your sample code anyway. |
Fair enough. I am just concerned with the potential misuse of this function, but |
One of my projects could definitely profit from having these functions stabilized.
For my personal use case I could use the second returned value but infering all information I need can also be done by only the first without overhead.
As said earlier in this discussion I do not see a value in putting I see a clear value in having those definitions available since they are 20 lines of code that I do not have to write myself and that I can simply trust in my code base to be correctly implemented. If I were to write them myself I would maybe even end up with an |
If we choice to only return the dedup part, I believe we should rename |
So, I was very confused that this method is the only one with "partition" in the name for slice when it only does deduplication, only to find that the actual partition method is called The naming of these methods is a bit confusing, and I would be in favour of removing the "partition" terminology to help avoid that confusion. |
Wanted to chime in and say that, for my purposes, having it return two vectors has been very useful (both the deduplicated result, and the items that were removed). If the opinion goes in the opposite direction (to only return the deduplicated result, not the removed duplicates), how would I go about getting a vector of removed duplicates? Sorry if that's a silly question. |
@Kerollmops , it seems I found bug. I usually use I. e. my code usually looks like this: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=ee5704261d8bcb084b30e8a7de2b28ff . So far, so good. But if I change type of https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=fee945c939074d4d1373f3a52e1b2c8c But surprisingly So, it seems that wrapper |
@safinaskar it's by design not a bug, are you asking these method should return a reference like PS: |
A bit of a pedantic observation, but the That would still not solve the "bug" of course, and I don't think that bug can really be solved. Sure, you could make the signature |
Let me say another way. In my code I often want to dedup vectors of structs, which have Unfortunately the only way I can do this is to So I want some API that works in this case without cloning. Or at least which give ability to verify |
#![feature(slice_partition_dedup)]
fn main() {
struct Foo {
f: String,
}
let mut a: Vec<Foo> = vec![];
a.partition_dedup_by(|a, b| a.f == b.f);
} |
@Stargateur , I don't like this solution. I'm big fan of "don't repeat yourself" principle. But, yes, this will go as pragmatic solution |
The issue with the |
What is the status of this feature? |
Add three methods to the
slice
type, thepartition_dedup
,partition_dedup_by
andpartition_dedup_by_key
. The two other methods are based onslice::partition_dedup_by
.Steps / History
Unresolved Questions
#[must_use]
?The text was updated successfully, but these errors were encountered: