Trim methods on slices #2547

SoniEx2 · 2018-09-22T22:56:44Z

/// Trims leading elements of this slice for which `f` returns true.
fn trim_left_matches<F: Fn(&T) -> bool>(&self, f: F) -> &[T] {
    let mut res = self;
    while !res.is_empty() && f(&res[0]) {
        res = &res[1..];
    }
    res
}

/// Trims trailing elements of this slice for which `f` returns true.
fn trim_right_matches<F: Fn(&T) -> bool>(&self, f: F) -> &[T] {
    let mut res = self;
    while !res.is_empty() && f(&res[res.len() - 1]) {
        res = &res[..res.len() - 1];
    }
    res
}

(and so on)

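Basically, with a predicate that matches empty strings, trimming from the left turns &["", "", "", "foo", ""] into &["foo", ""], trimming from the right turns &["", "foo", "", "", ""] into &["", "foo"], etc., depending on what you call.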
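For anyone who wants to try the idea today, here is a minimal sketch of the two methods as an extension trait. The trait name `TrimMatches` is made up for illustration and is not part of the standard library; the predicate takes `&T` so that element types that are not `Copy` also work.

```rust
/// Hypothetical extension trait (not a std API) carrying the proposed methods.
pub trait TrimMatches<T> {
    fn trim_left_matches<F: Fn(&T) -> bool>(&self, f: F) -> &[T];
    fn trim_right_matches<F: Fn(&T) -> bool>(&self, f: F) -> &[T];
}

impl<T> TrimMatches<T> for [T] {
    fn trim_left_matches<F: Fn(&T) -> bool>(&self, f: F) -> &[T] {
        let mut res = self;
        // Peel matching elements off the front, one at a time.
        while let [first, rest @ ..] = res {
            if !f(first) {
                break;
            }
            res = rest;
        }
        res
    }

    fn trim_right_matches<F: Fn(&T) -> bool>(&self, f: F) -> &[T] {
        let mut res = self;
        // Peel matching elements off the back, one at a time.
        while let [rest @ .., last] = res {
            if !f(last) {
                break;
            }
            res = rest;
        }
        res
    }
}
```

With the trait in scope, calling `trim_left_matches(|s| s.is_empty())` on a `&[&str]` returns a subslice of the original, so nothing is copied or allocated.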


burdges · 2018-09-22T23:41:37Z

These sound like noise, since you can implement them trivially with code like s.split_at_mut(s.len() - s.iter().rev().filter(|x| x.len()==0).count()).0.

SoniEx2 · 2018-09-23T02:01:31Z

That's even less readable. :/

Noise is having the same code snippet over and over again and not having it in a well-documented standalone function.

Also, I don't think that code of yours actually works. You probably meant to use take_while and split_at.
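To make the difference concrete, here is a rough sketch of both variants for a slice of string slices. The function names are invented for this example, and the filter-based version uses `split_at` rather than `split_at_mut` so that both take a shared reference. `filter` counts every empty element, including interior ones, while `take_while` counts only the trailing run.

```rust
/// Counts every empty element, not just the trailing ones, so it can cut too much.
fn filter_based<'a, 'b>(s: &'a [&'b str]) -> &'a [&'b str] {
    let empties = s.iter().rev().filter(|x| x.is_empty()).count();
    s.split_at(s.len() - empties).0
}

/// Counts only the run of empty elements at the very end of the slice.
fn trim_trailing_empty<'a, 'b>(s: &'a [&'b str]) -> &'a [&'b str] {
    let trailing = s.iter().rev().take_while(|x| x.is_empty()).count();
    s.split_at(s.len() - trailing).0
}
```

On a slice with an empty element in the middle, the two functions return different prefixes; only the `take_while` version behaves like a right trim.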

Lonami · 2018-10-09T07:08:01Z

We have .skip_while() and .take_while() for iterators. Aren't those enough?

.iter().skip_while(|x| x.is_empty()).take_while(|x| !x.is_empty())

SoniEx2 · 2018-10-09T10:53:51Z

No - that doesn't work like a trim method.

And they're not analogous to the string methods.
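A small sketch of why the iterator chain is not a trim, assuming a slice of string slices. The helper name `skip_then_take` is made up here, and the result is collected into a `Vec` only so it can be inspected.

```rust
/// Applies the skip_while / take_while chain and collects what it yields.
fn skip_then_take<'a>(items: &[&'a str]) -> Vec<&'a str> {
    items
        .iter()
        .copied()
        // Drop the leading empty strings.
        .skip_while(|x| x.is_empty())
        // Stop at the first empty string after that, wherever it is.
        .take_while(|x| !x.is_empty())
        .collect()
}
```

For an input such as `["", "a", "", "b", ""]`, the chain stops at the interior empty string and yields only `"a"`, whereas a trim would keep `["a", "", "b"]`. It also produces an iterator rather than a subslice of the original.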

Aloso · 2018-11-30T06:40:53Z

@SoniEx2 Can you give an example where this would be useful (except for Strings)?

SoniEx2 · 2018-11-30T12:52:24Z

When you have a slice and don't want to allocate.

josephlr · 2020-05-05T07:18:31Z

I ended up needing something like this for c-string parsing. I have a sequence of bytes and want to return the prefix containing the c-string data (not including the null terminator).

But then I realized, you can use split to do this:

fn trim_c_string(s: &[u8]) -> &[u8] {
    s.split(|&b| b == 0).next().unwrap_or(&[])
}

However, the split-based implementation cannot eliminate the bounds check, unlike the naive loop implementation:

pub fn fast_trim_c_string(s: &[u8]) -> &[u8] {
    for i in 0..s.len() {
        if s[i] == 0 {
            return s.split_at(i).0;
        }
    }
    s
}
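A third way to write the same thing, sketched here with `Iterator::position`. The function name is invented for this example, and no claim is made about how it compares to the two versions above in generated code.

```rust
/// Returns the bytes before the first NUL, or the whole slice if there is none.
pub fn trim_c_string_position(s: &[u8]) -> &[u8] {
    match s.iter().position(|&b| b == 0) {
        // `i` is the index of the first NUL, so `..i` excludes the terminator.
        Some(i) => &s[..i],
        None => s,
    }
}
```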

serid · 2020-05-14T10:54:57Z

It's nice to have trim methods on str, but in the project I am working on right now I use &[char] slices instead of &str, because I need indexed access to characters and slicing of strings, which &str does not support since it is UTF-8. It is disturbing that str has a .trim() method and a generic [T] slice does not. It would be really nice if this issue were resolved, all the more so since it is that easy to implement.

A sample implementation looks like this, though I am sure it is suboptimal.

fn trim<P>(&self, mut predicate: P) -> &[T]
where
    P: FnMut(&T) -> bool,
{
    let mut left = 0;
    let mut right = self.len();

    let mut iter = self.iter();

    while let Some(e) = iter.next() {
        if predicate(e) {
            left += 1
        } else {
            break;
        }
    }

    while let Some(e) = iter.next_back() {
        if predicate(e) {
            right -= 1
        } else {
            break;
        }
    }

    &self[left..right]
}
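For comparison, here is a sketch of the same trim as a free function built on `position` and `rposition`, which can be called on a `&[char]` directly. The name `trim_by` is made up for illustration. Unlike the version above, it may call the predicate twice on the same element when only one element is kept.

```rust
/// Trims elements matching `predicate` from both ends of `s`.
fn trim_by<T, P>(s: &[T], mut predicate: P) -> &[T]
where
    P: FnMut(&T) -> bool,
{
    // Index of the first element to keep; if there is none, the result is empty.
    let start = match s.iter().position(|e| !predicate(e)) {
        Some(i) => i,
        None => return &s[..0],
    };
    // A kept element exists, so the search from the back finds one as well.
    let end = s
        .iter()
        .rposition(|e| !predicate(e))
        .map_or(start, |i| i + 1);
    &s[start..end]
}
```

Usage on characters would look like `trim_by(&chars, |c| c.is_whitespace())`, where `chars` is a `Vec<char>` or any other `[char]` storage.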

burdges · 2020-05-14T11:38:48Z

We prefer split_* methods for slices, so as to retain access to underlying subslices, so I still think trim_* methods add noise. We could discuss some split_change(f) that does split_inclusive(|x| changed(f(x))) where

let mut previous = true;
let mut changed = |x| if previous == x { false } else { previous = x; true };

so trim is split_change(f).skip(1).next().unwrap_or(&[]). We're maybe better off adding roughly this changed state machine somewhere like core::iter though, not sure.

SoniEx2 · 2020-05-14T14:44:21Z

Perhaps a more useful trim would use Default::default() to remove things.

@serid you should really be using &[&str] instead of &[char] because &[char] is useless.

golddranks · 2020-05-14T15:06:59Z

For information, there is a new-ish unstable API, split_inclusive, on slices that would help with implementing such a thing. (The normal split API doesn't include the "split marker" in either of the resulting subslices. More info here: rust-lang/rust#67330.) However, I neglected to make a tracking issue, so there isn't a direct path toward stabilization at the moment. I'll try to find some time to create a tracking issue next weekend!
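A short sketch of the difference between the two APIs, assuming a toolchain where `split_inclusive` on slices is available. The helper name `split_both` is invented for this example.

```rust
/// Splits `v` on zero bytes with both APIs and returns the pieces side by side.
fn split_both(v: &[u8]) -> (Vec<&[u8]>, Vec<&[u8]>) {
    // `split` drops the matched element from every piece.
    let plain: Vec<&[u8]> = v.split(|&b| b == 0).collect();
    // `split_inclusive` keeps the matched element at the end of the preceding piece.
    let inclusive: Vec<&[u8]> = v.split_inclusive(|&b| b == 0).collect();
    (plain, inclusive)
}
```

For `[1, 0, 2, 0, 3]`, the first list is `[1]`, `[2]`, `[3]` and the second is `[1, 0]`, `[2, 0]`, `[3]`.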

golddranks · 2020-05-19T19:27:08Z

Tracking issue created: rust-lang/rust#72360.

Lucretiel · 2020-10-31T05:08:31Z

I have a use case: I'm currently working on updates to BufWriter, and specifically its implementation of write_vectored, as a part of rust-lang/rust#78551. write_vectored takes an &[IoSlice], and it'd be very useful to be able to trim empty slices from both ends. This would allow me to forward the trimmed list of slices to the inner write_vectored method, and also to specialize the case where we received exactly one non-empty slice. These cases aren't served by iterator methods, because I need to transform slices into smaller slices to process and forward as necessary.
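A rough sketch of what that trimming could look like with what is available today. The function name `trim_empty_bufs` is hypothetical and this is not the code from the linked PR; it relies only on `IoSlice` dereferencing to `[u8]`.

```rust
use std::io::IoSlice;

/// Returns the subslice of `bufs` with empty buffers removed from both ends.
fn trim_empty_bufs<'a, 'b>(bufs: &'a [IoSlice<'b>]) -> &'a [IoSlice<'b>] {
    // First non-empty buffer; if there is none, everything is trimmed away.
    let start = match bufs.iter().position(|b| !b.is_empty()) {
        Some(i) => i,
        None => return &bufs[..0],
    };
    // One past the last non-empty buffer.
    let end = bufs
        .iter()
        .rposition(|b| !b.is_empty())
        .map_or(start, |i| i + 1);
    &bufs[start..end]
}
```

Because the result is still a `&[IoSlice]`, it can be passed on to an inner `write_vectored`, and a result of length 1 identifies the single-buffer case. Empty buffers in the middle are left in place.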

…iplett core: Implement ASCII trim functions on byte slices

Hi `@rust-lang/libs!`

This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead.

This PR adds three new methods to byte slices:

- `trim_ascii_start`
- `trim_ascii_end`
- `trim_ascii`

I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose.

As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun.

Tracking issue: rust-lang#94035


Kage-Yami · 2022-06-19T07:26:40Z

Perhaps a different use-case for str::trim_*-alike methods for &[u8]... in my case, I have files (out of my control) that are mostly-ASCII-serialized structs from some unknown language/library which I'm parsing with nom, and I'm trimming leading whitespace from the input at every step as an easy way to ignore insignificant whitespace.

However, these files are only mostly ASCII - in some "fields", they contain straight binary, so I can't treat the entire file I'm reading as valid UTF-8 (this being the reason for using &[u8] over &str).

That being said, I did find rust-lang/rust#94035 - which is for the same as this, just restricted to ASCII specifically. In my case, that would be good enough. These methods are currently available in nightly: https://doc.rust-lang.org/std/primitive.slice.html#method.trim_ascii
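For toolchains where those methods are not available, a hand-rolled sketch along the same lines is short. The function names below are made up and carry a `_sketch` suffix to avoid suggesting they are the standard library versions; they rely only on `u8::is_ascii_whitespace`.

```rust
/// Removes leading ASCII whitespace from a byte slice.
fn trim_ascii_start_sketch(mut bytes: &[u8]) -> &[u8] {
    while let [first, rest @ ..] = bytes {
        if !first.is_ascii_whitespace() {
            break;
        }
        bytes = rest;
    }
    bytes
}

/// Removes trailing ASCII whitespace from a byte slice.
fn trim_ascii_end_sketch(mut bytes: &[u8]) -> &[u8] {
    while let [rest @ .., last] = bytes {
        if !last.is_ascii_whitespace() {
            break;
        }
        bytes = rest;
    }
    bytes
}

/// Removes ASCII whitespace from both ends of a byte slice.
fn trim_ascii_sketch(bytes: &[u8]) -> &[u8] {
    trim_ascii_end_sketch(trim_ascii_start_sketch(bytes))
}
```

Non-ASCII and binary bytes are never treated as whitespace, so a mostly-ASCII input with embedded binary fields does not need to be valid UTF-8.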

Centril added the T-libs-api label (relevant to the library API team, which will review and decide on the RFC) on Oct 15, 2018.

dbrgn mentioned this issue on Feb 6, 2022, in rust-lang/rust#93686 (core: Implement ASCII trim functions on byte slices), which was merged.
