RFC: introduce `split_at(mid: usize)` on `str` #1123

bluss · 2015-05-17T15:22:06Z

Introduce the method split_at(&self, mid: usize) -> (&str, &str) on str,
to divide a slice into two, just like we can with [T].

Rendered version

Adding split_at is a measure to provide a method from [T] in a version that
makes sense for str.

Once used to [T], users might even expect that split_at is present on str.

/// Divide one string slice into two at an index.
///
/// The index `mid` is a byte offset from the start of the string
/// that must be on a character boundary.
///
/// Return slices `&self[..mid]` and `&self[mid..]`.
///
/// # Panics
///
/// Panics if `mid` is beyond the last character of the string,
/// or if it is not on a character boundary.
///
/// # Examples
/// ```
/// let s = "Löwe 老虎 Léopard";
/// let first_space = s.find(' ').unwrap_or(s.len());
/// let (a, b) = s.split_at(first_space);
///
/// assert_eq!(a, "Löwe");
/// assert_eq!(b, " 老虎 Léopard");
/// ```
fn split_at(&self, mid: usize) -> (&str, &str) {
    (&self[..mid], &self[mid..])
}

bluss · 2015-05-17T15:22:55Z

cc @Manishearth you requested something like this in rust-lang/rust#18063.

Manishearth · 2015-05-17T15:26:06Z

👍 😄

[Edit: I'm wrong here]
The current implementation could be improved to only have a single pass through the string (this iterates twice because each indexing operation is one loop)

hanna-kruppe · 2015-05-17T15:27:16Z

@Manishearth It doesn't loop at all, it uses byte indices and therefore each slice operation is constant time (just check that the index is in bounds and the byte is the start of a UTF-8 code point, then adjust the start/length of the slice).

Manishearth · 2015-05-17T15:29:03Z

[Edit: I'm wrong here]
Strings are utf8, byte indexing would make this unsafe.

String indexing indexes by chars IIRC, and for that you need to loop through because chars have variable length.

hanna-kruppe · 2015-05-17T15:30:23Z

String indexing is by bytes and checks that the index lies on a codepoint boundary (which is cheap, just check that the byte is less than ox80). This is consistent with almost all other methods, including char_at.

Manishearth · 2015-05-17T15:31:01Z

Ah, okay.

bluss · 2015-05-17T15:39:15Z

The method has only a single index that needs to be bounds checked and checked for character boundary (mid), so if the redundant checks are not elided already, split_at can be optimized to fix that. The implementation in the RFC is of course only for illustration.

alexcrichton · 2015-05-18T19:52:57Z

This sounds like a great idea to me, thanks for writing this up @bluss!

retep998 · 2015-05-19T23:28:08Z

Since this doesn't return Option, that means this code would panic! if you tried to use an index in the middle of a utf-8 sequence. Make sure you add that to the drawbacks.

Gankra · 2015-05-20T00:28:31Z

I am of the belief that now that the trains are running, this kind of change no longer ~~merits~~ requires any kind of RFC. Am I mistaken?

seanmonstar · 2015-05-20T00:34:43Z

I would expect adding any new public API to require an RFC. Twiddling the internals is just implementation that's hidden, but something that will eventually be marked #[stable] needs an RFC.

bluss · 2015-05-20T01:15:01Z

@retep998 Done!

@seanmonstar This was what I thought was the rule, and I agree with you.

pythonesque · 2015-05-20T09:12:15Z

Is there any chance split_at for strings could be modified to return an Option instead? It's actually pretty common for me to want a non-panicking version of this for slices and I definitely have a use case for it in parsers. I realize it would be inconsistent with the [T] version, but I've never really understood why that one doesn't return an Option.

bluss · 2015-05-20T10:31:32Z

The panicking logic is the same as for slicing. You should only ever use it with a byte offset you got from somewhere, so you know it's always correct (for example from find like in the example). So we panic because it's a programmer's error (contract violation) to specify an illegal offset.

I absolutely don't want to introduce an inconsistency. A part of the motivation here is that you expect .split_at to be available as it is on slices. If we go the option route, it needs a new name.

Contract violations @ Error conventions RFC

pythonesque · 2015-05-20T13:31:46Z

You should only ever use it with a byte offset you got from somewhere, so you know it's always correct (for example from find like in the example).

I don't think it's necessarily a contract violation (because strings and vectors can grow and shrink) but yeah, maybe I just want an additional checked version (but I thought our new policy on this was that we shouldn't have both a panicking and non-panicking version of an API?).

bluss · 2015-05-20T13:45:22Z

@pythonesque

Since we don't have any optional-value slicing, I think we should add .get_slice<R: RangeArgument>(range: R) -> Option<&str> first.

Additionally or alternatively, we should stabilize is_char_boundary so that it is possible to check indices for slicing and split_at up front. I think that's in line with our guidelines. Maybe the function could be renamed to something like is_valid_index ?

Implement RFC rust-lang/rfcs#1123 Add str method str::split_at(mid: usize) -> (&str, &str).

alexcrichton · 2015-06-02T23:22:20Z

This RFC is now entering the final week-long final comment period.

The library subteam may not require an RFC for a change such as this in the future, but we have yet to concretely decided on the guidelines for what needs an RFC (for discussion see this thread). While we develop this policy, however, we're going to go ahead with this RFC.

Gankra · 2015-06-02T23:40:10Z

This seems fine to add. No strong feelings, though.

huonw · 2015-06-03T00:33:18Z

My only comment is that I question the unicode hygiene of this function (i.e. naive calls to this probably won't be unicode-correct?). However, this is consistent with [T], no worse than things we already expose, definitely clearer than double-..-slicing and anyway we generally don't want to stop people doing things just because there's the risk of misuse. On balance, I'm happy to see this land.

hanna-kruppe · 2015-06-03T06:22:56Z

What do you mean by unicode-correct? Like all other slicing and indexing, it respects code point boundaries, i.e., it will rather panic than cutting a code point in half. It won't try to make you deal with grapheme clusters or anything, but none of the existing methods do that (except for graphemes and grapheme_indices, obviously).

huonw · 2015-06-03T06:43:07Z

I mean that it is very easy to split a string to give nonsense output, e.g. at the most basic level, splitting the two-codepoint version of ä between a and the combining character, and, it's not immediately obvious to me if performing this operation is useful/can be correct for arbitrary unicode strings. (Don't get me wrong, it can definitely be correct and useful for restricted subsets, e.g. ASCII-only strings, but an arbitrary chunk of text can be very arbitrary.)

hanna-kruppe · 2015-06-03T06:57:22Z

But all other string manipulation¹ already has that problem. It's a long-standing policy to not try and prevent that sort of error. In addition, it's not like this API can't be used "properly" --- you just have to select the index at which you split properly. Which, again, is exactly the same situations as with the rest of the string API.

¹With the aforementioned exception, which BTW moved out of libstd now.

huonw · 2015-06-03T10:05:46Z

Yes, I covered those aspects in my comment. It's why I'm not against this landing. :)

alexcrichton · 2015-06-09T23:51:06Z

This consensus of the libs subteam is that we should merge this RFC, so I'm going to merge it. Thanks again for the RFC @bluss!

Implement RFC rust-lang/rfcs#1123 Add str method str::split_at(mid: usize) -> (&str, &str).

Implement RFC rust-lang/rfcs#1123 Add str method str::split_at(mid: usize) -> (&str, &str). Also a minor cleanup in the collections::str module. Remove redundant slicing of self.

[str-split-at] RFC: introduce split_at(mid: usize) on str

622e252

alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label May 18, 2015

[str-split-at] Update drawbacks

8f8f7ea

bluss pushed a commit to bluss/rust that referenced this pull request May 27, 2015

Add str::split_at

ac85082

Implement RFC rust-lang/rfcs#1123 Add str method str::split_at(mid: usize) -> (&str, &str).

bluss mentioned this pull request May 27, 2015

Implement str::split_at (RFC 1123) rust-lang/rust#25839

Merged

alexcrichton added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jun 2, 2015

Gankra self-assigned this Jun 5, 2015

alexcrichton merged commit 8f8f7ea into rust-lang:master Jun 9, 2015

bluss deleted the str-split-at branch June 10, 2015 00:35

bluss pushed a commit to bluss/rust that referenced this pull request Jun 10, 2015

Add str::split_at

d43bf53

Implement RFC rust-lang/rfcs#1123 Add str method str::split_at(mid: usize) -> (&str, &str).

chriskrycho mentioned this pull request Feb 8, 2017

Document all features in the reference rust-lang/rust#38643

Closed

17 tasks

chriskrycho mentioned this pull request Mar 11, 2017

Document all features rust-lang/reference#9

Closed

48 tasks

Centril added the A-slice Slice related proposals & ideas label Nov 23, 2018

Centril unassigned Gankra Nov 23, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: introduce `split_at(mid: usize)` on `str` #1123

RFC: introduce `split_at(mid: usize)` on `str` #1123

bluss commented May 17, 2015

bluss commented May 17, 2015

Manishearth commented May 17, 2015

hanna-kruppe commented May 17, 2015

Manishearth commented May 17, 2015

hanna-kruppe commented May 17, 2015

Manishearth commented May 17, 2015

bluss commented May 17, 2015

alexcrichton commented May 18, 2015

retep998 commented May 19, 2015

Gankra commented May 20, 2015

seanmonstar commented May 20, 2015

bluss commented May 20, 2015

pythonesque commented May 20, 2015

bluss commented May 20, 2015

pythonesque commented May 20, 2015

bluss commented May 20, 2015

alexcrichton commented Jun 2, 2015

Gankra commented Jun 2, 2015

huonw commented Jun 3, 2015

hanna-kruppe commented Jun 3, 2015

huonw commented Jun 3, 2015

hanna-kruppe commented Jun 3, 2015

huonw commented Jun 3, 2015

alexcrichton commented Jun 9, 2015

RFC: introduce split_at(mid: usize) on str #1123

RFC: introduce split_at(mid: usize) on str #1123

Conversation

bluss commented May 17, 2015

bluss commented May 17, 2015

Manishearth commented May 17, 2015

hanna-kruppe commented May 17, 2015

Manishearth commented May 17, 2015

hanna-kruppe commented May 17, 2015

Manishearth commented May 17, 2015

bluss commented May 17, 2015

alexcrichton commented May 18, 2015

retep998 commented May 19, 2015

Gankra commented May 20, 2015

seanmonstar commented May 20, 2015

bluss commented May 20, 2015

pythonesque commented May 20, 2015

bluss commented May 20, 2015

pythonesque commented May 20, 2015

bluss commented May 20, 2015

alexcrichton commented Jun 2, 2015

Gankra commented Jun 2, 2015

huonw commented Jun 3, 2015

hanna-kruppe commented Jun 3, 2015

huonw commented Jun 3, 2015

hanna-kruppe commented Jun 3, 2015

huonw commented Jun 3, 2015

alexcrichton commented Jun 9, 2015

RFC: introduce `split_at(mid: usize)` on `str` #1123

RFC: introduce `split_at(mid: usize)` on `str` #1123