Add starts_with and ends_with to OsStr #26499

Stebalien · 2015-06-22T12:32:12Z

(also inline all non-generic public APIs)

Note: this commit doesn't use the Pattern API because it's &str specific.
Also Note: this commit doesn't introduce OsStr::contains because I'm lazy.

I'd be happy to write an RFC if you want to explore other possibilities like
expanding the Pattern API (although I really don't think we should).

rust-highfive · 2015-06-22T12:32:25Z

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

nikomatsakis · 2015-06-23T12:17:28Z

cc @rust-lang/libs

nikomatsakis · 2015-06-29T13:19:41Z

So this code seems good to me, but i'd prefer to have someone from @rust-lang/libs land it. I'm going to switch to r? @alexcrichton

alexcrichton · 2015-06-29T17:15:10Z

Can this hold off on adding #[inline] for now? Right now it tends to incur a little too much compile-time overhead for not enough runtime win. For example I've never seen OsStr manipulation at the top of any profile.

Other than that I think that this is fine, the only question being about how we want to pursue these methods into the future. Do we want to duplicate the API surface area of strings onto OS strings immediately, or incrementally? What we have today is kinda the "bare minimum" to get by with the plan to "expand if necessary". I'm somewhat against adding apis incrementally over time as it's bound to just be surprising that OsStr doesn't implement a method or two, so it may be worth taking time to plan out what the final API of OS strings might look like in the long run.

That being said I do think it's fine for these to enter in an unstable fashion, for now. If it turns out that everyone really wants these two methods then this is as good a way as any to test the waters.

bluss · 2015-06-29T18:24:52Z

src/libstd/ffi/os_str.rs

+    /// Returns true if the `other` is a prefix of the `OsStr`.
+    #[unstable(feature = "os_str_compare", reason = "recently added")]
+    pub fn starts_with<S: AsRef<OsStr>>(&self, other: S) -> bool {
+        self.bytes().starts_with(other.as_ref().bytes())


This is correct for plain bytes and UTF-8, but what about other encodings? If OsStr is WTF-8, does that allow false positives (say for example that a single byte encoded sequence for a codepoint exists that's a prefix of a multibyte sequence. UTF-8 doesn't have this).

@SimonSapin? I think these should be fine. If I'm reading the "spec" correctly, WTF-8 is a strict subset of "generalized UTF-8" (https://simonsapin.github.io/wtf-8/#generalized-utf_8) and, from what I can tell, generalized UTF-8 is a prefix-free code.

Yep, ok that was simple.

Yes, WTF-8 preserves the nice properties of UTF-8 like self-synchronization.

Just a note for people from the future: No this is not fine because if self contains a non-BMP code point (4-byte sequence), and other contains just a high-surrogate (3-byte sequence), this implementation will always return false, contradicting the result when given two WTF-16 sequences.

Stebalien · 2015-06-29T20:18:01Z

@alexcrichton I started with starts_with/ends_with because I wanted to know if a filename started with '.' and realized that the only reliable way to do this was to use to_os_string_lossy().starts_with('.').

Without distinguishing between utf8 and wtf8, I can implement the following:

starts_with/ends_with (done)
contains (done)
is_empty/len (done)
lines/lines_any (in-progress)

Everything else depends on whether we're dealing with WTF-8 or UTF-8. I can write up an RFC if you want.

bluss · 2015-06-30T08:22:45Z

@Stebalien Since you mention UTF-8 and WTF-8, on linux there is no encoding for paths in general, they are just arbitrary byte strings.

Stebalien · 2015-06-30T13:58:52Z

@bluss good point. That rules out methods like split replace, and trim (methods that look at individual characters) because they will behave differently on unix/windows. For example, "OsStr::new("Ünіcôdé").split(OsStr::new("")) act like bytes() on unix and chars() on windows.

find and the slicing methods should still be doable however.

Stebalien · 2015-06-30T23:31:09Z

I'm going to close this for now and write an RFC.

alexcrichton · 2015-07-01T05:20:30Z

@Stebalien we actually discussed this today in some libs triage, and we ended up reaching the same conclusion! After a quick overview of the string/osstr apis, we concluded that the "end goal" for what OsStr exposes may end up just being the Pattern methods on strings. Most other methods didn't seem to apply, and the pattern ones all generally seemed to fit nicely (including starts_with and ends_with).

There may be a possibility of generalizing the Pattern trait to work for OS strings as well (cc @Kimundi, the author of the trait), but we can also start out conservatively and assume that a "pattern" is "something that can be an OsStr" to start off with.

Kimundi · 2015-07-01T14:05:21Z

Hm, right now I see two possible ways to generalize the pattern API to Osstrings:

Make the slice type a generic paramter of the trait, and add seperate impls for Pattern<'a, str> and Pattern<'a OsStr> (and maybe Pattern<'a, [T]> as well). That should be relatively easy to do, but makes the API of the actual methods more complicated to read (where P: Pattern<'a> -> where P: Pattern<'a, str>). That said, it doesn't look that terrible either. Would be nice if we could do type Pattern<'a> = GeneralPattern<'a, str> I guess.
Change the Pattern API to work with any kind somehow searchable slice, and have wrapper around that that exposes it for str, OsStr and maybe plain [T]. Might reduce code duplication somewhat, but might also just be a more complicated variations of option 1. Also, its unclear if we could have specialized search algorithms as easily with this.

bluss · 2015-07-01T14:23:20Z

The current algorithm for &str in &str search supports any "ordered alphabet" so I assume it can be adapted to any [T] where T: Ord or at least any integer types. I think we want to make sure we can still specialize the [u8] case since we want to have all possibilities open to further optimizations (memchr, memcmp and even pcmpestri).

alexcrichton · 2015-07-01T16:22:44Z

Yeah I'm not sure if it's actually possible to use the same Pattern trait (or if we even want to), but it's certainly a possibility to explore!

aturon · 2015-07-01T23:34:33Z

I want to call attention to a point that @Kimundi made in passing: a generalization of the Pattern API should, ideally, apply to [T] as well. (That would in particular bring Vec<u8> largely up to parity with String as a first-class string type.)

Add OsStr starts_with and ends_with.

cbc965e

rust-highfive assigned nikomatsakis Jun 22, 2015

rust-highfive assigned alexcrichton and unassigned nikomatsakis Jun 29, 2015

alexcrichton added I-needs-decision Issue: In need of a decision. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 29, 2015

Stebalien force-pushed the os_str branch from 500445d to cbc965e Compare June 29, 2015 18:14

bluss reviewed Jun 29, 2015
View reviewed changes

Stebalien closed this Jun 30, 2015

Stebalien mentioned this pull request Oct 6, 2015

OS string string-like interface rust-lang/rfcs#1309

Closed

codyps mentioned this pull request Mar 6, 2017

OsStr: compare prefix #40300

Closed

dtolnay mentioned this pull request Nov 17, 2017

Bringing OsStr and CStr up to par with str rust-lang/rfcs#900

Open

jmillikin mentioned this pull request Sep 28, 2022

ACP: Method to split OsStr into (str, OsStr) rust-lang/libs-team#114

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add starts_with and ends_with to OsStr #26499

Add starts_with and ends_with to OsStr #26499

Stebalien commented Jun 22, 2015

rust-highfive commented Jun 22, 2015

nikomatsakis commented Jun 23, 2015

nikomatsakis commented Jun 29, 2015

alexcrichton commented Jun 29, 2015

bluss Jun 29, 2015

Stebalien Jun 29, 2015

bluss Jun 29, 2015

SimonSapin Jun 29, 2015

kennytm Nov 25, 2017

Stebalien commented Jun 29, 2015

bluss commented Jun 30, 2015

Stebalien commented Jun 30, 2015

Stebalien commented Jun 30, 2015

alexcrichton commented Jul 1, 2015

Kimundi commented Jul 1, 2015

bluss commented Jul 1, 2015

alexcrichton commented Jul 1, 2015

aturon commented Jul 1, 2015

Add starts_with and ends_with to OsStr #26499

Add starts_with and ends_with to OsStr #26499

Conversation

Stebalien commented Jun 22, 2015

rust-highfive commented Jun 22, 2015

nikomatsakis commented Jun 23, 2015

nikomatsakis commented Jun 29, 2015

alexcrichton commented Jun 29, 2015

bluss Jun 29, 2015

Choose a reason for hiding this comment

Stebalien Jun 29, 2015

Choose a reason for hiding this comment

bluss Jun 29, 2015

Choose a reason for hiding this comment

SimonSapin Jun 29, 2015

Choose a reason for hiding this comment

kennytm Nov 25, 2017

Choose a reason for hiding this comment

Stebalien commented Jun 29, 2015

bluss commented Jun 30, 2015

Stebalien commented Jun 30, 2015

Stebalien commented Jun 30, 2015

alexcrichton commented Jul 1, 2015

Kimundi commented Jul 1, 2015

bluss commented Jul 1, 2015

alexcrichton commented Jul 1, 2015

aturon commented Jul 1, 2015