Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for str escaping #27791

Closed
alexcrichton opened this issue Aug 13, 2015 · 28 comments
Closed

Tracking issue for str escaping #27791

alexcrichton opened this issue Aug 13, 2015 · 28 comments
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Copy link
Member

alexcrichton commented Aug 13, 2015

This is a tracking issue for the unstable str_escape feature in the standard library. The open questions I know of here are:

  • Do we want to support these relatively arbitrary ways to escape strings? Note that these same methods for escaping characters are already stable.
  • Are the precise forms of escaping stable? (probably because they delegate to char implementations)

I suspect that because the implementations on char are stable that we'll also stabilize these.

Signatures:

String::escape_debug(&self) -> String;
String::escape_default(&self) -> String;
String::escape_unicode(&self) -> String;
@alexcrichton alexcrichton added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. labels Aug 13, 2015
@AnthonyBroadCrawford
Copy link

@alexcrichton Mind if I take this issue?

@alexcrichton
Copy link
Member Author

@AnthonyBroadCrawford certainly! The workflow for this would probably look like:

  • Come up with a story for these methods, e.g. should they be stabilized as-is or tweaked?
  • Gain feedback about that state of affairs (e.g. IRC, reddit, forums, etc).
  • If large enough, write an RFC
  • Write a PR
  • Stabilize in the next cycle.

@Diggsey
Copy link
Contributor

Diggsey commented Sep 22, 2015

The reason given for why they're currently unstable says that they might be rewritten to return a char iterator instead?

@bluss
Copy link
Member

bluss commented Sep 23, 2015

Maybe we can consider the idea of them returning iterators, but those iterators implementing Display. That makes it easy to use them composably (without having to allocate a string) while also having the string available (the .to_string() method).

@Diggsey
Copy link
Contributor

Diggsey commented Sep 23, 2015

@bluss You can already do .collect() on a char iterator to turn it into a string.

@bluss
Copy link
Member

bluss commented Sep 23, 2015

True. With Display you can {}-format the escaped string without intermediate String or write-each-char loop.

@Manishearth Manishearth removed the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Oct 2, 2015
@alexcrichton alexcrichton added I-nominated final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed I-nominated labels Dec 16, 2015
@alexcrichton
Copy link
Member Author

🔔 This issue is now entering its final comment period for stabilization 🔔

For now this FCP will be for returning a String as returning an Iterator which implements Display isn't found much elsewhere in the standard library, but FCP can help duke this out!

@BurntSushi
Copy link
Member

Given that we have char::escape_{default,unicode} already stabilized, these seem like natural things to stabilize as well. I also agree with returning String, because that's probably what you'll use most of the time. If the extra allocation really matters, then expecting the programmer to build the iterator themselves with the underlying methods on char seems pretty reasonable IMO.

I do admit that I find it strange that these methods are in std, but I have found them occasionally very useful.

@bluss
Copy link
Member

bluss commented Jan 13, 2016

As previous, I favour using Display-based interfaces for composability.

For example write!(w, "{}:{}", s.escape_default(), t.escape_default()) does then not need to allocate intermediate strings, and it can be part of a much larger function, where the output and allocation is ultimately determined by the caller.

@aturon
Copy link
Member

aturon commented Jan 13, 2016

@BurntSushi The problem with going through char iterators is extra utf-8 en/decoding.

@BurntSushi
Copy link
Member

@aturon Good point. If others feel good about returning an iterator, then I'm happy with that!

@alexcrichton
Copy link
Member Author

The libs team discussed this in triage recently and the decision was to punt on this for now.

It was brought up that with the exact same name as char::escape_default it is reasonable to expect that the two methods work the same way, so returning a String may be surprising vs returning an iterator with char::escape_default. It was not clear that we had consensus for returning a structure that implements Display + Iterator, but the options we were considering were:

// signature
fn escape_default(&self) -> EscapeDefault;
  1. EscapeDefault just implements Display
  2. EscapeDefault just implements Iterator
  3. EscapeDefault implements both Display + Iterator

As pointed out, just implementing Iterator may not be the most performant because you have to decode and then re-encode, whereas a direct Display implementation would lend itself to perhaps less work.

It's also a possibility that we could return one trait implementation and then add the other in a backwards-compatible fashion in the future.

While we decide this, however, this is being removed from FCP.

@alexcrichton alexcrichton removed the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Jan 15, 2016
@clarfonthey
Copy link
Contributor

clarfonthey commented Jan 10, 2017

Just wanted to note that if it's decided that these methods are best implemented as Iterators or Displayable types, that to_lowercase and to_uppercase should be planned to eventually get the same treatment. Right now, they're implemented as Strings, which isn't the best IMO especially when in many cases it'd be more efficient to just use a Display implementation.

(obviously that'd be a breaking change, but it's something definitely worth considering)

@SimonSapin
Copy link
Contributor

@clarcharr to_lowercase and to_uppercase exist both as methods of &str where they’re convenience methods that return String, and of char where they return iterators that doesn’t allocate. The theory is that the former are easy to build on top of the latter (e.g. if you need something other than String) but there is one exception: str::to_lowercase has additional code to handle the case of Σ in "end of word" positions (which maps to ς instead of σ). So… maybe there should still be an iterator that does this.

(obviously that'd be a breaking change, but it's something definitely worth considering)

Probably we’d deprecate the old methods but not remove them.

@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Jul 22, 2017
@SimonSapin
Copy link
Contributor

SimonSapin commented Mar 28, 2018

This feature is about three inherent methods of the str type: escape_default, escape_unicode, escape_debug. Three corresponding inherent methods of char are already stable.

The libs team discussed this and the consensus was to stabilize, after changing the return type (from String) to dedicated types that implement both Iterator<Item=char> and Display, similar to char methods. If Display is used after Iterator::next has been called once or more, only the "remaining" chars are displayed. (Again like with char methods.)

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Mar 28, 2018

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Mar 28, 2018
@rfcbot
Copy link

rfcbot commented Mar 28, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Mar 28, 2018
@SimonSapin
Copy link
Contributor

The agreed API is not just an iterator, but also something that implements Display. Is -> impl Iterator<Item=char> + Display supported?

Given the noted disadvantages I’m inclined to go with "traditional" dedicated types here. In particular, not being able to name the return type seems significant.

@est31
Copy link
Member

est31 commented May 31, 2018

Is -> impl Iterator<Item=char> + Display supported?

Yes.

Note that @mominul had been reaching out for help on #rust-internals, this is what spawned my comment above. You may want to read the scrollback for context. Someone suggested use of impl Trait, then @rkruppe pointed out that this wasn't what the libs team agreed on.

@clarfonthey
Copy link
Contributor

Personally, I think that it would be best to make concrete types for now, like the rest of the standard library. At some point, though, it would be nice to be able to soft-deprecate all of the concrete iterator types and replace them with impl Iterator in the docs.

@BurntSushi
Copy link
Member

and replace them with impl Iterator in the docs.

For many of the adapters, that will actually need to be quite longer. For iter::Map for example, I think it would need to be, impl Iterator<Item=B> + DoubleEndedIterator + Debug + ExactSizeIterator + FusedIterator + TrustedLen + Clone + Send + Sync.

@clarfonthey
Copy link
Contributor

Err, you're right. It seems that a few trait aliases might come in handy for that.

Not sure about Send and Sync though.

@hanna-kruppe
Copy link
Contributor

For most iterator adapters (including Map) that doesn't even work because many impls are conditional on what traits the wrapped iterator implements -- DoubleEndedIterator, ExactSizeIterator, Clone, etc.

I also suspect there's problems with variance but that's probably less of a problem in practice.

SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 12, 2019
SimonSapin added a commit to SimonSapin/rust that referenced this issue Feb 12, 2019
bors added a commit that referenced this issue Feb 12, 2019
Stabilize str::escape_* methods with new return types…

… that implement `Display` and `Iterator<Item=char>`, as proposed in FCP: #27791 (comment)
@SimonSapin
Copy link
Contributor

Stabilized in #58051.

spikespaz pushed a commit to spikespaz/dotwalk-rs that referenced this issue Aug 29, 2024
spikespaz pushed a commit to spikespaz/dotwalk-rs that referenced this issue Aug 29, 2024
spikespaz pushed a commit to spikespaz/dotwalk-rs that referenced this issue Aug 29, 2024
Stabilize str::escape_* methods with new return types…

… that implement `Display` and `Iterator<Item=char>`, as proposed in FCP: rust-lang/rust#27791 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests