Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: conventions for placement of unsafe APIs #240

Merged
merged 4 commits into from
Oct 7, 2014

Conversation

aturon
Copy link
Member

@aturon aturon commented Sep 15, 2014

This is a conventions RFC for settling the location of unsafe APIs relative
to the types they work with, as well as the use of raw submodules.

The brief summary is:

  • Unsafe APIs should be made into methods or static functions in the same cases
    that safe APIs would be.
  • raw submodules should be used only to provide APIs directly on low-level
    representations.

Rendered

@aturon
Copy link
Member Author

aturon commented Sep 15, 2014

string. This method makes it easy to work with the byte-based representation
of the string, but thereby also allows violation of the utf8 guarantee.

* A `raw` submodule with a number of free functions, like `from_parts`, that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are problems with using a raw module that are not addressed. There are modules are btree treemap, trie exposing more than one type and a raw module would fail to distinguish between these. A raw module also interacts very poorly with re-exports and adds verbosity to usage of the API.

The motivation for making unsafe code uglier in these cases is not laid out. The point of unsafe is to draw a boundary between safe and unsafe code without the need for conventions to distinguish between these. The rule about construction from raw pointer representation is very obscure, and doesn't change the fact that it makes the API less consistent and more painful to use without gaining anything in return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thestinger

There are problems with using a raw module that are not addressed. There are modules are btree treemap, trie exposing more than one type and a raw module would fail to distinguish between these. A raw module also interacts very poorly with re-exports and adds verbosity to usage of the API. The motivation for making unsafe code uglier in these cases is not laid out.

I'm a little confused by this comment. The RFC recommends making unsafe functions into methods or static functions in the same cases you'd do so for safe APIs. It cites as an explicit benefit that importing/using these APIs becomes easier by doing so:

The benefit to moving unsafe APIs into methods (resp. static functions) is the usual one: you can gain easy access to these APIs merely by having a value of the type (resp. importing the type).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You keep asserting that this makes the API harder to use, but that's extremely subjective. I've always found the raw modules in str/slice/string/vec make the API much easier to use, because any time I need to construct a value from raw pointers, I know precisely where to look.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aturon: Sorry, I misinterpreted the proposed guidelines. The reason I didn't find the details about how to handle those edge cases and a detailed rationale is because it's not what you're proposing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think much of the "know where to look" problem would be fixed by segmenting the things in the source, and also allowing rustdoc to divide functions/methods into user-defined subsections. This would mean it is clear both in the .rs file and in the docs.

@lilyball
Copy link
Contributor

This description of raw doesn't really match what we're doing today. The 4 modules std::slice::raw, std::str::raw, std::string::raw, and std::vec::raw all operate on raw pointers. More specifically, they construct the given type from raw pointers. The only exception are two functions in std::str::raw that derive a &str from another &str without doing utf-8 checks.

Arguably, "raw pointers" could be considered "low-level representation", but I'm not comfortable making that assumption, because only some of the functions actually treat the raw pointer as the low-level representation. Other functions, such as std::string::raw::from_buf_len(), copy data out of the raw pointer instead of using it directly.

There's also a couple of functions that don't even use raw pointers or low-level representations. I'm thinking here of std::str::raw::slice_bytes() and std::str::raw::slice_unchecked(). They both construct a &str based on another &str without checking if the resulting slice maintains the utf-8 invariant (the _unchecked variant also skips the bounds test).

Given these last two functions, I think the more general description of raw is functions that construct a value of the type without checking for one or more invariants. This covers those functions, it covers all the functions that construct values by copying data from raw pointers, and it covers constructing a value out of its private low-level representation. So I think we should document the rule based on invariants, and then explain that this generally includes constructing values from raw pointers and low-level representations.

* Use `raw` submodules to group together *all* manipulation of low-level
representations. No module in `std` currently does this; existing modules
provide some free functions in `raw`, and some unsafe methods, without a clear
driving principle. The ergonomics of moving *everything* into free functions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ergonomics of moving anything into free functions is poor. Downgrading more methods is worse, but doing it to a few is still more verbose and harder to navigate - especially due to re-exports. It also means introducing type prefixes in modules exposing more than one type.

@aturon
Copy link
Member Author

aturon commented Sep 15, 2014

@kballard Thanks for the clarifications. I will try to incorporate them into the RFC to more accurately convey the current state of affairs.

That said, I'm curious what you think about the proposed conventions?

One thing I'm not yet happy with is the recommendation for when raw submodules remains appropriate. There are certain cases, like sync::raw, where these submodules actually define some "lower-level" types, and the intent of the current RFC text is to capture such cases. But I don't think the guideline is terribly clear, yet. If you have any suggestions there, I'd love to hear them.

underlying representation of a data structure (which is otherwise private),
the API should use `raw` in its name. Specifically, `from_raw_parts` is the
typical name used for constructing a value from e.g. a pointer-based
representation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here. Vec::from_raw_parts seems to be covered under the convention for raw submodules, as it is constructing a Vec from the low-level representation (pointer, length, capacity). If it's put in the raw submodule then it shouldn't have raw in the name, because that's unnecessarily verbose (raw::from_raw_parts?),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kballard It seems that the RFC text is unclear. I meant for from_raw_parts to be a static function (according to the first bullet). The second bullet was just a clarification about the name.

I'm going to revise the RFC text and push; will ping when done.

@lilyball
Copy link
Contributor

@aturon

One thing I'm not yet happy with is the recommendation for when raw submodules remains appropriate.

Based on the raw modules I cited before, I'm currently leaning towards saying raw modules are for functions that construct a value in a manner that may break the invariants normally maintained by the type (as opposed to breaking invariants on parameters to the function, which is covered by the _unchecked suffix). The easiest way to do this is of course to construct a value directly out of its low-level representation. But another way to do this is e.g. to copy data out of a raw pointer without checking the format of the data, which is what the std::string::raw functions generally do.

With this rule, the usage because pretty simple. Do you want to construct a value and bypass invariant checks? Look in the raw module. Otherwise, if you're maintaining invariants, it's a static function on the type, and if you're manipulating an existing value, it's methods on the type.


# Unresolved questions

The `core::raw` module provides structs with public representations equivalent

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Box, Closure and Procedure will end up being removed since they're becoming obsolete. That will only leave behind Slice (could just be in the slice module) and TraitObject so it probably does need to be replaced.

@lilyball
Copy link
Contributor

@aturon I'd also like to see this RFC recommend that unsafe methods should all be placed into the same impl block (or blocks, if some methods require different bounds), and that these impl blocks should be placed at the end of the file. This will organize them better in the rustdoc output without requiring any rustdoc changes at all, as rustdoc maintains the impl grouping and order. We can still extend rustdoc later to have a visibility toggle for unsafe methods, but I think it still makes sense to put the unsafe methods at the end of the documentation.

@aturon
Copy link
Member Author

aturon commented Sep 15, 2014

@kballard

I'd also like to see this RFC recommend that unsafe methods should all be placed into the same impl block (or blocks, if some methods require different bounds), and that these impl blocks should be placed at the end of the file. This will organize them better in the rustdoc output without requiring any rustdoc changes at all, as rustdoc maintains the impl grouping and order. We can still extend rustdoc later to have a visibility toggle for unsafe methods, but I think it still makes sense to put the unsafe methods at the end of the documentation.

That's a great idea!

@Gankra
Copy link
Contributor

Gankra commented Sep 15, 2014

@kballard I like the suggestion of making unsafe stuff internally separate and at the end, as this matches the current convention of seperating private/public methods into seperate impls as well. I'd rather just see an additional subheading or maybe a red coloured "unsafe operations" box, than any kind of hide/show unsafe functionality though.

@thestinger
Copy link

@kballard: Your definition of raw modules means that functions like set_len and as_ptr would need to become methods. There are other ways to break the invariants of the type with unsafe code anyway. The methods are there for convenience, and making them more verbose / inconvenient to use just means more manual usage of functions like transmute_copy.

@lilyball
Copy link
Contributor

@thestinger I assume you mean methods would need to become functions. And no they wouldn't. Neither of those methods are value constructors. I even explicitly stated that methods that modify existing values (such as set_len) would stay as methods.

@thestinger
Copy link

@kballard: There's no functional difference between the from_raw_parts convenience method and assigning the pointer / length / capacity to a vector constructed with new. Both are valid ways of constructing a vector from raw parts, and I don't see a distinction between them that merits making one way more verbose and inconsistent with the rest of the API. That's why what you described also applies to methods like set_len - those methods expose a superset of the functionality, with the same internal details leaked and the same memory safety issues.

@aturon
Copy link
Member Author

aturon commented Sep 15, 2014

@kballard @thestinger I've pushed a significant revision, which I hope clarifies what's being proposed here.

@kballard Specifically regarding naming conventions, I don't think that requiring a raw prefix or suffix everywhere we currently use a raw submodule makes sense; I'd prefer to reserve "raw" for a "lower-level" representation. I tried to give some general naming guidelines, nevertheless, but I think we'll also just have to use best case-by-case judgment during stabilization to finalize these names.

@jfager
Copy link

jfager commented Sep 15, 2014

Just spitballing: if the convention is going to be that unsafe methods should be in their own impl block, would there be value in creating a new unsafe impl that only allows unsafe methods and forbidding unsafe methods in a regular impl? That is, have the language actually enforce the convention?

@lilyball
Copy link
Contributor

@jfager What benefit would that provide? Language rules != conventions.


# Alternatives

There are a few alternatives:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One unlisted alternative is the design I gave in another issue (I forget where) that uses associated items (RFC PR #195) to provide e.g. String::raw::from_parts(). This solves the ergonomics of needing to import something to get access to the values. This would look like

impl String {
    /// Raw operations
    static raw: StringRaw = StringRaw;
}

struct StringRaw;

impl StringRaw {
    unsafe fn from_parts(buf: *mut u8, length: uint, capacity: uint) -> String;
    // ...
}

Personally, I think the ergonomics of this are nice. It's basically providing a tiny module structure for the static methods of a type, similar to how we use a module structure in general for types/functions. The biggest downside from a usability perspective is that it's not immediately obvious in rustdoc how this is supposed to be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kballard I'm sorry I left this out -- I'll add a reference to it from the RFC text.

@jfager
Copy link

jfager commented Sep 16, 2014

@kballard Of course convention != language rule; you weren't able to infer that the question was whether the convention should be promoted to a rule? If not a full-blown rule, then perhaps a lint detecting safe and unsafe methods in the same impl block.

@lilyball
Copy link
Contributor

@jfager Of course I was, you don't need to be condescending. I'm telling you that there's no need to put hard rules into the language in order to enforce a convention. I don't believe this even qualifies for a stylistic lint. This is a recommendation that improves documentation, not a universal style. More importantly, while our standard libraries are going under heavy API review, most third-party code won't have the same level of API review and won't benefit from having compiler-enforced rules around the structure of their impl blocks.

@jfager
Copy link

jfager commented Sep 16, 2014

@kballard Hey, you're the one who felt the need to inform me that a language rule and a convention aren't the same thing. Anyways, that's a much better answer to the original question, thank you. It wasn't clear that this was intended primarily for the standard lib and not as a declaration of Idiomatic Rust Code that people should feel bad for ignoring in their own code.

@huonw
Copy link
Member

huonw commented Sep 16, 2014

Can we stop focusing on documentation? We can equally improve rustdoc to improve how the documentation of methods are laid out to get those benefits without raw modules. :)

@brson
Copy link
Contributor

brson commented Sep 16, 2014

@aturon I agree with these conventions in principle, though I am very curious about how this specifically affects the various raw modules currently in existence, e.g. I would expect this to make them less prevalent.

@aturon
Copy link
Member Author

aturon commented Sep 16, 2014

@brson

I agree with these conventions in principle, though I am very curious about how this specifically affects the various raw modules currently in existence, e.g. I would expect this to make them less prevalent.

That's right. Concretely, I think we'd be left with only sync::raw and core::raw. The key justification for those modules is that they define raw representation types.

@Ericson2314
Copy link
Contributor

I was thinking that a unsafe impl could just be shorthand for making everything within unsafe, and a normal impl would work the same as today. Not sure if that adds enough convenience to be worth it, but just thought I'd throw it out there.

@Kimundi
Copy link
Member

Kimundi commented Sep 20, 2014

+1 For this proposal, unsafe methods/functions can already be grouped together by being unsafe and naming conventions, no point in making them more annoying to use with free functions in a raw module.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Sep 22, 2014
# Rationale

When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name.  There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).

The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:

> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.

# Method stabilization

The following methods have been marked #[stable]

* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice

The following methods have been marked #[unstable]

* String::from_utf8 - The error type in the returned `Result` may change to
                      provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
                            stabilization
* String::from_utf16 - The return type may change to become a `Result` which
                       includes more contextual information like where the error
                       occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
                       as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
                      been marked #[unstable] becuase it can be seen as a
                      duplicate of iterator-based functionality as well as
                      possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
                     less efficient because of decoding/encoding. Due to the
                     desire to minimize API surface this may be able to be
                     removed in the future for something possibly generic with
                     no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
                 become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
                     indices isn't totally clear at this time, so the failure
                     semantics and return value of this method are subject to
                     change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]

[2]: rust-lang/rfcs#240

The following method have been marked #[experimental]

* String::from_str - This function only exists as it's more efficient than
                     to_string(), but having a less ergonomic function for
                     performance reasons isn't the greatest reason to keep it
                     around. Like Vec::push_all, this has been marked
                     experimental for now.

The following methods have been #[deprecated]

* String::append - This method has been deprecated to remain consistent with the
                   deprecation of Vec::append. While convenient, it is one of
                   the only functional-style apis on String, and requires more
                   though as to whether it belongs as a first-class method or
                   now (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
                      str::from_utf8 plus an assert plus a call to to_string().
                      Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
                          above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
                       type which allow bypassing utf-8 checks. These have all
                       been deprecated in favor of calling `.as_mut_vec()` and
                       then operating directly on the vector returned. These
                       methods were deprecated because naming them with relation
                       to other methods was difficult to rationalize and it's
                       arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes

# Reservation methods

This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]

[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
bors added a commit to rust-lang/rust that referenced this pull request Sep 24, 2014
# Rationale

When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name.  There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).

The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:

> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.

# Method stabilization

The following methods have been marked #[stable]

* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice

The following methods have been marked #[unstable]

* String::from_utf8 - The error type in the returned `Result` may change to
                      provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
                            stabilization
* String::from_utf16 - The return type may change to become a `Result` which
                       includes more contextual information like where the error
                       occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
                       as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
                      been marked #[unstable] becuase it can be seen as a
                      duplicate of iterator-based functionality as well as
                      possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
                     less efficient because of decoding/encoding. Due to the
                     desire to minimize API surface this may be able to be
                     removed in the future for something possibly generic with
                     no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
                 become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
                     indices isn't totally clear at this time, so the failure
                     semantics and return value of this method are subject to
                     change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]

[2]: rust-lang/rfcs#240

The following method have been marked #[experimental]

* String::from_str - This function only exists as it's more efficient than
                     to_string(), but having a less ergonomic function for
                     performance reasons isn't the greatest reason to keep it
                     around. Like Vec::push_all, this has been marked
                     experimental for now.

The following methods have been #[deprecated]

* String::append - This method has been deprecated to remain consistent with the
                   deprecation of Vec::append. While convenient, it is one of
                   the only functional-style apis on String, and requires more
                   though as to whether it belongs as a first-class method or
                   now (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
                      str::from_utf8 plus an assert plus a call to to_string().
                      Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
                          above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
                       type which allow bypassing utf-8 checks. These have all
                       been deprecated in favor of calling `.as_mut_vec()` and
                       then operating directly on the vector returned. These
                       methods were deprecated because naming them with relation
                       to other methods was difficult to rationalize and it's
                       arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes

# Reservation methods

This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]

[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
@alexcrichton alexcrichton merged commit 2e00888 into rust-lang:master Oct 7, 2014
@alexcrichton
Copy link
Member

@aturon aturon mentioned this pull request Oct 9, 2014
@Centril Centril added A-unsafe Unsafe related proposals & ideas A-convention Proposals relating to documentation conventions. labels Nov 23, 2018
wycats pushed a commit to wycats/rust-rfcs that referenced this pull request Mar 5, 2019
spikespaz pushed a commit to spikespaz/dotwalk-rs that referenced this pull request Aug 29, 2024
# Rationale

When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name.  There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).

The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:

> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.

# Method stabilization

The following methods have been marked #[stable]

* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice

The following methods have been marked #[unstable]

* String::from_utf8 - The error type in the returned `Result` may change to
                      provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
                            stabilization
* String::from_utf16 - The return type may change to become a `Result` which
                       includes more contextual information like where the error
                       occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
                       as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
                      been marked #[unstable] becuase it can be seen as a
                      duplicate of iterator-based functionality as well as
                      possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
                     less efficient because of decoding/encoding. Due to the
                     desire to minimize API surface this may be able to be
                     removed in the future for something possibly generic with
                     no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
                 become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
                     indices isn't totally clear at this time, so the failure
                     semantics and return value of this method are subject to
                     change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]

[2]: rust-lang/rfcs#240

The following method have been marked #[experimental]

* String::from_str - This function only exists as it's more efficient than
                     to_string(), but having a less ergonomic function for
                     performance reasons isn't the greatest reason to keep it
                     around. Like Vec::push_all, this has been marked
                     experimental for now.

The following methods have been #[deprecated]

* String::append - This method has been deprecated to remain consistent with the
                   deprecation of Vec::append. While convenient, it is one of
                   the only functional-style apis on String, and requires more
                   though as to whether it belongs as a first-class method or
                   now (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
                      str::from_utf8 plus an assert plus a call to to_string().
                      Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
                          above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
                       type which allow bypassing utf-8 checks. These have all
                       been deprecated in favor of calling `.as_mut_vec()` and
                       then operating directly on the vector returned. These
                       methods were deprecated because naming them with relation
                       to other methods was difficult to rationalize and it's
                       arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes

# Reservation methods

This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]

[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-convention Proposals relating to documentation conventions. A-unsafe Unsafe related proposals & ideas
Projects
None yet
Development

Successfully merging this pull request may close these issues.