Improved docs for CStr, CString, OsStr, OsString #44855

Merged: 15 commits, Oct 13, 2017

Conversation

federicomenaquintero
Contributor

This expands the documentation for those structs and their corresponding traits, per #29354

We describe the representation of C strings, and the purpose of
OsString/OsStr.

Part of #29354
Explain the struct's reason for being, and its most common usage
patterns. Add a bunch of links.

Clarify the method docs a bit.

Part of #29354
@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@cuviper
Member

cuviper commented Sep 26, 2017

CStr and CString are not necessarily UTF-8 at all! If they were, then CStr::to_str() and CString::into_string() would be infallible conversions, with no need for a Result.
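
To illustrate the point, here is a minimal sketch of why `CString::into_string()` has to return a `Result`. The helper name `try_into_string` is hypothetical and not part of the PR; it only uses `CString::new`, `CString::into_string`, `IntoStringError::into_cstring`, and `CString::into_bytes` from the standard library.

```rust
use std::ffi::CString;

/// Hypothetical helper: builds a `CString` from raw bytes and then tries to
/// turn it into a Rust `String`. On failure, the original bytes are handed back.
///
/// Panics if `bytes` contains an interior nul byte (an assumption made to keep
/// the sketch short).
fn try_into_string(bytes: Vec<u8>) -> Result<String, Vec<u8>> {
    // `CString::new` only checks for nul bytes; it does not validate UTF-8.
    let c_string = CString::new(bytes).expect("no interior nul bytes");
    // UTF-8 validation happens here, which is why this step is fallible.
    c_string
        .into_string()
        .map_err(|e| e.into_cstring().into_bytes())
}
```

A `CString` holding the bytes `0xff 0xfe` is perfectly valid as a C string, yet the conversion to `String` fails because those bytes are not UTF-8.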

@federicomenaquintero
Contributor Author

Oops, long lines... will fix.

I'll also clarify that CStr/CString are bags of zero-terminated bytes, and UTF-8 only happens when making a string out of them.

@arielb1 arielb1 added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 26, 2017
-/// This type serves the primary purpose of being able to safely generate a
-/// C-compatible string from a Rust byte slice or vector. An instance of this
+/// This type serves the purpose of being able to safely generate a
+/// C-compatible UTF-8 string from a Rust byte slice or vector. An instance of this
Contributor

This isn't guaranteed by CStr.

@@ -8,7 +8,145 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

-//! Utilities related to FFI bindings.
+//! This module provides utilities to handle C-like strings. It is
Contributor

I think that this is a bit misleading because OsString isn't a C string on Windows.

Contributor

A better way to describe it might be "to handle data across non-Rust interfaces, like other programming languages and the underlying operating system"

//! borrowed slices of strings with the [`str`] primitive. Both are
//! always in UTF-8 encoding, and may contain nul bytes in the middle,
//! i.e. if you look at the bytes that make up the string, there may
//! be a `0` among them. Both `String` and `str` know their length;
Contributor

nit: the '0' here makes it look like you're referring to the digit zero, not a nul byte. Perhaps use '\0' instead?

Contributor

another nit: I'd word "know their length" as "store their length explicitly" because technically we "know" the length of a C-string, but it's not computed in O(1) time.
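
The distinction between storing the length and computing it can be sketched as follows. The function name `lengths` is hypothetical; the scan in the second line only imitates what a C-style `strlen` would do and is not how the standard library measures anything.

```rust
/// Hypothetical helper: returns the length stored in the `&str` and the
/// length that a nul-terminated scan over the same bytes would report.
fn lengths(s: &str) -> (usize, usize) {
    // O(1): a `&str` carries its length alongside the pointer.
    let stored = s.len();
    // O(n): walk the bytes until the first nul, as a C-style scan would.
    let scanned = s.bytes().position(|b| b == 0).unwrap_or(stored);
    (stored, scanned)
}
```

For a string with a nul byte in the middle, the two results differ, which is the truncation effect the docs describe.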

//!
//! C strings are different from Rust strings:
//!
//! * **Encodings** - C strings may have different encodings. If
Contributor

I think that "encoding" here is a bit inaccessible to people who are unfamiliar with how string encoding works. I'd say introduce it with "Rust strings are UTF-8, but C strings may use other encodings. If you're using a string from C, you may have to check its encoding explicitly, rather than just assuming that it's UTF-8 like you can in Rust."

//! you are bringing in strings from C APIs, you should check what
//! encoding you are getting. Rust strings are always UTF-8.
//!
//! * **Character width** - C strings may use "normal" or "wide"
Contributor

"Width" here may be what C uses, but it's again misleading because Unicode has its own specific definition of width. I'd say "size" instead. Instead of using "normal" and "wide," I'd just say directly that C uses two types, char (clarifying that this is different from Rust's type) and wchar_t, which are different sizes.

Contributor

You can also clarify that wchar_t is referred to by "wide character" but that this doesn't actually reflect the Unicode width, but the size of the character in bytes.

//! '[Unicode code point]'.
//!
//! * **Nul terminators and implicit string lengths** - Often, C
//! strings are nul-terminated, i.e. they have a `0` character at the
Contributor

Again, I'd use '\0' here instead of '0'.

//!
//! * **Nul terminators and implicit string lengths** - Often, C
//! strings are nul-terminated, i.e. they have a `0` character at the
//! end. The length of a string buffer is not known *a priori*;
Contributor

No need to use Latin; just say that it isn't stored, but has to be calculated. IMHO we should keep language simple if possible to be more accessible to non-native speakers.

//! `wcslen()` for `wchar_t`-based ones. Those functions return the
//! number of characters in the string excluding the nul terminator,
//! so the buffer length is really `len+1` characters. Rust strings
//! don't have a nul terminator, and they always know their length.
Contributor

I'd also note in here somewhere that Rust's way of doing it means that you can easily access a string's length, whereas there's an implicit cost to it in C. This also may carry over to CStr if its implementation changes.

//! so the buffer length is really `len+1` characters. Rust strings
//! don't have a nul terminator, and they always know their length.
//!
//! * **No nul characters in the middle of the string** - When C
Contributor

I'd word this as "Internal NULs" as a more succinct version

//! strings have a nul terminator character, this usually means that
//! they cannot have nul characters in the middle — a nul character
//! would essentially truncate the string. Rust strings *can* have
//! nul characters in the middle, since they don't use nul
Contributor

Rather than "don't use nul terminators," it's clearer to say "because NUL doesn't have to mark the end of the string in Rust"
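
A short sketch of what an interior nul means in practice when crossing from Rust to C. The helper `interior_nul_position` is hypothetical; it relies only on `CString::new` and `NulError::nul_position` from the standard library.

```rust
use std::ffi::CString;

/// Hypothetical helper: reports where the first interior nul byte sits,
/// or `None` if the string can be turned into a `CString` as-is.
fn interior_nul_position(s: &str) -> Option<usize> {
    match CString::new(s) {
        // No nul byte found; a terminator was appended for us.
        Ok(_) => None,
        // A nul byte inside the data would truncate the string on the C side.
        Err(e) => Some(e.nul_position()),
    }
}
```

The Rust `&str` itself is fine with a nul in the middle; only the conversion to a C string rejects it.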

//! # Representations of non-Rust strings
//!
//! [`CString`] and [`CStr`] are useful when you need to transfer
//! UTF-8 strings to and from C, respectively:
Contributor

I'd expand this to languages with a C ABI like Python, etc. People should know that a CStr might be necessary when interacting with other languages too.

//! UTF-8 strings to and from C, respectively:
//!
//! * **From Rust to C:** [`CString`] represents an owned, C-friendly
//! UTF-8 string: it is valid UTF-8, it is nul-terminated, and has no
Contributor

Not always valid UTF-8.

//!
//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
//! is what you would use to wrap a raw `*const u8` that you got from
//! a C function. A `CStr` is just guaranteed to be a nul-terminated
Contributor

"just" seems out of place here; I'd remove it.

//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
//! is what you would use to wrap a raw `*const u8` that you got from
//! a C function. A `CStr` is just guaranteed to be a nul-terminated
//! array of bytes; the UTF-8 validation step only happens when you
Contributor

"the UTF-8 validation step" is only just mentioned here so I'd just make a separate sentence describing how that works instead, along the lines of "once you have a CStr, you can convert it to a Rust str if it's valid UTF-8, or lossily convert it by adding replacement characters"
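
The two conversions mentioned in this comment could be shown side by side, roughly like this. The function name `describe` is hypothetical; `CStr::to_str` and `CStr::to_string_lossy` are the standard-library calls being illustrated.

```rust
use std::borrow::Cow;
use std::ffi::CStr;

/// Hypothetical helper: returns the strict conversion (if the bytes are
/// valid UTF-8) together with the lossy conversion (always available).
fn describe(c: &CStr) -> (Option<&str>, Cow<'_, str>) {
    // Strict: fails with a `Utf8Error` on invalid bytes; `.ok()` drops the error.
    let strict = c.to_str().ok();
    // Lossy: invalid sequences are replaced with U+FFFD.
    let lossy = c.to_string_lossy();
    (strict, lossy)
}
```

When the bytes are already valid UTF-8, `to_string_lossy` borrows instead of allocating, which is why it returns a `Cow`.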

//! request to convert it to a `&str`.
//!
//! [`OsString`] and [`OsStr`] are useful when you need to transfer
//! strings to and from operating system calls. If you need Rust
Contributor

clarfonthey commented Sep 27, 2017

A lot of programmers may not know what system calls are; I'd probably word this as "the operating system itself."

It may also make sense to include examples where this happens, like in opening files and running external commands.

Contributor

I feel like the "If you need Rust strings out of them [...]" section is kind of redundant and wordy. I'd probably just say that conversions between OsStr and str work very similarly to CStr and leave it at that.
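
As a sketch of how closely the `OsStr` conversions mirror the `CStr` ones: the helper `os_to_strings` below is hypothetical, and it uses only `OsStr::to_str` and `OsStr::to_string_lossy`. Note one difference from `CStr`: `OsStr::to_str` returns an `Option` rather than a `Result`.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: strict and lossy conversions from an OS string.
fn os_to_strings(os: &OsStr) -> (Option<&str>, String) {
    // Strict: `None` if the OS string is not valid Unicode.
    let strict = os.to_str();
    // Lossy: invalid parts are replaced with U+FFFD.
    let lossy = os.to_string_lossy().into_owned();
    (strict, lossy)
}
```

File names and command-line arguments are typical places where such values come from, for example via `Path::file_name` or `std::env::args_os`.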

@clarfonthey
Contributor

Great work! I've interacted a lot with CStr and OsStr so I added some comments on ways that I think the docs could be made clearer. Hopefully it's more helpful than overwhelming ><

@federicomenaquintero
Contributor Author

I've integrated the changes per your comments. How's it look now? :)

@clarfonthey
Contributor

Looks good to me! Again, great work! :)

@federicomenaquintero
Contributor Author

Thank you!

@shepmaster shepmaster added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 6, 2017
@shepmaster
Member

Poke @aturon — this is now ready for your masterful reviewing skills!

@carols10cents
Member

Actually, @aturon wasn't available last week and is on PTO this week, so let's try....

r? @steveklabnik

@rust-highfive rust-highfive assigned steveklabnik and unassigned aturon Oct 9, 2017
Member

steveklabnik left a comment

This is fantastic, thank you so much!

I have a few little formatting nits, but after that, let's get this merged!

@@ -149,8 +209,13 @@ pub struct CStr {
}

/// An error returned from [`CString::new`] to indicate that a nul byte was found
-/// in the vector provided.
+/// in the vector provided. While Rust strings may contain nul bytes in the middle,
+/// C strings can't, as that byte would effectively truncate the string.
Member

Could we change this up a bit? We try to have a summary sentence first, then the rest of it. This one has a long summary, and repeats itself since you added the information below. How about:

/// An error indicating that an interior nul byte was found.
///
/// While Rust strings may contain nul bytes in the middle, C strings can't, as that byte would effectively
/// truncate the string.
///
/// This `struct`....

with the correct wrapping, I just guessed here. What do you think?

/// that a nul byte was found too early in the slice provided, or one
/// wasn't found at all for the nul terminator. The slice used to
/// create a `CStr` must have one and only one nul byte at the end of
/// the slice.
Member

Same thing here; don't repeat where it came from, make sure to have a short summary, some space, and then a longer description.
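
For reference, the rule stated in the quoted doc text (exactly one nul byte, at the end of the slice) can be checked with a small sketch. The helper `is_valid_c_slice` is hypothetical; it wraps the standard `CStr::from_bytes_with_nul`.

```rust
use std::ffi::CStr;

/// Hypothetical helper: true if `bytes` has exactly one nul byte and it is
/// the last byte, which is what `CStr::from_bytes_with_nul` requires.
fn is_valid_c_slice(bytes: &[u8]) -> bool {
    // An `Err` here is a `FromBytesWithNulError`: either the nul came too
    // early or there was no nul terminator at all.
    CStr::from_bytes_with_nul(bytes).is_ok()
}
```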

/// UTF-8 error was encountered during the conversion. `CString` is
/// just a wrapper over a buffer of bytes with a nul terminator;
/// [`into_string`][`CString::into_string`] performs UTF-8 validation
/// and may return this error.
Member

and here

/// underlying bytes to construct a new string, ensuring that
/// there is a trailing 0 byte. This trailing 0 byte will be
/// appended by this method; the provided data should *not*
/// contain any 0 bytes in it.
Member

this isn't a method; could you say "function" instead?
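
The behavior described in the quoted doc text, where `CString::new` appends the trailing nul itself, might be demonstrated like this. The helper `with_terminator` is hypothetical; `CString::new` and `CString::as_bytes_with_nul` are standard-library calls.

```rust
use std::ffi::CString;

/// Hypothetical helper: returns the bytes a `CString` would hand to C,
/// including the terminator that `CString::new` appends.
///
/// Panics if `data` already contains a nul byte.
fn with_terminator(data: &[u8]) -> Vec<u8> {
    CString::new(data)
        .expect("input must not contain nul bytes")
        .as_bytes_with_nul()
        .to_vec()
}
```

The caller passes data without a terminator; passing `b"abc\0"` would be rejected because the nul counts as an interior nul byte.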

@@ -8,7 +8,156 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

-//! Utilities related to FFI bindings.
+//! This module provides utilities to handle data across non-Rust
Member

I'd keep this short summary, but follow it with a blank doc line so that it stands alone as the summary. That is:

//! Utilities related to FFI bindings.
//!
//! This module provides utilities....

//! C strings are different from Rust strings:
//!
//! * **Encodings** - Rust strings are UTF-8, but C strings may use
//! other encodings. If you are using a string from C, you should
Member

one space after a period, not two please!

//! characters; please **note** that C's `char` is different from Rust's.
//! The C standard leaves the actual sizes of those types open to
//! interpretation, but defines different APIs for strings made up of
//! each character type. Rust strings are always UTF-8, so different
Member

and here, and everywhere 😄

@steveklabnik
Member

Thanks! @bors: r+ rollup

@bors
Contributor

bors commented Oct 12, 2017

📌 Commit 5fb8e3d has been approved by steveklabnik

kennytm added a commit to kennytm/rust that referenced this pull request Oct 13, 2017
…eklabnik

Improved docs for CStr, CString, OsStr, OsString

This expands the documentation for those structs and their corresponding traits, per rust-lang#29354
bors added a commit that referenced this pull request Oct 13, 2017
Rollup of 14 pull requests

- Successful merges: #44855, #45110, #45122, #45133, #45173, #45178, #45189, #45203, #45209, #45221, #45236, #45240, #45245, #45253
- Failed merges:
@bors bors merged commit 5fb8e3d into rust-lang:master Oct 13, 2017
GuillaumeGomez added a commit to GuillaumeGomez/this-week-in-rust-docs that referenced this pull request Oct 15, 2017