Improved docs for CStr, CString, OsStr, OsString #44855

federicomenaquintero · 2017-09-26T02:20:04Z

This expands the documentation for those structs and their corresponding traits, per #29354

We describe the representation of C strings, and the purpose of OsString/OsStr. Part of #29354

Explain the struct's reason for being, and its most common usage patterns. Add a bunch of links. Clarify the method docs a bit. Part of #29354

rust-highfive · 2017-09-26T02:20:11Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

cuviper · 2017-09-26T04:19:00Z

CStr and CString are not necessarily UTF-8 at all! If they were, then CStr::to_str() and CString::into_string() would be infallible conversions that don't need a Result.
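To make the point concrete, here is a minimal sketch of why those conversions return a `Result`. The helper name `is_valid_utf8` is made up for this example; the `CString` and `CStr` calls are the standard ones.

```rust
use std::ffi::CString;

/// Hypothetical helper: build a `CString` from raw bytes and check whether it is UTF-8.
fn is_valid_utf8(bytes: Vec<u8>) -> bool {
    // `CString::new` only rejects nul bytes; it does not look at the encoding.
    let c = CString::new(bytes).expect("no nul bytes in the input");
    // `to_str` is where UTF-8 validation happens, hence the `Result`.
    c.to_str().is_ok()
}

fn main() {
    assert!(is_valid_utf8(b"hello".to_vec()));
    // 0xFF never appears in valid UTF-8, yet it is a perfectly fine C string byte.
    assert!(!is_valid_utf8(vec![b'h', b'i', 0xFF]));
    // The owning conversion is fallible for the same reason.
    assert!(CString::new(vec![0xFF]).unwrap().into_string().is_err());
}
```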

federicomenaquintero · 2017-09-26T13:59:14Z

Oops, long lines... will fix.

I'll also clarify that CStr/CString are bags of zero-terminated bytes, and UTF-8 only happens when making a string out of them.

clarfonthey · 2017-09-27T05:05:34Z

src/libstd/ffi/c_str.rs

-/// This type serves the primary purpose of being able to safely generate a
-/// C-compatible string from a Rust byte slice or vector. An instance of this
+/// This type serves the purpose of being able to safely generate a
+/// C-compatible UTF-8 string from a Rust byte slice or vector. An instance of this


This isn't guaranteed by CStr.

clarfonthey · 2017-09-27T05:08:02Z

src/libstd/ffi/mod.rs

@@ -8,7 +8,145 @@
 // option. This file may not be copied, modified, or distributed
 // except according to those terms.

-//! Utilities related to FFI bindings.
+//! This module provides utilities to handle C-like strings.  It is


I think that this is a bit misleading because OsString isn't a C string on Windows.

A better way to describe it might be "to handle data across non-Rust interfaces, like other programming languages and the underlying operating system"

clarfonthey · 2017-09-27T05:08:43Z

src/libstd/ffi/mod.rs

+//! borrowed slices of strings with the [`str`] primitive.  Both are
+//! always in UTF-8 encoding, and may contain nul bytes in the middle,
+//! i.e. if you look at the bytes that make up the string, there may
+//! be a `0` among them.  Both `String` and `str` know their length;


nit: the '0' here makes it look like you're referring to the digit zero, not a zero byte. Perhaps use '\0' instead?

another nit: I'd word "know their length" as "store their length explicitly" because technically we "know" the length of a C-string, but it's not computed in O(1) time.
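A small sketch of both nits, assuming nothing beyond `str` itself (the helper `nul_positions` is invented for illustration): the nul byte `'\0'` is not the digit `'0'`, and a Rust string stores its length rather than computing it.

```rust
/// Hypothetical helper: byte offsets of nul bytes inside a Rust string.
fn nul_positions(s: &str) -> Vec<usize> {
    s.bytes()
        .enumerate()
        .filter(|&(_, b)| b == 0)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // A Rust string with a nul byte in the middle.
    let s = "ab\0cd";
    assert_eq!(nul_positions(s), vec![2]);
    // The length is stored alongside the data, so this does not scan the string.
    assert_eq!(s.len(), 5);
    // The ASCII digit zero is a different byte (48), not a nul.
    assert_eq!(b'0', 48);
    assert!(nul_positions("a0b").is_empty());
}
```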

clarfonthey · 2017-09-27T05:10:34Z

src/libstd/ffi/mod.rs

+//!
+//! C strings are different from Rust strings:
+//!
+//! * **Encodings** - C strings may have different encodings.  If


I think that "encoding" here is a bit inaccessible to people who are unfamiliar with how string encoding works. I'd say introduce it with "Rust strings are UTF-8, but C strings may use other encodings. If you're using a string from C, you may have to check its encoding explicitly, rather than just assuming that it's UTF-8 like you can in Rust."
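As a hedged illustration of "check its encoding explicitly": suppose a C API is known to hand back ISO-8859-1 (Latin-1) text. That encoding and the helper name `latin1_to_string` are assumptions for this sketch, not something the docs under review prescribe.

```rust
use std::ffi::CStr;

/// Hypothetical helper: decode a C string that is known to be Latin-1.
fn latin1_to_string(c: &CStr) -> String {
    // Each Latin-1 byte maps to the Unicode code point with the same value.
    c.to_bytes().iter().map(|&b| b as char).collect()
}

fn main() {
    // "café" in Latin-1: the é is the single byte 0xE9.
    let c = CStr::from_bytes_with_nul(b"caf\xE9\0").unwrap();
    // Assuming UTF-8 would fail here...
    assert!(c.to_str().is_err());
    // ...while decoding with the right encoding recovers the text.
    assert_eq!(latin1_to_string(c), "café");
}
```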

clarfonthey · 2017-09-27T05:11:54Z

src/libstd/ffi/mod.rs

+//! you are bringing in strings from C APIs, you should check what
+//! encoding you are getting.  Rust strings are always UTF-8.
+//!
+//! * **Character width** - C strings may use "normal" or "wide"


"Width" here may be what C uses, but it's again misleading because Unicode has its own specific definition of width. I'd say "size" instead. Instead of using "normal" and "wide," I'd just say directly that C uses two types, char (clarifying that this is different from Rust's type) and wchar_t, which are different sizes.

You can also clarify that wchar_t is referred to by "wide character" but that this doesn't actually reflect the Unicode width, but the size of the character in bytes.
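The size difference between C's `char` and Rust's `char` can be checked directly. This sketch sticks to `std`, so `wchar_t` is not shown; its size depends on the platform's C library.

```rust
use std::mem::size_of;
use std::os::raw::c_char;

fn main() {
    // C's `char` is one byte (whether it is signed varies by platform).
    assert_eq!(size_of::<c_char>(), 1);
    // Rust's `char` is a Unicode scalar value and always takes four bytes.
    assert_eq!(size_of::<char>(), 4);
    // Inside a `str`, the same character may take fewer bytes once encoded as UTF-8.
    assert_eq!('é'.len_utf8(), 2);
}
```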

clarfonthey · 2017-09-27T05:13:00Z

src/libstd/ffi/mod.rs

+//! '[Unicode code point]'.
+//!
+//! * **Nul terminators and implicit string lengths** - Often, C
+//! strings are nul-terminated, i.e. they have a `0` character at the


Again, I'd use '\0' here instead of '0'.

clarfonthey · 2017-09-27T05:13:45Z

src/libstd/ffi/mod.rs

+//!
+//! * **Nul terminators and implicit string lengths** - Often, C
+//! strings are nul-terminated, i.e. they have a `0` character at the
+//! end.  The length of a string buffer is not known *a priori*;


No need to use Latin; just say that it isn't stored, but has to be calculated. IMHO we should keep language simple if possible to be more accessible to non-native speakers.

clarfonthey · 2017-09-27T05:14:35Z

src/libstd/ffi/mod.rs

+//! `wcslen()` for `wchar_t`-based ones.  Those functions return the
+//! number of characters in the string excluding the nul terminator,
+//! so the buffer length is really `len+1` characters.  Rust strings
+//! don't have a nul terminator, and they always know their length.


I'd also note in here somewhere that Rust's way of doing it means that you can easily access a string's length, whereas there's an implicit cost to it in C. This also may carry over to CStr if its implementation changes.
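A rough sketch of that implicit cost, using a hand-written stand-in for `strlen()` (the function `c_style_len` is hypothetical and works on a byte slice rather than a raw pointer):

```rust
/// Hypothetical stand-in for C's `strlen`: scan until the first nul byte.
fn c_style_len(buf: &[u8]) -> Option<usize> {
    // O(n): every byte before the terminator has to be visited.
    buf.iter().position(|&b| b == 0)
}

fn main() {
    let buf = b"hello\0";
    // The terminator is excluded, so the buffer holds `len + 1` bytes.
    assert_eq!(c_style_len(buf), Some(5));
    assert_eq!(buf.len(), 6);
    // O(1): the length is part of the `&str` itself.
    assert_eq!("hello".len(), 5);
    // Without a terminator the scan finds nothing; real C code would read past the end.
    assert_eq!(c_style_len(b"hello"), None);
}
```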

clarfonthey · 2017-09-27T05:15:08Z

src/libstd/ffi/mod.rs

+//! so the buffer length is really `len+1` characters.  Rust strings
+//! don't have a nul terminator, and they always know their length.
+//!
+//! * **No nul characters in the middle of the string** - When C


I'd word this as "Internal NULs" as a more succinct version

clarfonthey · 2017-09-27T05:16:16Z

src/libstd/ffi/mod.rs

+//! strings have a nul terminator character, this usually means that
+//! they cannot have nul characters in the middle — a nul character
+//! would essentially truncate the string.  Rust strings *can* have
+//! nul characters in the middle, since they don't use nul


Rather than "don't use nul terminators," it's clearer to say "because NUL doesn't have to mark the end of the string in Rust"
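For readers of the thread, a minimal sketch of what an internal nul means in practice. The helper `first_interior_nul` is invented here; `CString::new` and `NulError::nul_position` are the real APIs.

```rust
use std::ffi::CString;

/// Hypothetical helper: index of the nul byte that stops this string from becoming a `CString`.
fn first_interior_nul(s: &str) -> Option<usize> {
    match CString::new(s) {
        Ok(_) => None,
        // `nul_position` reports where C would consider the string to end.
        Err(e) => Some(e.nul_position()),
    }
}

fn main() {
    // Fine as a Rust string, but C would only see "ab".
    assert_eq!(first_interior_nul("ab\0cd"), Some(2));
    assert_eq!(first_interior_nul("abcd"), None);
}
```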

clarfonthey · 2017-09-27T05:17:14Z

src/libstd/ffi/mod.rs

+//! # Representations of non-Rust strings
+//!
+//! [`CString`] and [`CStr`] are useful when you need to transfer
+//! UTF-8 strings to and from C, respectively:


I'd expand this to languages with a C ABI like Python, etc. People should know that a CStr might be necessary when interacting with other languages too.

clarfonthey · 2017-09-27T05:17:33Z

src/libstd/ffi/mod.rs

+//! UTF-8 strings to and from C, respectively:
+//!
+//! * **From Rust to C:** [`CString`] represents an owned, C-friendly
+//! UTF-8 string:  it is valid UTF-8, it is nul-terminated, and has no


Not always valid UTF-8.
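Since the bytes need not be UTF-8, getting text back out takes either a checked or a lossy conversion. A small sketch, with `lossy_text` as a made-up helper name:

```rust
use std::ffi::CString;

/// Hypothetical helper: text for display, with invalid sequences replaced by U+FFFD.
fn lossy_text(c: &CString) -> String {
    c.to_string_lossy().into_owned()
}

fn main() {
    // A valid `CString` whose contents are not valid UTF-8.
    let c = CString::new(vec![b'c', b'a', b'f', 0xFF]).unwrap();
    // The checked conversion refuses...
    assert!(c.to_str().is_err());
    // ...while the lossy one substitutes the replacement character.
    assert_eq!(lossy_text(&c), "caf\u{FFFD}");
}
```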

clarfonthey · 2017-09-27T05:18:03Z

src/libstd/ffi/mod.rs

+//!
+//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
+//! is what you would use to wrap a raw `*const u8` that you got from
+//! a C function.  A `CStr` is just guaranteed to be a nul-terminated


"just" seems out of place here; I'd remove it.

clarfonthey · 2017-09-27T05:19:32Z

src/libstd/ffi/mod.rs

+//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
+//! is what you would use to wrap a raw `*const u8` that you got from
+//! a C function.  A `CStr` is just guaranteed to be a nul-terminated
+//! array of bytes; the UTF-8 validation step only happens when you


"the UTF-8 validation step" is only just mentioned here so I'd just make a separate sentence describing how that works instead, along the lines of "once you have a CStr, you can convert it to a Rust str if it's valid UTF-8, or lossily convert it by adding replacement characters"

clarfonthey · 2017-09-27T05:21:39Z

src/libstd/ffi/mod.rs

+//! request to convert it to a `&str`.
+//!
+//! [`OsString`] and [`OsStr`] are useful when you need to transfer
+//! strings to and from operating system calls.  If you need Rust


A lot of programmers may not know what system calls are; I'd probably word this as "the operating system itself."

It may also make sense to include examples where this happens, like in opening files and running external commands.

I feel like the "If you need Rust strings out of them [...]" section is kind of redundant and wordy. I'd probably just say that conversions between OsStr and str work very similarly to CStr and leave it at that.
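One place where `OsStr` shows up is file paths. The sketch below is only an illustration: the path is invented and `file_name_utf8` is a hypothetical helper. Like `CStr`, the conversion to `&str` is a checked one.

```rust
use std::ffi::{OsStr, OsString};
use std::path::Path;

/// Hypothetical helper: the file name of a path as Rust text, if it is valid Unicode.
fn file_name_utf8(path: &Path) -> Option<&str> {
    // `file_name` yields an `&OsStr`; `to_str` returns `None` if it is not valid Unicode.
    path.file_name().and_then(OsStr::to_str)
}

fn main() {
    let path = Path::new("logs/output.txt");
    assert_eq!(file_name_utf8(path), Some("output.txt"));

    // Going the other way never fails: any `String` is a valid `OsString`.
    let os = OsString::from(String::from("output.txt"));
    assert_eq!(os.to_string_lossy(), "output.txt");
}
```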

clarfonthey · 2017-09-27T05:27:04Z

Great work! I've interacted a lot with CStr and OsStr so I added some comments on ways that I think the docs could be made clearer. Hopefully it's more helpful than overwhelming ><

federicomenaquintero · 2017-10-02T19:18:32Z

I've integrated the changes per your comments. How's it look now? :)

clarfonthey · 2017-10-02T20:47:26Z

Looks good to me! Again, great work! :)

federicomenaquintero · 2017-10-02T20:57:21Z

Thank you!

shepmaster · 2017-10-06T18:28:21Z

Poke @aturon — this is now ready for your masterful reviewing skills!

carols10cents · 2017-10-09T14:45:11Z

Actually, @aturon wasn't available last week and is on PTO this week, so let's try....

r? @steveklabnik

steveklabnik

This is fantastic, thank you so much!

I have a few little formatting nits, but after that, let's get this merged!

steveklabnik · 2017-10-11T15:29:44Z

src/libstd/ffi/c_str.rs

@@ -149,8 +209,13 @@ pub struct CStr {
 }

 /// An error returned from [`CString::new`] to indicate that a nul byte was found
-/// in the vector provided.
+/// in the vector provided.  While Rust strings may contain nul bytes in the middle,
+/// C strings can't, as that byte would effectively truncate the string.


Could we change this up a bit? We try to have a summary sentence first, then the rest of it. This one has a long summary, and repeats itself since you added the information below. How about:

/// An error indicating that an interior nul byte was found.
///
/// While Rust strings may contain nul bytes in the middle, C strings can't, as that byte would effectively
/// truncate the string.
///
/// This `struct`....

with the correct wrapping, I just guessed here. What do you think?

steveklabnik · 2017-10-11T15:30:13Z

src/libstd/ffi/c_str.rs

+/// that a nul byte was found too early in the slice provided, or one
+/// wasn't found at all for the nul terminator.  The slice used to
+/// create a `CStr` must have one and only one nul byte at the end of
+/// the slice.


Same thing here; don't repeat where it came from, make sure to have a short summary, some space, and then a longer description.
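For context, the rule this error reports can be sketched as follows (the helper `is_c_string` is hypothetical; `CStr::from_bytes_with_nul` is the function that returns the error):

```rust
use std::ffi::CStr;

/// Hypothetical helper: does this slice have exactly one nul byte, at the very end?
fn is_c_string(bytes: &[u8]) -> bool {
    CStr::from_bytes_with_nul(bytes).is_ok()
}

fn main() {
    assert!(is_c_string(b"abc\0"));
    // A nul byte found too early.
    assert!(!is_c_string(b"ab\0c\0"));
    // No nul terminator at all.
    assert!(!is_c_string(b"abc"));
}
```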

steveklabnik · 2017-10-11T15:30:21Z

src/libstd/ffi/c_str.rs

+/// UTF-8 error was encountered during the conversion.  `CString` is
+/// just a wrapper over a buffer of bytes with a nul terminator;
+/// [`into_string`][`CString::into_string`] performs UTF-8 validation
+/// and may return this error.
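A brief sketch of how that error can be used, assuming the caller wants either the `String` or the number of leading bytes that were valid UTF-8 (the helper name is invented for this example):

```rust
use std::ffi::CString;

/// Hypothetical helper: convert to `String`, or report how many leading bytes were valid UTF-8.
fn into_string_or_valid_prefix(c: CString) -> Result<String, usize> {
    c.into_string().map_err(|e| e.utf8_error().valid_up_to())
}

fn main() {
    let ok = CString::new("hi").unwrap();
    assert_eq!(into_string_or_valid_prefix(ok), Ok(String::from("hi")));

    let bad = CString::new(vec![b'a', 0xFF]).unwrap();
    assert_eq!(into_string_or_valid_prefix(bad), Err(1));

    // The error also hands the original `CString` back, so no bytes are lost.
    let err = CString::new(vec![b'a', 0xFF]).unwrap().into_string().unwrap_err();
    assert_eq!(err.into_cstring().into_bytes(), vec![b'a', 0xFF]);
}
```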


steveklabnik · 2017-10-11T15:30:47Z

src/libstd/ffi/c_str.rs

+    /// underlying bytes to construct a new string, ensuring that
+    /// there is a trailing 0 byte.  This trailing 0 byte will be
+    /// appended by this method; the provided data should *not*
+    /// contain any 0 bytes in it.


this isn't a method; could you say "function" instead?
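The behaviour the doc comment describes can be sketched like this (`bytes_for_c` is a hypothetical helper, not part of `std`):

```rust
use std::ffi::CString;

/// Hypothetical helper: the bytes C would see, terminator included.
fn bytes_for_c(s: &str) -> Vec<u8> {
    CString::new(s)
        .expect("no nul bytes in the input")
        .as_bytes_with_nul()
        .to_vec()
}

fn main() {
    // `CString::new` appends the trailing nul byte itself.
    assert_eq!(bytes_for_c("hello"), b"hello\0".to_vec());
    // Supplying your own terminator counts as a nul byte in the data and is rejected.
    assert!(CString::new("hello\0").is_err());
}
```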

steveklabnik · 2017-10-11T15:32:06Z

src/libstd/ffi/mod.rs

@@ -8,7 +8,156 @@
 // option. This file may not be copied, modified, or distributed
 // except according to those terms.

-//! Utilities related to FFI bindings.
+//! This module provides utilities to handle data across non-Rust


I'd keep this short summary, but with a newline between it, so you get the summary. That is:

//! Utilities related to FFI bindings.
//!
//! This module provides utilities....

steveklabnik · 2017-10-11T15:32:48Z

src/libstd/ffi/mod.rs

+//! C strings are different from Rust strings:
+//!
+//! * **Encodings** - Rust strings are UTF-8, but C strings may use
+//! other encodings.  If you are using a string from C, you should


one space after a period, not two please!

steveklabnik · 2017-10-11T15:33:04Z

src/libstd/ffi/mod.rs

+//! characters; please **note** that C's `char` is different from Rust's.
+//! The C standard leaves the actual sizes of those types open to
+//! interpretation, but defines different APIs for strings made up of
+//! each character type.  Rust strings are always UTF-8, so different


and here, and everywhere 😄

steveklabnik · 2017-10-12T12:47:03Z

Thanks! @bors: r+ rollup

bors · 2017-10-12T12:47:03Z

📌 Commit 5fb8e3d has been approved by steveklabnik

Rollup of 14 pull requests - Successful merges: #44855, #45110, #45122, #45133, #45173, #45178, #45189, #45203, #45209, #45221, #45236, #45240, #45245, #45253 - Failed merges:

federicomenaquintero added 6 commits September 25, 2017 13:51

Expand the introduction to the ffi module.

5451b72

We describe the representation of C strings, and the purpose of OsString/OsStr. Part of #29354

Overhaul the ffi::CString docs

8da694a

Explain the struct's reason for being, and its most common usage patterns. Add a bunch of links. Clarify the method docs a bit. Part of #29354

Overhaul the ffi::CStr documentation.

2cb2a06

Point from the error structs back to the method that created them, li…

3c5e18f

…ke in iterators

Module overview for std::os::windows:ffi

155b4b1

Overhaul the documentation for OsString / OsStr

91f6445

rust-highfive assigned aturon Sep 26, 2017

os_str: Fix too-long lines

4143422

arielb1 added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 26, 2017

clarfonthey reviewed Sep 27, 2017

federicomenaquintero added 2 commits October 2, 2017 13:53

Remove the implication that CString contains UTF-8 data.

9854e83

Clarify the ffi module's toplevel docs, per @clarcharr's comments

50505aa

Fix broken links in documentation

d989cd0

shepmaster added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 6, 2017

rust-highfive assigned steveklabnik and unassigned aturon Oct 9, 2017

steveklabnik suggested changes Oct 11, 2017

federicomenaquintero added 5 commits October 11, 2017 17:51

ffi/c_str.rs: Make all descriptions have a single-sentence summary at…

d5bdfbc

… the beginning Per #44855 (comment) and subsequent ones.

ffi/c_str.rs: Fix method/function confusion

a9a4ce6

Per #44855 (comment)

ffi/c_str.rs: Use only one space after a period ending a sentence

0264510

ffi/mod.rs: Keep the one-sentence summary at the beginning of the module

c8e232d

ffi/mod.rs: Use only one space after a period ending a sentence

5fb8e3d

steveklabnik approved these changes Oct 12, 2017

kennytm mentioned this pull request Oct 13, 2017

Rollup of 14 pull requests #45261

Merged

bors merged commit 5fb8e3d into rust-lang:master Oct 13, 2017

GuillaumeGomez added a commit to GuillaumeGomez/this-week-in-rust-docs that referenced this pull request Oct 15, 2017

Merge pull request #118 from kennytm/patch-1

fe83f74

rust-lang/rust#44855 should be attributed to @federicomenaquintero
