From 5451b722b0d564e8e376ef89920de5d97b01eac3 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Fri, 22 Sep 2017 17:31:18 -0500 Subject: [PATCH 01/15] Expand the introduction to the ffi module. We describe the representation of C strings, and the purpose of OsString/OsStr. Part of https://github.com/rust-lang/rust/issues/29354 --- src/libstd/ffi/mod.rs | 101 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 100 insertions(+), 1 deletion(-) diff --git a/src/libstd/ffi/mod.rs b/src/libstd/ffi/mod.rs index ca1ff18f1cad8..6c8ddfc394496 100644 --- a/src/libstd/ffi/mod.rs +++ b/src/libstd/ffi/mod.rs @@ -8,7 +8,106 @@ // option. This file may not be copied, modified, or distributed // except according to those terms. -//! Utilities related to FFI bindings. +//! This module provides utilities to handle C-like strings. It is +//! mainly of use for FFI (Foreign Function Interface) bindings and +//! code that needs to exchange C-like strings with other languages. +//! +//! # Overview +//! +//! Rust represents owned strings with the [`String`] type, and +//! borrowed slices of strings with the [`str`] primitive. Both are +//! always in UTF-8 encoding, and may contain nul bytes in the middle, +//! i.e. if you look at the bytes that make up the string, there may +//! be a `0` among them. Both `String` and `str` know their length; +//! there are no nul terminators at the end of strings like in C. +//! +//! C strings are different from Rust strings: +//! +//! * **Encodings** - C strings may have different encodings. If +//! you are bringing in strings from C APIs, you should check what +//! encoding you are getting. Rust strings are always UTF-8. +//! +//! * **Character width** - C strings may use "normal" or "wide" +//! characters, i.e. `char` or `wchar_t`, respectively. The C +//! standard leaves the actual sizes of those types up to the +//! implementation, but defines different APIs for strings made up of +//! each character type.
Rust strings are always UTF-8, so different +//! Unicode characters will be encoded in a variable number of bytes +//! each. The Rust type [`char`] represents a '[Unicode +//! scalar value]', which is similar to, but not the same as, a +//! '[Unicode code point]'. +//! +//! * **Nul terminators and implicit string lengths** - Often, C +//! strings are nul-terminated, i.e. they have a `0` character at the +//! end. The length of a string buffer is not known *a priori*; +//! instead, to compute the length of a string, C code must manually +//! call a function like `strlen()` for `char`-based strings, or +//! `wcslen()` for `wchar_t`-based ones. Those functions return the +//! number of characters in the string excluding the nul terminator, +//! so the buffer length is really `len+1` characters. Rust strings +//! don't have a nul terminator, and they always know their length. +//! +//! * **No nul characters in the middle of the string** - When C +//! strings have a nul terminator character, this usually means that +//! they cannot have nul characters in the middle — a nul character +//! would essentially truncate the string. Rust strings *can* have +//! nul characters in the middle, since they don't use nul +//! terminators. +//! +//! # Representations of non-Rust strings +//! +//! [`CString`] and [`CStr`] are useful when you need to transfer +//! nul-terminated strings to and from C, respectively: +//! +//! * **From Rust to C:** [`CString`] represents an owned, C-friendly +//! string: it is nul-terminated and has no nul characters in the +//! middle, but its bytes are not required to be valid UTF-8. Rust code can create a `CString` +//! out of a normal string (provided that the string doesn't have nul +//! characters in the middle), and then use a variety of methods to +//! obtain a raw `*const c_char` (or, via `into_raw`, a `*mut c_char`) that can then be passed as an argument to C +//! functions. +//! +//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it +//!
is what you would use to wrap a raw `*const c_char` that you got from +//! a C function. A `CStr` is only guaranteed to be a nul-terminated +//! array of bytes; the UTF-8 validation step only happens when you +//! request to convert it to a `&str`. +//! +//! [`OsString`] and [`OsStr`] are useful when you need to transfer +//! strings to and from operating system calls. If you need Rust +//! strings out of them, they can take care of conversion to and from +//! the operating system's preferred form for strings — of course, it +//! may not be possible to convert all valid operating system strings +//! into valid UTF-8; the `OsString` and `OsStr` functions let you know +//! when this is the case. +//! +//! * [`OsString`] represents an owned string in whatever +//! representation the operating system prefers. In the Rust standard +//! library, various APIs that transfer strings to/from the operating +//! system use `OsString` instead of plain strings. For example, +//! [`env::var_os()`] is used to query environment variables; it +//! returns an `Option<OsString>`. If the environment variable exists +//! you will get a `Some(os_string)`, which you can *then* try to +//! convert to a Rust string. This yields a [`Result<>`], so that +//! your code can detect errors in case the environment variable did +//! not in fact contain valid Unicode data. +//! +//! * [`OsStr`] represents a borrowed reference to a string in a format that +//! can be passed to the operating system. It can be converted into +//! a UTF-8 Rust string slice in a similar way to `OsString`. +//! +//! [`String`]: ../string/struct.String.html +//! [`str`]: ../primitive.str.html +//! [`char`]: ../primitive.char.html +//! [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value +//! [Unicode code point]: http://www.unicode.org/glossary/#code_point +//! [`CString`]: struct.CString.html +//! [`CStr`]: struct.CStr.html +//! [`OsString`]: struct.OsString.html +//! [`OsStr`]: struct.OsStr.html +//!
[`env::set_var()`]: ../env/fn.set_var.html +//! [`env::var_os()`]: ../env/fn.var_os.html +//! [`Result<>`]: ../result/enum.Result.html #![stable(feature = "rust1", since = "1.0.0")] From 8da694a42138ac74047d989abd3c7daf0edcbe93 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Fri, 22 Sep 2017 20:36:44 -0500 Subject: [PATCH 02/15] Overhaul the ffi::CString docs Explain the struct's reason for being, and its most common usage patterns. Add a bunch of links. Clarify the method docs a bit. Part of https://github.com/rust-lang/rust/issues/29354 --- src/libstd/ffi/c_str.rs | 146 +++++++++++++++++++++++++++++++--------- 1 file changed, 115 insertions(+), 31 deletions(-) diff --git a/src/libstd/ffi/c_str.rs b/src/libstd/ffi/c_str.rs index 7992aefcb4203..f0a691fd6686f 100644 --- a/src/libstd/ffi/c_str.rs +++ b/src/libstd/ffi/c_str.rs @@ -23,19 +23,68 @@ use ptr; use slice; use str::{self, Utf8Error}; -/// A type representing an owned C-compatible string. +/// A type representing an owned, C-compatible, nul-terminated string with no nul bytes in the middle. /// -/// This type serves the primary purpose of being able to safely generate a -/// C-compatible string from a Rust byte slice or vector. An instance of this +/// This type serves the purpose of being able to safely generate a +/// C-compatible string from a Rust byte slice or vector. An instance of this /// type is a static guarantee that the underlying bytes contain no interior 0 -/// bytes and the final byte is 0. +/// bytes ("nul characters") and that the final byte is 0 ("nul terminator"). /// -/// A `CString` is created from either a byte slice or a byte vector. A [`u8`] -/// slice can be obtained with the `as_bytes` method. Slices produced from a -/// `CString` do *not* contain the trailing nul terminator unless otherwise -/// specified. +/// `CString` is to [`CStr`] as [`String`] is to [`&str`]: the former +/// in each pair are owned strings; the latter are borrowed +/// references.
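As a minimal illustration of the guarantees described above (a sketch, not part of the patch): a Rust `String` may hold an interior nul and still knows its length, while `CString::new` rejects interior nuls and always stores a trailing 0. The helper names `nul_terminated_bytes` and `borrow_as_cstr` are hypothetical, introduced only for this example.

```rust
use std::ffi::{CStr, CString};

// Hypothetical helper: the bytes a C caller would see for `s`,
// including the trailing nul, or None if `s` has an interior nul.
fn nul_terminated_bytes(s: &str) -> Option<Vec<u8>> {
    // CString::new rejects any 0 byte in its input.
    let c = CString::new(s).ok()?;
    // as_bytes_with_nul() includes the terminator that new() appended.
    Some(c.as_bytes_with_nul().to_vec())
}

// Hypothetical helper: borrow an owned CString as a &CStr, the same
// way a String can be borrowed as a &str.
fn borrow_as_cstr(owned: &CString) -> &CStr {
    owned.as_c_str()
}
```

For example, `String::from("a\0b").len()` is 3, whereas `nul_terminated_bytes("a\0b")` returns `None`.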
/// +/// # Creating a `CString` +/// +/// A `CString` is created from either a byte slice or a byte vector, +/// or anything that implements [`Into`]`<`[`Vec`]`<`[`u8`]`>>` (for +/// example, you can build a `CString` straight out of a [`String`] or +/// a [`&str`], since both implement that trait). +/// +/// The [`new`] method will check that the provided bytes +/// do not have 0 bytes in the middle, and return an error if it +/// finds one. +/// +/// # Extracting a raw pointer to the whole C string +/// +/// `CString` implements an [`as_ptr`] method through the [`Deref`] +/// trait. This method will give you a `*const c_char` which you can +/// feed directly to extern functions that expect a nul-terminated +/// string, like C's `strdup()`. +/// +/// # Extracting a slice of the whole C string +/// +/// Alternatively, you can obtain a `&[`[`u8`]`]` slice from a +/// `CString` with the [`as_bytes`] method. Slices produced in this +/// way do *not* contain the trailing nul terminator. This is useful +/// when you will be calling an extern function that takes a `*const +/// u8` argument which is not necessarily nul-terminated, plus another +/// argument with the length of the string — like C's `strndup()`. +/// You can of course get the slice's length with its +/// [`len`][slice.len] method. +/// +/// If you need a `&[`[`u8`]`]` slice *with* the nul terminator, you +/// can use [`as_bytes_with_nul`] instead. +/// +/// Once you have the kind of slice you need (with or without a nul +/// terminator), you can call the slice's own +/// [`as_ptr`][slice.as_ptr] method to get a raw pointer to pass to +/// extern functions. See the documentation for that function for a +/// discussion on ensuring the lifetime of the raw pointer.
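The two ways of handing a `CString` to foreign code described above can be sketched as follows. To keep the sketch runnable without linking any C code, the "C functions" `fake_strlen` and `fake_count_l` are hypothetical stand-ins written in Rust with the C ABI; a real binding would declare the foreign functions in an `extern "C"` block instead.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::slice;

// Hypothetical stand-in for a C function that expects a nul-terminated
// string, in the spirit of strlen().
extern "C" fn fake_strlen(s: *const c_char) -> usize {
    // Safety: callers pass a valid, nul-terminated pointer.
    unsafe { CStr::from_ptr(s) }.to_bytes().len()
}

// Hypothetical stand-in for a C function that takes a pointer plus an
// explicit length, in the spirit of strndup(); it counts b'l' bytes.
extern "C" fn fake_count_l(s: *const u8, len: usize) -> usize {
    // Safety: callers guarantee `len` readable bytes at `s`.
    let bytes = unsafe { slice::from_raw_parts(s, len) };
    bytes.iter().filter(|&&b| b == b'l').count()
}

fn call_both(text: &str) -> (usize, usize) {
    // Keep the CString in a local so both pointers below stay valid.
    let owned = CString::new(text).expect("no interior nul bytes");
    // as_ptr() comes from CStr via Deref; the buffer is nul-terminated.
    let n = fake_strlen(owned.as_ptr());
    // as_bytes() omits the terminator, so pass its length explicitly.
    let bytes = owned.as_bytes();
    let l = fake_count_l(bytes.as_ptr(), bytes.len());
    (n, l)
}
```

Binding `owned` to a local is the design choice that matters here: both raw pointers borrow from it and would dangle if the `CString` were a temporary.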
+/// +/// [`Into`]: ../convert/trait.Into.html +/// [`Vec`]: ../vec/struct.Vec.html +/// [`String`]: ../string/struct.String.html +/// [`&str`]: ../primitive.str.html /// [`u8`]: ../primitive.u8.html +/// [`new`]: #method.new +/// [`as_bytes`]: #method.as_bytes +/// [`as_bytes_with_nul`]: #method.as_bytes_with_nul +/// [`as_ptr`]: #method.as_ptr +/// [slice.as_ptr]: ../primitive.slice.html#method.as_ptr +/// [slice.len]: ../primitive.slice.html#method.len +/// [`Deref`]: ../ops/trait.Deref.html +/// [`CStr`]: struct.CStr.html /// /// # Examples /// @@ -48,6 +97,8 @@ use str::{self, Utf8Error}; /// fn my_printer(s: *const c_char); /// } /// +/// // We are certain that our string doesn't have 0 bytes in the middle, +/// // so we can .unwrap() /// let c_to_print = CString::new("Hello, world!").unwrap(); /// unsafe { /// my_printer(c_to_print.as_ptr()); @@ -58,7 +109,7 @@ use str::{self, Utf8Error}; /// # Safety /// /// `CString` is intended for working with traditional C-style strings -/// (a sequence of non-null bytes terminated by a single null byte); the +/// (a sequence of non-nul bytes terminated by a single nul byte); the /// primary use case for these kinds of strings is interoperating with C-like /// code. Often you will need to transfer ownership to/from that external /// code. It is strongly recommended that you thoroughly read through the @@ -215,8 +266,11 @@ pub struct IntoStringError { impl CString { /// Creates a new C-compatible string from a container of bytes. /// - /// This method will consume the provided data and use the underlying bytes - /// to construct a new string, ensuring that there is a trailing 0 byte. + /// This method will consume the provided data and use the + /// underlying bytes to construct a new string, ensuring that + /// there is a trailing 0 byte. This trailing 0 byte will be + /// appended by this method; the provided data should *not* + /// contain any 0 bytes in it. 
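A small sketch of the error path of `new`: any 0 byte in the input, including a trailing one that a caller might add out of C habit, makes it fail, and the returned `NulError` reports the offending position and hands the bytes back. The helper name `try_c_string` is hypothetical.

```rust
use std::ffi::{CString, NulError};

// Hypothetical helper: try to build a CString and, on failure, report
// where the offending nul byte was and return the original bytes.
fn try_c_string(data: Vec<u8>) -> Result<CString, (usize, Vec<u8>)> {
    CString::new(data).map_err(|e: NulError| {
        // nul_position() is the index of the first 0 byte in the input.
        let pos = e.nul_position();
        // into_vec() returns ownership of the bytes passed to new().
        (pos, e.into_vec())
    })
}
```

Since `new` appends the terminator itself, `try_c_string(b"abc\0".to_vec())` fails with position 3.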
/// /// # Examples /// @@ -234,9 +288,11 @@ impl CString { /// /// # Errors /// - /// This function will return an error if the bytes yielded contain an - /// internal 0 byte. The error returned will contain the bytes as well as + /// This function will return an error if the supplied bytes contain an + /// internal 0 byte. The [`NulError`] returned will contain the bytes as well as /// the position of the nul byte. + /// + /// [`NulError`]: struct.NulError.html #[stable(feature = "rust1", since = "1.0.0")] pub fn new<T: Into<Vec<u8>>>(t: T) -> Result<CString, NulError> { Self::_new(t.into()) @@ -249,8 +305,8 @@ impl CString { } } - /// Creates a C-compatible string from a byte vector without checking for - /// interior 0 bytes. + /// Creates a C-compatible string by consuming a byte vector, + /// without checking for interior 0 bytes. /// /// This method is equivalent to [`new`] except that no runtime assertion /// is made that `v` contains no 0 bytes, and it requires an actual @@ -275,7 +331,7 @@ impl CString { CString { inner: v.into_boxed_slice() } } - /// Retakes ownership of a `CString` that was transferred to C. + /// Retakes ownership of a `CString` that was transferred to C via [`into_raw`]. /// /// Additionally, the length of the string will be recalculated from the pointer. /// @@ -286,7 +342,14 @@ impl CString { /// ownership of a string that was allocated by foreign code) is likely to lead /// to undefined behavior or allocator corruption. /// + /// > **Note:** If you need to borrow a string that was allocated by + /// > foreign code, use [`CStr`]. If you need to take ownership of + /// > a string that was allocated by foreign code, you will need to + /// > make your own provisions for freeing it appropriately, likely + /// > with the foreign code's API to do that.
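The `into_raw` / `from_raw` ownership round trip described above can be sketched like this. The "C" side, `fake_c_uppercase`, is a hypothetical stand-in written in Rust with the C ABI so that the example runs without a C compiler; it edits the buffer in place but never frees it.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Hypothetical stand-in for C code that upper-cases an ASCII string in
// place. It must not free the buffer: only Rust may do that.
extern "C" fn fake_c_uppercase(s: *mut c_char) {
    let mut p = s;
    // Safety: `s` points to a valid, writable, nul-terminated buffer.
    unsafe {
        while *p != 0 {
            *p = (*p as u8).to_ascii_uppercase() as c_char;
            p = p.add(1);
        }
    }
}

fn round_trip(text: &str) -> String {
    let owned = CString::new(text).expect("no interior nul bytes");
    // into_raw() gives up ownership; Rust will not free the buffer now.
    let raw: *mut c_char = owned.into_raw();
    fake_c_uppercase(raw);
    // from_raw() takes ownership back, so Rust's allocator frees the
    // buffer when `back` is dropped, never C's free().
    let back = unsafe { CString::from_raw(raw) };
    back.into_string().expect("still valid UTF-8")
}
```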
+ /// /// [`into_raw`]: #method.into_raw + /// [`CStr`]: struct.CStr.html /// /// # Examples /// @@ -315,11 +378,11 @@ impl CString { CString { inner: mem::transmute(slice) } } - /// Transfers ownership of the string to a C caller. + /// Consumes the `CString` and transfers ownership of the string to a C caller. /// - /// The pointer must be returned to Rust and reconstituted using + /// The pointer which this function returns must be returned to Rust and reconstituted using /// [`from_raw`] to be properly deallocated. Specifically, one - /// should *not* use the standard C `free` function to deallocate + /// should *not* use the standard C `free()` function to deallocate /// this string. /// /// Failure to call [`from_raw`] will lead to a memory leak. @@ -356,6 +419,22 @@ impl CString { /// On failure, ownership of the original `CString` is returned. /// /// [`String`]: ../string/struct.String.html + /// + /// # Examples + /// + /// ``` + /// use std::ffi::CString; + /// + /// let valid_utf8 = vec![b'f', b'o', b'o']; + /// let cstring = CString::new(valid_utf8).unwrap(); + /// assert_eq!(cstring.into_string().unwrap(), "foo"); + /// + /// let invalid_utf8 = vec![b'f', 0xff, b'o', b'o']; + /// let cstring = CString::new(invalid_utf8).unwrap(); + /// let err = cstring.into_string().err().unwrap(); + /// assert_eq!(err.utf8_error().valid_up_to(), 1); + /// ``` + #[stable(feature = "cstring_into", since = "1.7.0")] pub fn into_string(self) -> Result<String, IntoStringError> { String::from_utf8(self.into_bytes()) @@ -365,10 +444,11 @@ impl CString { }) } - /// Returns the underlying byte buffer. + /// Consumes the `CString` and returns the underlying byte buffer. /// - /// The returned buffer does **not** contain the trailing nul separator and - /// it is guaranteed to not have any interior nul bytes. + /// The returned buffer does **not** contain the trailing nul + /// terminator, and it is guaranteed to not have any interior nul + /// bytes.
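As a quick sketch of the consuming accessors documented above: `into_bytes` and `into_bytes_with_nul` return the same buffer, and differ only in whether the trailing 0 is kept. The helper name `split_views` is hypothetical.

```rust
use std::ffi::CString;

// Hypothetical helper: consume two copies of the same CString, one with
// each accessor, so the results can be compared side by side.
fn split_views(text: &str) -> (Vec<u8>, Vec<u8>) {
    let a = CString::new(text).expect("no interior nul bytes");
    let b = a.clone();
    // into_bytes(): the buffer without the terminator.
    // into_bytes_with_nul(): the same buffer plus the trailing 0.
    (a.into_bytes(), b.into_bytes_with_nul())
}
```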
/// /// # Examples /// @@ -388,7 +468,7 @@ impl CString { } /// Equivalent to the [`into_bytes`] function except that the returned vector - /// includes the trailing nul byte. + /// includes the trailing nul terminator. /// /// [`into_bytes`]: #method.into_bytes /// @@ -408,8 +488,12 @@ impl CString { /// Returns the contents of this `CString` as a slice of bytes. /// - /// The returned slice does **not** contain the trailing nul separator and - /// it is guaranteed to not have any interior nul bytes. + /// The returned slice does **not** contain the trailing nul + /// terminator, and it is guaranteed to not have any interior nul + /// bytes. If you need the nul terminator, use + /// [`as_bytes_with_nul`] instead. + /// + /// [`as_bytes_with_nul`]: #method.as_bytes_with_nul /// /// # Examples /// @@ -427,7 +511,7 @@ impl CString { } /// Equivalent to the [`as_bytes`] function except that the returned slice - /// includes the trailing nul byte. + /// includes the trailing nul terminator. /// /// [`as_bytes`]: #method.as_bytes /// @@ -598,8 +682,8 @@ impl Default for Box<CStr> { } impl NulError { - /// Returns the position of the nul byte in the slice that was provided to - /// [`CString::new`]. + /// Returns the position of the nul byte in the slice that caused + /// [`CString::new`] to fail. /// /// [`CString::new`]: struct.CString.html#method.new /// @@ -766,7 +850,7 @@ impl CStr { /// assert!(cstr.is_ok()); /// ``` /// - /// Creating a `CStr` without a trailing nul byte is an error: + /// Creating a `CStr` without a trailing nul terminator is an error: /// /// ``` /// use std::ffi::CStr; @@ -869,7 +953,7 @@ impl CStr { /// requires a linear amount of work to be done) and then return the /// resulting slice of `u8` elements. /// - /// The returned slice will **not** contain the trailing nul that this C + /// The returned slice will **not** contain the trailing nul terminator that this C /// string has.
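The `CStr` side of the same distinction, sketched with `from_bytes_with_nul`: the input slice must end in exactly one 0 byte, and `to_bytes` then drops that terminator while `to_bytes_with_nul` keeps it. The helper name `checked_cstr` is hypothetical.

```rust
use std::ffi::CStr;

// Hypothetical helper: validate a byte slice the way
// CStr::from_bytes_with_nul does and return the bytes before the
// terminator on success.
fn checked_cstr(bytes: &[u8]) -> Option<&[u8]> {
    // Succeeds only if the slice ends in 0 and has no other 0 bytes.
    let c = CStr::from_bytes_with_nul(bytes).ok()?;
    // to_bytes_with_nul() keeps the terminator, so it matches the input.
    debug_assert_eq!(c.to_bytes_with_nul(), bytes);
    // to_bytes() drops the terminator.
    Some(c.to_bytes())
}
```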
/// /// > **Note**: This method is currently implemented as a 0-cost cast, but @@ -894,7 +978,7 @@ impl CStr { /// Converts this C string to a byte slice containing the trailing 0 byte. /// /// This function is the equivalent of [`to_bytes`] except that it will retain - /// the trailing nul instead of chopping it off. + /// the trailing nul terminator instead of chopping it off. /// /// > **Note**: This method is currently implemented as a 0-cost cast, but /// > it is planned to alter its definition in the future to perform the From 2cb2a0606a47a3e2b7777ef099692c735d772b32 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Sun, 24 Sep 2017 20:12:51 -0500 Subject: [PATCH 03/15] Overhaul the ffi::CStr documentation. --- src/libstd/ffi/c_str.rs | 76 +++++++++++++++++++++++++---------------- 1 file changed, 46 insertions(+), 30 deletions(-) diff --git a/src/libstd/ffi/c_str.rs b/src/libstd/ffi/c_str.rs index f0a691fd6686f..a10d0a4214bdf 100644 --- a/src/libstd/ffi/c_str.rs +++ b/src/libstd/ffi/c_str.rs @@ -128,17 +128,21 @@ pub struct CString { /// Representation of a borrowed C string. /// -/// This dynamically sized type is only safely constructed via a borrowed -/// version of an instance of `CString`. This type can be constructed from a raw -/// C string as well and represents a C string borrowed from another location. +/// This type represents a borrowed reference to a nul-terminated +/// array of bytes. It can be constructed safely from a `&[`[`u8`]`]` +/// slice, or unsafely from a raw `*const c_char`. It can then be +/// converted to a Rust [`&str`] by performing UTF-8 validation, or +/// into an owned [`CString`]. +/// +/// `CStr` is to [`CString`] as [`&str`] is to [`String`]: the former +/// in each pair are borrowed references; the latter are owned +/// strings. /// /// Note that this structure is **not** `repr(C)` and is not recommended to be -/// placed in the signatures of FFI functions. 
Instead safe wrappers of FFI +/// placed in the signatures of FFI functions. Instead, safe wrappers of FFI /// functions may leverage the unsafe [`from_ptr`] constructor to provide a safe /// interface to other consumers. /// -/// [`from_ptr`]: #method.from_ptr -/// /// # Examples /// /// Inspecting a foreign C string: @@ -151,7 +155,7 @@ pub struct CString { /// /// unsafe { /// let slice = CStr::from_ptr(my_string()); -/// println!("string length: {}", slice.to_bytes().len()); +/// println!("string buffer size without nul terminator: {}", slice.to_bytes().len()); /// } /// ``` /// @@ -173,8 +177,6 @@ pub struct CString { /// /// Converting a foreign C string into a Rust [`String`]: /// -/// [`String`]: ../string/struct.String.html -/// /// ```no_run /// use std::ffi::CStr; /// use std::os::raw::c_char; @@ -189,6 +191,12 @@ pub struct CString { /// /// println!("string: {}", my_string_safe()); /// ``` +/// +/// [`u8`]: ../primitive.u8.html +/// [`&str`]: ../primitive.str.html +/// [`String`]: ../string/struct.String.html +/// [`CString`]: struct.CString.html +/// [`from_ptr`]: #method.from_ptr #[derive(Hash)] #[stable(feature = "rust1", since = "1.0.0")] pub struct CStr { @@ -215,8 +223,10 @@ pub struct CStr { #[stable(feature = "rust1", since = "1.0.0")] pub struct NulError(usize, Vec<u8>); -/// An error returned from [`CStr::from_bytes_with_nul`] to indicate that a nul -/// byte was found too early in the slice provided or one wasn't found at all. +/// An error returned from [`CStr::from_bytes_with_nul`] to indicate +/// that a nul byte was found too early in the slice provided, or one +/// wasn't found at all. The slice used to create a `CStr` must have one +/// and only one nul byte at the end of the slice. /// /// [`CStr::from_bytes_with_nul`]: struct.CStr.html#method.from_bytes_with_nul /// @@ -795,9 +805,9 @@ impl fmt::Display for IntoStringError { } impl CStr { - /// Casts a raw C string to a safe C string wrapper.
+ /// Wraps a raw C string with a safe C string wrapper. /// - /// This function will cast the provided `ptr` to the `CStr` wrapper which + /// This function will wrap the provided `ptr` with a `CStr` wrapper, which /// allows inspection and interoperation of non-owned C strings. This method /// is unsafe for a number of reasons: /// @@ -837,9 +847,9 @@ impl CStr { /// Creates a C string wrapper from a byte slice. /// - /// This function will cast the provided `bytes` to a `CStr` wrapper after - /// ensuring that it is null terminated and does not contain any interior - /// nul bytes. + /// This function will cast the provided `bytes` to a `CStr` + /// wrapper after ensuring that the byte slice is nul-terminated + /// and does not contain any interior nul bytes. /// /// # Examples /// @@ -884,7 +894,7 @@ impl CStr { /// Unsafely creates a C string wrapper from a byte slice. /// /// This function will cast the provided `bytes` to a `CStr` wrapper without - /// performing any sanity checks. The provided slice must be null terminated + /// performing any sanity checks. The provided slice **must** be nul-terminated /// and not contain any interior nul bytes. /// /// # Examples @@ -906,7 +916,7 @@ impl CStr { /// Returns the inner pointer to this C string. /// - /// The returned pointer will be valid for as long as `self` is and points + /// The returned pointer will be valid for as long as `self` is, and points /// to a contiguous region of memory terminated with a 0 byte to represent /// the end of the string. /// @@ -927,9 +937,9 @@ impl CStr { /// ``` /// /// This happens because the pointer returned by `as_ptr` does not carry any - /// lifetime information and the string is deallocated immediately after + /// lifetime information and the [`CString`] is deallocated immediately after /// the `CString::new("Hello").unwrap().as_ptr()` expression is evaluated. 
- /// To fix the problem, bind the string to a local variable: + /// To fix the problem, bind the `CString` to a local variable: /// /// ```no_run /// use std::ffi::{CString}; @@ -941,6 +951,11 @@ impl CStr { /// *ptr; /// } /// ``` + /// + /// This way, the lifetime of the `CString` in `hello` encompasses + /// the lifetime of `ptr` and the `unsafe` block. + /// + /// [`CString`]: struct.CString.html #[inline] #[stable(feature = "rust1", since = "1.0.0")] pub fn as_ptr(&self) -> *const c_char { @@ -949,10 +964,6 @@ impl CStr { /// Converts this C string to a byte slice. /// - /// This function will calculate the length of this string (which normally - /// requires a linear amount of work to be done) and then return the - /// resulting slice of `u8` elements. - /// /// The returned slice will **not** contain the trailing nul terminator that this C /// string has. /// @@ -1002,8 +1013,9 @@ impl CStr { /// Yields a [`&str`] slice if the `CStr` contains valid UTF-8. /// - /// This function will calculate the length of this string and check for - /// UTF-8 validity, and then return the [`&str`] if it's valid. + /// If the contents of the `CStr` are valid UTF-8 data, this + /// function will return the corresponding [`&str`] slice. Otherwise, + /// it will return an error with details of where UTF-8 validation failed. /// /// > **Note**: This method is currently implemented to check for validity /// > after a 0-cost cast, but it is planned to alter its definition in the @@ -1031,10 +1043,12 @@ impl CStr { /// Converts a `CStr` into a [`Cow`]`<`[`str`]`>`. /// - /// This function will calculate the length of this string (which normally - /// requires a linear amount of work to be done) and then return the - /// resulting slice as a [`Cow`]`<`[`str`]`>`, replacing any invalid UTF-8 sequences - /// with `U+FFFD REPLACEMENT CHARACTER`. 
+ /// If the contents of the `CStr` are valid UTF-8 data, this + /// function will return a [`Cow`]`::`[`Borrowed`]`(`[`&str`]`)` + /// with the corresponding [`&str`] slice. Otherwise, it will + /// replace any invalid UTF-8 sequences with `U+FFFD REPLACEMENT + /// CHARACTER` and return a [`Cow`]`::`[`Owned`](../borrow/enum.Cow.html#variant.Owned)`(`[`String`]`)` + /// with the result. /// /// > **Note**: This method is currently implemented to check for validity /// > after a 0-cost cast, but it is planned to alter its definition in the @@ -1042,7 +1056,9 @@ impl CStr { /// > check whenever this method is called. /// /// [`Cow`]: ../borrow/enum.Cow.html + /// [`Borrowed`]: ../borrow/enum.Cow.html#variant.Borrowed /// [`str`]: ../primitive.str.html + /// [`String`]: ../string/struct.String.html /// /// # Examples /// From 3c5e18f322818f54cd9764031979f29b89a3e80d Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Mon, 25 Sep 2017 10:53:13 -0500 Subject: [PATCH 04/15] Point from the error structs back to the method that created them, like in iterators --- src/libstd/ffi/c_str.rs | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-) diff --git a/src/libstd/ffi/c_str.rs b/src/libstd/ffi/c_str.rs index a10d0a4214bdf..01d2b70e42377 100644 --- a/src/libstd/ffi/c_str.rs +++ b/src/libstd/ffi/c_str.rs @@ -208,8 +208,13 @@ pub struct CStr { } /// An error returned from [`CString::new`] to indicate that a nul byte was found -/// in the vector provided. +/// in the vector provided. While Rust strings may contain nul bytes in the middle, +/// C strings can't, as that byte would effectively truncate the string. /// +/// This `struct` is created by the [`new`][`CString::new`] method on +/// [`CString`]. See its documentation for more.
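A minimal sketch of the `to_str` / `to_string_lossy` behavior documented in the hunk above: valid UTF-8 yields `Cow::Borrowed` with no allocation, and invalid UTF-8 yields `Cow::Owned` with `U+FFFD` substituted. The helper name `lossy` is hypothetical.

```rust
use std::borrow::Cow;
use std::ffi::CStr;

// Hypothetical helper: report whether to_string_lossy() could borrow
// the data (valid UTF-8) or had to allocate a repaired copy.
fn lossy(bytes_with_nul: &[u8]) -> (bool, String) {
    let c = CStr::from_bytes_with_nul(bytes_with_nul).expect("well-formed C string");
    match c.to_string_lossy() {
        // Valid UTF-8: a &str borrowed straight from the CStr.
        Cow::Borrowed(s) => (true, s.to_owned()),
        // Invalid UTF-8: a new String with U+FFFD substitutions.
        Cow::Owned(s) => (false, s),
    }
}
```

When you need to reject invalid data rather than repair it, `to_str` returns an error instead.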
+/// +/// [`CString`]: struct.CString.html /// [`CString::new`]: struct.CString.html#method.new /// /// # Examples @@ -225,9 +230,15 @@ pub struct NulError(usize, Vec<u8>); /// An error returned from [`CStr::from_bytes_with_nul`] to indicate /// that a nul byte was found too early in the slice provided, or one -/// wasn't found at all. The slice used to create a `CStr` must have one -/// and only one nul byte at the end of the slice. +/// wasn't found at all for the nul terminator. The slice used to +/// create a `CStr` must have one and only one nul byte at the end of +/// the slice. +/// +/// This `struct` is created by the +/// [`from_bytes_with_nul`][`CStr::from_bytes_with_nul`] method on +/// [`CStr`]. See its documentation for more. /// +/// [`CStr`]: struct.CStr.html /// [`CStr::from_bytes_with_nul`]: struct.CStr.html#method.from_bytes_with_nul /// /// # Examples @@ -262,9 +273,17 @@ impl FromBytesWithNulError { } } -/// An error returned from [`CString::into_string`] to indicate that a UTF-8 error -/// was encountered during the conversion. +/// An error returned from [`CString::into_string`] to indicate that a +/// UTF-8 error was encountered during the conversion. `CString` is +/// just a wrapper over a buffer of bytes with a nul terminator; +/// [`into_string`][`CString::into_string`] performs UTF-8 validation +/// and may return this error. +/// +/// This `struct` is created by the +/// [`into_string`][`CString::into_string`] method on [`CString`]. See +/// its documentation for more.
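To illustrate that an `IntoStringError` does not lose data, here is a short sketch: `utf8_error` reports where validation stopped, and `into_cstring` returns the untouched original. The helper name `to_string_or_bytes` is hypothetical.

```rust
use std::ffi::CString;

// Hypothetical helper: convert to a String, or report where UTF-8
// validation failed together with the original bytes.
fn to_string_or_bytes(c: CString) -> Result<String, (usize, Vec<u8>)> {
    c.into_string().map_err(|e| {
        // utf8_error() says how many leading bytes were valid UTF-8.
        let valid_up_to = e.utf8_error().valid_up_to();
        // into_cstring() recovers the CString that failed to convert.
        (valid_up_to, e.into_cstring().into_bytes())
    })
}
```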
/// +/// [`CString`]: struct.CString.html /// [`CString::into_string`]: struct.CString.html#method.into_string #[derive(Clone, PartialEq, Eq, Debug)] #[stable(feature = "cstring_into", since = "1.7.0")] From 155b4b1c5fff6a2a5a87de25e2fbe8c96743efb2 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Mon, 25 Sep 2017 12:51:11 -0500 Subject: [PATCH 05/15] Module overview for std::os::windows::ffi --- src/libstd/ffi/mod.rs | 2 +- src/libstd/sys/windows/ext/ffi.rs | 56 +++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+), 1 deletion(-) diff --git a/src/libstd/ffi/mod.rs b/src/libstd/ffi/mod.rs index 6c8ddfc394496..376e13e3b034b 100644 --- a/src/libstd/ffi/mod.rs +++ b/src/libstd/ffi/mod.rs @@ -92,7 +92,7 @@ //! your code can detect errors in case the environment variable did //! not in fact contain valid Unicode data. //! -//! * [`OsStr`] represents a borrowed reference to a string in a format that +//! * [`OsStr`] represents a borrowed string slice in a format that //! can be passed to the operating system. It can be converted into //! a UTF-8 Rust string slice in a similar way to `OsString`. //! diff --git a/src/libstd/sys/windows/ext/ffi.rs b/src/libstd/sys/windows/ext/ffi.rs index 3f6c2827a3f93..d6b8896ac096d 100644 --- a/src/libstd/sys/windows/ext/ffi.rs +++ b/src/libstd/sys/windows/ext/ffi.rs @@ -9,6 +9,62 @@ // except according to those terms. //! Windows-specific extensions to the primitives in the `std::ffi` module. +//! +//! # Overview +//! +//! For historical reasons, the Windows API uses a form of potentially +//! ill-formed UTF-16 encoding for strings. Specifically, the 16-bit +//! code units in Windows strings may contain [isolated surrogate code +//! points which are not paired together][ill-formed-utf-16]. The +//! Unicode standard requires that surrogate code points (those in the +//! range U+D800 to U+DFFF) always be *paired*, because in the UTF-16 +//! encoding a *surrogate code unit pair* is used to encode a single +//!
character. For compatibility with code that does not enforce +//! these pairings, Windows does not enforce them, either. +//! +//! While it is not always possible to convert such a string losslessly into +//! a valid UTF-16 string (or even UTF-8), it is often desirable to be +//! able to round-trip such a string from and to Windows APIs +//! losslessly. For example, some Rust code may be "bridging" some +//! Windows APIs together, just passing `WCHAR` strings among those +//! APIs without ever really looking into the strings. +//! +//! If Rust code *does* need to look into those strings, it can +//! convert them to valid UTF-8, possibly lossily, by substituting +//! invalid sequences with U+FFFD REPLACEMENT CHARACTER, as is +//! conventionally done in other Rust APIs that deal with string +//! encodings. +//! +//! # `OsStringExt` and `OsStrExt` +//! +//! [`OsString`] is the Rust wrapper for owned strings in the +//! preferred representation of the operating system. On Windows, +//! this struct gets augmented with an implementation of the +//! [`OsStringExt`] trait, which has a [`from_wide`] method. This +//! lets you create an [`OsString`] from a `&[u16]` slice; presumably +//! you get such a slice out of a `WCHAR` Windows API. +//! +//! Similarly, [`OsStr`] is the Rust wrapper for borrowed strings in the +//! preferred representation of the operating system. On Windows, the +//! [`OsStrExt`] trait provides the [`encode_wide`] method, which +//! outputs an [`EncodeWide`] iterator. You can [`collect`] this +//! iterator, for example, to obtain a `Vec<u16>`; you can later get a +//! pointer to this vector's contents (after appending a terminating 0 if the API expects a nul-terminated string) and feed it to Windows APIs. +//! +//! These traits, along with [`OsString`] and [`OsStr`], work in +//! conjunction so that it is possible to **round-trip** strings from +//! Windows and back, with no loss of data, even if the strings are +//! ill-formed UTF-16. +//! +//! [ill-formed-utf-16]: https://simonsapin.github.io/wtf-8/#ill-formed-utf-16 +//!
[`OsString`]: ../../../ffi/struct.OsString.html +//! [`OsStr`]: ../../../ffi/struct.OsStr.html +//! [`OsStringExt`]: trait.OsStringExt.html +//! [`OsStrExt`]: trait.OsStrExt.html +//! [`EncodeWide`]: struct.EncodeWide.html +//! [`from_wide`]: trait.OsStringExt.html#tymethod.from_wide +//! [`encode_wide`]: trait.OsStrExt.html#tymethod.encode_wide +//! [`collect`]: ../../../iter/trait.Iterator.html#method.collect #![stable(feature = "rust1", since = "1.0.0")] From 91f6445b5956aff72755b84854a19d2921009e1e Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Mon, 25 Sep 2017 13:51:28 -0500 Subject: [PATCH 06/15] Overhaul the documentation for OsString / OsStr --- src/libstd/ffi/mod.rs | 45 ++++++++++++++++++++++++++++++++--- src/libstd/ffi/os_str.rs | 51 ++++++++++++++++++++++++++++++++++++++-- 2 files changed, 91 insertions(+), 5 deletions(-) diff --git a/src/libstd/ffi/mod.rs b/src/libstd/ffi/mod.rs index 376e13e3b034b..1214a2406e2d0 100644 --- a/src/libstd/ffi/mod.rs +++ b/src/libstd/ffi/mod.rs @@ -92,13 +92,40 @@ //! your code can detect errors in case the environment variable did //! not in fact contain valid Unicode data. //! -//! * [`OsStr`] represents a borrowed string slice in a format that -//! can be passed to the operating system. It can be converted into -//! an UTF-8 Rust string slice in a similar way to `OsString`. +//! * [`OsStr`] represents a borrowed reference to a string in a +//! format that can be passed to the operating system. It can be +//! converted into an UTF-8 Rust string slice in a similar way to +//! `OsString`. +//! +//! # Conversions +//! +//! ## On Unix +//! +//! On Unix, [`OsStr`] implements the `std::os::unix:ffi::`[`OsStrExt`][unix.OsStrExt] trait, which +//! augments it with two methods, [`from_bytes`] and [`as_bytes`]. These do inexpensive conversions +//! from and to UTF-8 byte slices. +//! +//! Additionally, on Unix [`OsString`] implements the +//! `std::os::unix:ffi::`[`OsStringExt`][unix.OsStringExt] trait, +//! 
which provides [`from_vec`] and [`into_vec`] methods that consume +//! their arguments, and take or produce vectors of [`u8`]. +//! +//! ## On Windows +//! +//! On Windows, [`OsStr`] implements the `std::os::windows::ffi::`[`OsStrExt`][windows.OsStrExt] +//! trait, which provides an [`encode_wide`] method. This provides an iterator that can be +//! [`collect`]ed into a vector of [`u16`]. +//! +//! Additionally, on Windows [`OsString`] implements the +//! `std::os::windows:ffi::`[`OsStringExt`][windows.OsStringExt] trait, which provides a +//! [`from_wide`] method. The result of this method is an `OsString` which can be round-tripped to +//! a Windows string losslessly. //! //! [`String`]: ../string/struct.String.html //! [`str`]: ../primitive.str.html //! [`char`]: ../primitive.char.html +//! [`u8`]: ../primitive.u8.html +//! [`u16`]: ../primitive.u16.html //! [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value //! [Unicode code point]: http://www.unicode.org/glossary/#code_point //! [`CString`]: struct.CString.html @@ -108,6 +135,18 @@ //! [`env::set_var()`]: ../env/fn.set_var.html //! [`env::var_os()`]: ../env/fn.var_os.html //! [`Result<>`]: ../result/enum.Result.html +//! [unix.OsStringExt]: ../os/unix/ffi/trait.OsStringExt.html +//! [`from_vec`]: ../os/unix/ffi/trait.OsStringExt.html#tymethod.from_vec +//! [`into_vec`]: ../os/unix/ffi/trait.OsStringExt.html#tymethod.into_vec +//! [unix.OsStrExt]: ../os/unix/ffi/trait.OsStrExt.html +//! [`from_bytes`]: ../os/unix/ffi/trait.OsStrExt.html#tymethod.from_bytes +//! [`as_bytes`]: ../os/unix/ffi/trait.OsStrExt.html#tymethod.as_bytes +//! [`OsStrExt`]: ../os/unix/ffi/trait.OsStrExt.html +//! [windows.OsStrExt]: ../os/windows/ffi/trait.OsStrExt.html +//! [`encode_wide`]: ../os/windows/ffi/trait.OsStrExt.html#tymethod.encode_wide +//! [`collect`]: ../iter/trait.Iterator.html#method.collect +//! [windows.OsStringExt]: ../os/windows/ffi/trait.OsStringExt.html +//! 
[`from_wide`]: ../os/windows/ffi/trait.OsStringExt.html#tymethod.from_wide #![stable(feature = "rust1", since = "1.0.0")] diff --git a/src/libstd/ffi/os_str.rs b/src/libstd/ffi/os_str.rs index a40a9329ed9bf..72b0e68a9b656 100644 --- a/src/libstd/ffi/os_str.rs +++ b/src/libstd/ffi/os_str.rs @@ -33,18 +33,65 @@ use sys_common::{AsInner, IntoInner, FromInner}; /// /// `OsString` and [`OsStr`] bridge this gap by simultaneously representing Rust /// and platform-native string values, and in particular allowing a Rust string -/// to be converted into an "OS" string with no cost. +/// to be converted into an "OS" string with no cost if possible. +/// +/// `OsString` is to [`OsStr`] as [`String`] is to [`&str`]: the former +/// in each pair are owned strings; the latter are borrowed +/// references. +/// +/// # Creating an `OsString` +/// +/// **From a Rust string:** `OsString` implements +/// [`From`]`<`[`String`]`>`, so you can use `OsString::`[`from`]`(my_string)` to +/// create an `OsString` from a normal Rust string. +/// +/// **From slices:** Just like you can start with an empty Rust +/// [`String`] and then [`push_str`][String.push_str] `&str` +/// sub-string slices into it, you can create an empty `OsString` with +/// the [`new`] method and then push string slices into it with the +/// [`push`] method. +/// +/// # Extracting a borrowed reference to the whole OS string +/// +/// You can use the [`as_os_str`] method to get an `&`[`OsStr`] from +/// an `OsString`; this is effectively a borrowed reference to the +/// whole string. +/// +/// # Conversions +/// +/// See the [module's toplevel documentation about conversions][conversions] for a discussion on the traits which +/// `OsString` implements for conversions from/to native representations.
/// /// [`OsStr`]: struct.OsStr.html +/// [`From`]: ../convert/trait.From.html +/// [`from`]: ../convert/trait.From.html#tymethod.from +/// [`String`]: ../string/struct.String.html +/// [`&str`]: ../primitive.str.html +/// [`u8`]: ../primitive.u8.html +/// [`u16`]: ../primitive.u16.html +/// [String.push_str]: ../string/struct.String.html#method.push_str +/// [`new`]: #struct.OsString.html#method.new +/// [`push`]: #struct.OsString.html#method.push +/// [`as_os_str`]: #struct.OsString.html#method.as_os_str #[derive(Clone)] #[stable(feature = "rust1", since = "1.0.0")] pub struct OsString { inner: Buf } -/// Slices into OS strings (see [`OsString`]). +/// Borrowed reference to an OS string (see [`OsString`]). +/// +/// This type represents a borrowed reference to a string in the operating system's preferred +/// representation. +/// +/// `OsStr` is to [`OsString`] as [`&str`] is to [`String`]: the former in each pair are borrowed +/// references; the latter are owned strings. +/// +/// See the [module's toplevel documentation about conversions][conversions] for a discussion on the traits which +/// `OsStr` implements for conversions from/to native representations.
/// /// [`OsString`]: struct.OsString.html +/// [conversions]: index.html#conversions #[stable(feature = "rust1", since = "1.0.0")] pub struct OsStr { inner: Slice From 4143422981e5be3593d3248e99cd503442aae698 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Tue, 26 Sep 2017 08:56:44 -0500 Subject: [PATCH 07/15] os_str: Fix too-long lines --- src/libstd/ffi/os_str.rs | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/libstd/ffi/os_str.rs b/src/libstd/ffi/os_str.rs index 72b0e68a9b656..b6032f7c74c21 100644 --- a/src/libstd/ffi/os_str.rs +++ b/src/libstd/ffi/os_str.rs @@ -59,8 +59,8 @@ use sys_common::{AsInner, IntoInner, FromInner}; /// /// # Conversions /// -/// See the [module's toplevel documentation about conversions][conversions] for a discussion on the traits which -/// `OsString` implements for conversions from/to native representations. +/// See the [module's toplevel documentation about conversions][conversions] for a discussion on +/// the traits which `OsString` implements for conversions from/to native representations. /// /// [`OsStr`]: struct.OsStr.html /// [`From`]: ../convert/trait.From.html @@ -87,8 +87,8 @@ pub struct OsString { /// `OsStr` is to [`OsString`] as [`&str`] is to [`String`]: the former in each pair are borrowed /// references; the latter are owned strings. /// -/// See the [module's toplevel documentation about conversions][conversions] for a discussion on the traits which -/// `OsStr` implements for conversions from/to native representations. +/// See the [module's toplevel documentation about conversions][conversions] for a discussion on +/// the traits which `OsStr` implements for conversions from/to native representations.
/// /// [`OsString`]: struct.OsString.html /// [conversions]: index.html#conversions From 9854e836a35c3114c81b8102d3468ff9071b4141 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Mon, 2 Oct 2017 13:53:50 -0500 Subject: [PATCH 08/15] Remove the implication that CString contains UTF-8 data. --- src/libstd/ffi/c_str.rs | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/libstd/ffi/c_str.rs b/src/libstd/ffi/c_str.rs index 01d2b70e42377..0d0280e25861e 100644 --- a/src/libstd/ffi/c_str.rs +++ b/src/libstd/ffi/c_str.rs @@ -23,10 +23,11 @@ use ptr; use slice; use str::{self, Utf8Error}; -/// A type representing an owned, C-compatible, UTF-8 string. +/// A type representing an owned, C-compatible, nul-terminated string with no nul bytes in the +/// middle. /// /// This type serves the purpose of being able to safely generate a -/// C-compatible UTF-8 string from a Rust byte slice or vector. An instance of this +/// C-compatible string from a Rust byte slice or vector. An instance of this /// type is a static guarantee that the underlying bytes contain no interior 0 /// bytes ("nul characters") and that the final byte is 0 ("nul terminator"). /// @@ -443,7 +444,7 @@ impl CString { Box::into_raw(self.into_inner()) as *mut c_char } - /// Converts the `CString` into a [`String`] if it contains valid Unicode data. + /// Converts the `CString` into a [`String`] if it contains valid UTF-8 data. /// /// On failure, ownership of the original `CString` is returned. 
/// From 50505aadbd9314375a56bf397a4a97f0102180ce Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Mon, 2 Oct 2017 14:16:37 -0500 Subject: [PATCH 09/15] Clarify the ffi module's toplevel docs, per @clarcharr's comments --- src/libstd/ffi/mod.rs | 119 +++++++++++++++++++++++------------------- 1 file changed, 65 insertions(+), 54 deletions(-) diff --git a/src/libstd/ffi/mod.rs b/src/libstd/ffi/mod.rs index 1214a2406e2d0..f8a4a904fc55e 100644 --- a/src/libstd/ffi/mod.rs +++ b/src/libstd/ffi/mod.rs @@ -8,9 +8,11 @@ // option. This file may not be copied, modified, or distributed // except according to those terms. -//! This module provides utilities to handle C-like strings. It is -//! mainly of use for FFI (Foreign Function Interface) bindings and -//! code that needs to exchange C-like strings with other languages. +//! This module provides utilities to handle data across non-Rust +//! interfaces, like other programming languages and the underlying +//! operating system. It is mainly of use for FFI (Foreign Function +//! Interface) bindings and code that needs to exchange C-like strings +//! with other languages. //! //! # Overview //! @@ -18,68 +20,74 @@ //! borrowed slices of strings with the [`str`] primitive. Both are //! always in UTF-8 encoding, and may contain nul bytes in the middle, //! i.e. if you look at the bytes that make up the string, there may -//! be a `0` among them. Both `String` and `str` know their length; -//! there are no nul terminators at the end of strings like in C. +//! be a `\0` among them. Both `String` and `str` store their length +//! explicitly; there are no nul terminators at the end of strings +//! like in C. //! //! C strings are different from Rust strings: //! -//! * **Encodings** - C strings may have different encodings. If -//! you are bringing in strings from C APIs, you should check what -//! encoding you are getting. Rust strings are always UTF-8. +//! 
* **Encodings** - Rust strings are UTF-8, but C strings may use +//! other encodings. If you are using a string from C, you should +//! check its encoding explicitly, rather than just assuming that it +//! is UTF-8 like you can do in Rust. //! -//! * **Character width** - C strings may use "normal" or "wide" -//! characters, i.e. `char` or `wchar_t`, respectively. The C -//! standard leaves the actual sizes of those types open to +//! * **Character size** - C strings may use `char` or `wchar_t`-sized +//! characters; please **note** that C's `char` is different from Rust's. +//! The C standard leaves the actual sizes of those types open to //! interpretation, but defines different APIs for strings made up of //! each character type. Rust strings are always UTF-8, so different //! Unicode characters will be encoded in a variable number of bytes -//! each. The Rust type [`char`] represents a '[Unicode -//! scalar value]', which is similar to, but not the same as, a -//! '[Unicode code point]'. +//! each. The Rust type [`char`] represents a '[Unicode scalar +//! value]', which is similar to, but not the same as, a '[Unicode +//! code point]'. //! //! * **Nul terminators and implicit string lengths** - Often, C -//! strings are nul-terminated, i.e. they have a `0` character at the -//! end. The length of a string buffer is not known *a priori*; -//! instead, to compute the length of a string, C code must manually -//! call a function like `strlen()` for `char`-based strings, or -//! `wcslen()` for `wchar_t`-based ones. Those functions return the -//! number of characters in the string excluding the nul terminator, -//! so the buffer length is really `len+1` characters. Rust strings -//! don't have a nul terminator, and they always know their length. -//! -//! * **No nul characters in the middle of the string** - When C -//! strings have a nul terminator character, this usually means that -//! they cannot have nul characters in the middle — a nul character -//! 
would essentially truncate the string. Rust strings *can* have -//! nul characters in the middle, since they don't use nul -//! terminators. +//! strings are nul-terminated, i.e. they have a `\0` character at the +//! end. The length of a string buffer is not stored, but has to be +//! calculated; to compute the length of a string, C code must +//! manually call a function like `strlen()` for `char`-based strings, +//! or `wcslen()` for `wchar_t`-based ones. Those functions return +//! the number of characters in the string excluding the nul +//! terminator, so the buffer length is really `len+1` characters. +//! Rust strings don't have a nul terminator; their length is always +//! stored and does not need to be calculated. While in Rust +//! accessing a string's length is a O(1) operation (becasue the +//! length is stored); in C it is an O(length) operation because the +//! length needs to be computed by scanning the string for the nul +//! terminator. +//! +//! * **Internal nul characters** - When C strings have a nul +//! terminator character, this usually means that they cannot have nul +//! characters in the middle — a nul character would essentially +//! truncate the string. Rust strings *can* have nul characters in +//! the middle, because nul does not have to mark the end of the +//! string in Rust. //! //! # Representations of non-Rust strings //! //! [`CString`] and [`CStr`] are useful when you need to transfer -//! UTF-8 strings to and from C, respectively: +//! UTF-8 strings to and from languages with a C ABI, like Python. //! //! * **From Rust to C:** [`CString`] represents an owned, C-friendly -//! UTF-8 string: it is valid UTF-8, it is nul-terminated, and has no -//! nul characters in the middle. Rust code can create a `CString` -//! out of a normal string (provided that the string doesn't have nul -//! characters in the middle), and then use a variety of methods to -//! obtain a raw `*mut u8` that can then be passed as an argument to C -//! 
functions. +//! string: it is nul-terminated, and has no internal nul characters. +//! Rust code can create a `CString` out of a normal string (provided +//! that the string doesn't have nul characters in the middle), and +//! then use a variety of methods to obtain a raw `*mut u8` that can +//! then be passed as an argument to functions which use the C +//! conventions for strings. //! //! * **From C to Rust:** [`CStr`] represents a borrowed C string; it //! is what you would use to wrap a raw `*const u8` that you got from -//! a C function. A `CStr` is just guaranteed to be a nul-terminated -//! array of bytes; the UTF-8 validation step only happens when you -//! request to convert it to a `&str`. +//! a C function. A `CStr` is guaranteed to be a nul-terminated array +//! of bytes. Once you have a `CStr`, you can convert it to a Rust +//! `&str` if it's valid UTF-8, or lossily convert it by adding +//! replacement characters. //! //! [`OsString`] and [`OsStr`] are useful when you need to transfer -//! strings to and from operating system calls. If you need Rust -//! strings out of them, they can take care of conversion to and from -//! the operating system's preferred form for strings — of course, it -//! may not be possible to convert all valid operating system strings -//! into valid UTF-8; the `OsString` and `OsStr` functions let you know -//! when this is the case. +//! strings to and from the operating system itself, or when capturing +//! the output of external commands. Conversions between `OsString`, +//! `OsStr` and Rust strings work similarly to those for [`CString`] +//! and [`CStr`]. //! //! * [`OsString`] represents an owned string in whatever //! representation the operating system prefers. In the Rust standard @@ -101,9 +109,10 @@ //! //! ## On Unix //! -//! On Unix, [`OsStr`] implements the `std::os::unix:ffi::`[`OsStrExt`][unix.OsStrExt] trait, which -//! augments it with two methods, [`from_bytes`] and [`as_bytes`]. 
These do inexpensive conversions -//! from and to UTF-8 byte slices. +//! On Unix, [`OsStr`] implements the +//! `std::os::unix:ffi::`[`OsStrExt`][unix.OsStrExt] trait, which +//! augments it with two methods, [`from_bytes`] and [`as_bytes`]. +//! These do inexpensive conversions from and to UTF-8 byte slices. //! //! Additionally, on Unix [`OsString`] implements the //! `std::os::unix:ffi::`[`OsStringExt`][unix.OsStringExt] trait, @@ -112,14 +121,16 @@ //! //! ## On Windows //! -//! On Windows, [`OsStr`] implements the `std::os::windows::ffi::`[`OsStrExt`][windows.OsStrExt] -//! trait, which provides an [`encode_wide`] method. This provides an iterator that can be -//! [`collect`]ed into a vector of [`u16`]. +//! On Windows, [`OsStr`] implements the +//! `std::os::windows::ffi::`[`OsStrExt`][windows.OsStrExt] trait, +//! which provides an [`encode_wide`] method. This provides an +//! iterator that can be [`collect`]ed into a vector of [`u16`]. //! //! Additionally, on Windows [`OsString`] implements the -//! `std::os::windows:ffi::`[`OsStringExt`][windows.OsStringExt] trait, which provides a -//! [`from_wide`] method. The result of this method is an `OsString` which can be round-tripped to -//! a Windows string losslessly. +//! `std::os::windows:ffi::`[`OsStringExt`][windows.OsStringExt] +//! trait, which provides a [`from_wide`] method. The result of this +//! method is an `OsString` which can be round-tripped to a Windows +//! string losslessly. //! //! [`String`]: ../string/struct.String.html //! 
[`str`]: ../primitive.str.html From d989cd02b56524470cc8721f296add7039821777 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Mon, 2 Oct 2017 15:46:10 -0500 Subject: [PATCH 10/15] Fix broken links in documentation --- src/libstd/ffi/os_str.rs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/libstd/ffi/os_str.rs b/src/libstd/ffi/os_str.rs index b6032f7c74c21..94b6911283b1d 100644 --- a/src/libstd/ffi/os_str.rs +++ b/src/libstd/ffi/os_str.rs @@ -70,9 +70,9 @@ use sys_common::{AsInner, IntoInner, FromInner}; /// [`u8`]: ../primitive.u8.html /// [`u16`]: ../primitive.u16.html /// [String.push_str]: ../string/struct.String.html#method.push_str -/// [`new`]: #struct.OsString.html#method.new -/// [`push`]: #struct.OsString.html#method.push -/// [`as_os_str`]: #struct.OsString.html#method.as_os_str +/// [`new`]: #method.new +/// [`push`]: #method.push +/// [`as_os_str`]: #method.as_os_str #[derive(Clone)] #[stable(feature = "rust1", since = "1.0.0")] pub struct OsString { From d5bdfbced63c3d31b0f55a999cd0beb9de286d01 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Wed, 11 Oct 2017 17:51:37 -0500 Subject: [PATCH 11/15] ffi/c_str.rs: Make all descriptions have a single-sentence summary at the beginning Per https://github.com/rust-lang/rust/pull/44855#discussion_r144048837 and subsequent ones. --- src/libstd/ffi/c_str.rs | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/src/libstd/ffi/c_str.rs b/src/libstd/ffi/c_str.rs index 0d0280e25861e..6541ad3f87249 100644 --- a/src/libstd/ffi/c_str.rs +++ b/src/libstd/ffi/c_str.rs @@ -208,11 +208,12 @@ pub struct CStr { inner: [c_char] } -/// An error returned from [`CString::new`] to indicate that a nul byte was found -/// in the vector provided. While Rust strings may contain nul bytes in the middle, -/// C strings can't, as that byte would effectively truncate the string. +/// An error indicating that an interior nul byte was found. 
/// -/// This `struct` is created by the [`new`][`CString::new`] method on +/// While Rust strings may contain nul bytes in the middle, C strings +/// can't, as that byte would effectively truncate the string. +/// +/// This error is created by the [`new`][`CString::new`] method on /// [`CString`]. See its documentation for more. /// /// [`CString`]: struct.CString.html @@ -229,13 +230,12 @@ pub struct CStr { #[stable(feature = "rust1", since = "1.0.0")] pub struct NulError(usize, Vec); -/// An error returned from [`CStr::from_bytes_with_nul`] to indicate -/// that a nul byte was found too early in the slice provided, or one -/// wasn't found at all for the nul terminator. The slice used to -/// create a `CStr` must have one and only one nul byte at the end of -/// the slice. +/// An error indicating that a nul byte was not in the expected position. /// -/// This `struct` is created by the +/// The slice used to create a [`CStr`] must have one and only one nul +/// byte at the end of the slice. +/// +/// This error is created by the /// [`from_bytes_with_nul`][`CStr::from_bytes_with_nul`] method on /// [`CStr`]. See its documentation for more. /// @@ -274,16 +274,17 @@ impl FromBytesWithNulError { } } -/// An error returned from [`CString::into_string`] to indicate that a -/// UTF-8 error was encountered during the conversion. `CString` is -/// just a wrapper over a buffer of bytes with a nul terminator; -/// [`into_string`][`CString::into_string`] performs UTF-8 validation -/// and may return this error. +/// An error indicating invalid UTF-8 when converting a [`CString`] into a [`String`]. +/// +/// `CString` is just a wrapper over a buffer of bytes with a nul +/// terminator; [`into_string`][`CString::into_string`] performs UTF-8 +/// validation on those bytes and may return this error. /// /// This `struct` is created by the /// [`into_string`][`CString::into_string`] method on [`CString`]. See /// its documentation for more. 
/// +/// [`String`]: ../string/struct.String.html /// [`CString`]: struct.CString.html /// [`CString::into_string`]: struct.CString.html#method.into_string #[derive(Clone, PartialEq, Eq, Debug)] From a9a4ce6dcc694e2ea35344aa790a73a5dea573f0 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Wed, 11 Oct 2017 17:52:39 -0500 Subject: [PATCH 12/15] ffi/c_str.rs: Fix method/function confusion Per https://github.com/rust-lang/rust/pull/44855#discussion_r144049179 --- src/libstd/ffi/c_str.rs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/libstd/ffi/c_str.rs b/src/libstd/ffi/c_str.rs index 6541ad3f87249..51a5865d29cc5 100644 --- a/src/libstd/ffi/c_str.rs +++ b/src/libstd/ffi/c_str.rs @@ -297,10 +297,10 @@ pub struct IntoStringError { impl CString { /// Creates a new C-compatible string from a container of bytes. /// - /// This method will consume the provided data and use the + /// This function will consume the provided data and use the /// underlying bytes to construct a new string, ensuring that - /// there is a trailing 0 byte. This trailing 0 byte will be - /// appended by this method; the provided data should *not* + /// there is a trailing 0 byte. This trailing 0 byte will be + /// appended by this function; the provided data should *not* /// contain any 0 bytes in it. 
/// /// # Examples From 026451093dbce9d1d2077d4c2bae3a92d413c203 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Wed, 11 Oct 2017 17:53:13 -0500 Subject: [PATCH 13/15] ffi/c_str.rs: Use only one space after a period ending a sentence --- src/libstd/ffi/c_str.rs | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/libstd/ffi/c_str.rs b/src/libstd/ffi/c_str.rs index 51a5865d29cc5..7d3313fb9d549 100644 --- a/src/libstd/ffi/c_str.rs +++ b/src/libstd/ffi/c_str.rs @@ -49,7 +49,7 @@ use str::{self, Utf8Error}; /// # Extracting a raw pointer to the whole C string /// /// `CString` implements a [`as_ptr`] method through the [`Deref`] -/// trait. This method will give you a `*const c_char` which you can +/// trait. This method will give you a `*const c_char` which you can /// feed directly to extern functions that expect a nul-terminated /// string, like C's `strdup()`. /// @@ -70,7 +70,7 @@ use str::{self, Utf8Error}; /// Once you have the kind of slice you need (with or without a nul /// terminator), you can call the slice's own /// [`as_ptr`][slice.as_ptr] method to get a raw pointer to pass to -/// extern functions. See the documentation for that function for a +/// extern functions. See the documentation for that function for a /// discussion on ensuring the lifetime of the raw pointer. /// /// [`Into`]: ../convert/trait.Into.html @@ -130,8 +130,8 @@ pub struct CString { /// Representation of a borrowed C string. /// /// This type represents a borrowed reference to a nul-terminated -/// array of bytes. It can be constructed safely from a `&[`[`u8`]`]` -/// slice, or unsafely from a raw `*const c_char`. It can then be +/// array of bytes. It can be constructed safely from a `&[`[`u8`]`]` +/// slice, or unsafely from a raw `*const c_char`. It can then be /// converted to a Rust [`&str`] by performing UTF-8 validation, or /// into an owned [`CString`]. 
/// @@ -374,7 +374,7 @@ impl CString { /// to undefined behavior or allocator corruption. /// /// > **Note:** If you need to borrow a string that was allocated by - /// > foreign code, use [`CStr`]. If you need to take ownership of + /// > foreign code, use [`CStr`]. If you need to take ownership of /// > a string that was allocated by foreign code, you will need to /// > make your own provisions for freeing it appropriately, likely /// > with the foreign code's API to do that. @@ -521,7 +521,7 @@ impl CString { /// /// The returned slice does **not** contain the trailing nul /// terminator, and it is guaranteed to not have any interior nul - /// bytes. If you need the nul terminator, use + /// bytes. If you need the nul terminator, use /// [`as_bytes_with_nul`] instead. /// /// [`as_bytes_with_nul`]: #method.as_bytes_with_nul @@ -1035,7 +1035,7 @@ impl CStr { /// Yields a [`&str`] slice if the `CStr` contains valid UTF-8. /// /// If the contents of the `CStr` are valid UTF-8 data, this - /// function will return the corresponding [`&str`] slice. Otherwise, + /// function will return the corresponding [`&str`] slice. Otherwise, /// it will return an error with details of where UTF-8 validation failed. /// /// > **Note**: This method is currently implemented to check for validity @@ -1066,7 +1066,7 @@ impl CStr { /// /// If the contents of the `CStr` are valid UTF-8 data, this /// function will return a [`Cow`]`::`[`Borrowed`]`(`[`&str`]`)` - /// with the the corresponding [`&str`] slice. Otherwise, it will + /// with the corresponding [`&str`] slice. Otherwise, it will /// replace any invalid UTF-8 sequences with `U+FFFD REPLACEMENT /// CHARACTER` and return a [`Cow`]`::`[`Owned`]`(`[`String`]`)` /// with the result.
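For reference, the `CString`/`CStr` behaviour documented in `c_str.rs` by these patches can be exercised with the minimal sketch below. That behaviour covers three points: a trailing nul is appended on construction, interior nuls are rejected, and UTF-8 validation happens only when converting to `&str`. The sketch uses only existing `std::ffi` APIs. `scan_len`, `to_c` and `from_c` are hypothetical helpers made up for this illustration. `scan_len` merely stands in for a C function such as `strlen()`, so the example runs without linking any C code.

```rust
use std::borrow::Cow;
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function such as `strlen()`: it is
/// handed only a pointer and must scan for the nul terminator, because
/// C strings do not store their length.
///
/// # Safety
/// `ptr` must point to a valid, nul-terminated buffer.
unsafe fn scan_len(ptr: *const c_char) -> usize {
    let mut len = 0;
    // O(length): walk the buffer until the terminating 0 byte.
    while unsafe { *ptr.add(len) } != 0 {
        len += 1;
    }
    len
}

/// Rust to C: `CString::new` appends the nul terminator and rejects
/// input containing interior nul bytes, reporting where the first one is.
fn to_c(s: &str) -> Result<CString, usize> {
    CString::new(s).map_err(|e| e.nul_position())
}

/// C to Rust: borrow a buffer as `&CStr` (it must end in exactly one nul),
/// then convert it strictly (`to_str`) and lossily (`to_string_lossy`).
fn from_c(bytes_with_nul: &[u8]) -> Option<(Option<&str>, Cow<'_, str>)> {
    let c = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
    Some((c.to_str().ok(), c.to_string_lossy()))
}
```

Keep in mind that the pointer returned by `as_ptr()` borrows the `CString`'s buffer. It is only valid while that `CString` is alive, so calling `as_ptr()` on a temporary, as in `CString::new(s).unwrap().as_ptr()`, yields a dangling pointer.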
From c8e232dfe83a95cf866c967752634db3ff7a98bb Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Wed, 11 Oct 2017 17:55:01 -0500 Subject: [PATCH 14/15] ffi/mod.rs: Keep the one-sentence summary at the beginning of the module --- src/libstd/ffi/mod.rs | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/libstd/ffi/mod.rs b/src/libstd/ffi/mod.rs index f8a4a904fc55e..6b751904c9f0a 100644 --- a/src/libstd/ffi/mod.rs +++ b/src/libstd/ffi/mod.rs @@ -8,6 +8,8 @@ // option. This file may not be copied, modified, or distributed // except according to those terms. +//! Utilities related to FFI bindings. +//! //! This module provides utilities to handle data across non-Rust //! interfaces, like other programming languages and the underlying //! operating system. It is mainly of use for FFI (Foreign Function From 5fb8e3d829e77643e9c153172fb3a67f85eebe81 Mon Sep 17 00:00:00 2001 From: Federico Mena Quintero Date: Wed, 11 Oct 2017 17:57:56 -0500 Subject: [PATCH 15/15] ffi/mod.rs: Use only one space after a period ending a sentence --- src/libstd/ffi/mod.rs | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/src/libstd/ffi/mod.rs b/src/libstd/ffi/mod.rs index 6b751904c9f0a..a75596351e4cf 100644 --- a/src/libstd/ffi/mod.rs +++ b/src/libstd/ffi/mod.rs @@ -12,24 +12,24 @@ //! //! This module provides utilities to handle data across non-Rust //! interfaces, like other programming languages and the underlying -//! operating system. It is mainly of use for FFI (Foreign Function +//! operating system. It is mainly of use for FFI (Foreign Function //! Interface) bindings and code that needs to exchange C-like strings //! with other languages. //! //! # Overview //! //! Rust represents owned strings with the [`String`] type, and -//! borrowed slices of strings with the [`str`] primitive. Both are +//! borrowed slices of strings with the [`str`] primitive. Both are //! 
always in UTF-8 encoding, and may contain nul bytes in the middle, //! i.e. if you look at the bytes that make up the string, there may -//! be a `\0` among them. Both `String` and `str` store their length +//! be a `\0` among them. Both `String` and `str` store their length //! explicitly; there are no nul terminators at the end of strings //! like in C. //! //! C strings are different from Rust strings: //! //! * **Encodings** - Rust strings are UTF-8, but C strings may use -//! other encodings. If you are using a string from C, you should +//! other encodings. If you are using a string from C, you should //! check its encoding explicitly, rather than just assuming that it //! is UTF-8 like you can do in Rust. //! @@ -37,22 +37,22 @@ //! characters; please **note** that C's `char` is different from Rust's. //! The C standard leaves the actual sizes of those types open to //! interpretation, but defines different APIs for strings made up of -//! each character type. Rust strings are always UTF-8, so different +//! each character type. Rust strings are always UTF-8, so different //! Unicode characters will be encoded in a variable number of bytes -//! each. The Rust type [`char`] represents a '[Unicode scalar +//! each. The Rust type [`char`] represents a '[Unicode scalar //! value]', which is similar to, but not the same as, a '[Unicode //! code point]'. //! //! * **Nul terminators and implicit string lengths** - Often, C //! strings are nul-terminated, i.e. they have a `\0` character at the -//! end. The length of a string buffer is not stored, but has to be +//! end. The length of a string buffer is not stored, but has to be //! calculated; to compute the length of a string, C code must //! manually call a function like `strlen()` for `char`-based strings, -//! or `wcslen()` for `wchar_t`-based ones. Those functions return +//! or `wcslen()` for `wchar_t`-based ones. Those functions return //! the number of characters in the string excluding the nul //! 
terminator, so the buffer length is really `len+1` characters. //! Rust strings don't have a nul terminator; their length is always //! stored and does not need to be calculated. While in Rust //! accessing a string's length is an O(1) operation (because the //! length is stored), in C it is an O(length) operation because the //! length needs to be computed by scanning the string for the nul @@ -61,7 +61,7 @@ //! * **Internal nul characters** - When C strings have a nul //! terminator character, this usually means that they cannot have nul //! characters in the middle — a nul character would essentially //! truncate the string. Rust strings *can* have nul characters in //! the middle, because nul does not have to mark the end of the //! string in Rust. //! @@ -80,30 +80,30 @@ //! //! * **From C to Rust:** [`CStr`] represents a borrowed C string; it //! is what you would use to wrap a raw `*const u8` that you got from //! a C function. A `CStr` is guaranteed to be a nul-terminated array //! of bytes. Once you have a `CStr`, you can convert it to a Rust //! `&str` if it's valid UTF-8, or lossily convert it by adding //! replacement characters. //! //! [`OsString`] and [`OsStr`] are useful when you need to transfer //! strings to and from the operating system itself, or when capturing //! the output of external commands. Conversions between `OsString`, //! `OsStr` and Rust strings work similarly to those for [`CString`] //! and [`CStr`]. //! //! * [`OsString`] represents an owned string in whatever //! representation the operating system prefers.
In the Rust standard //! library, various APIs that transfer strings to/from the operating //! system use `OsString` instead of plain strings. For example, //! [`env::var_os()`] is used to query environment variables; it //! returns an `Option<OsString>`. If the environment variable exists //! you will get a `Some(os_string)`, which you can *then* try to //! convert to a Rust string. This yields a [`Result<>`], so that //! your code can detect errors in case the environment variable did //! not in fact contain valid Unicode data. //! //! * [`OsStr`] represents a borrowed reference to a string in a //! format that can be passed to the operating system. It can be //! converted into a UTF-8 Rust string slice in a similar way to //! `OsString`. //! @@ -125,12 +125,12 @@ //! //! On Windows, [`OsStr`] implements the //! `std::os::windows::ffi::`[`OsStrExt`][windows.OsStrExt] trait, //! which provides an [`encode_wide`] method. This provides an //! iterator that can be [`collect`]ed into a vector of [`u16`]. //! //! Additionally, on Windows [`OsString`] implements the //! `std::os::windows::ffi::`[`OsStringExt`][windows.OsStringExt] //! trait, which provides a [`from_wide`] method. The result of this //! method is an `OsString` which can be round-tripped to a Windows //! string losslessly. //!
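The module documentation in these patches contrasts Rust strings (stored length, interior nuls allowed) with C strings (nul-terminated, length computed by scanning), and describes moving data across the boundary with `CString` and `CStr`. The following is a minimal sketch of that round trip using only the standard library. `c_strlen` is a hypothetical helper written here to mimic C's `strlen()`; it is not part of `std::ffi`. The sketch uses `*const c_char` from `std::os::raw`, which is the pointer type `CString::as_ptr()` returns and `CStr::from_ptr()` accepts.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for C's `strlen()`: it walks the buffer until it
/// finds the nul terminator, so it costs O(length), unlike `str::len()`.
///
/// # Safety
/// `p` must point to a valid, nul-terminated buffer.
unsafe fn c_strlen(mut p: *const c_char) -> usize {
    let mut n = 0;
    // SAFETY: the caller guarantees `p` points to a nul-terminated buffer.
    unsafe {
        while *p != 0 {
            n += 1;
            p = p.add(1);
        }
    }
    n
}

fn main() {
    // Rust strings store their length and may contain interior nul bytes.
    let rust = "hello\0world";
    assert_eq!(rust.len(), 11);

    // CString::new rejects interior nuls, since C would see a truncated string.
    assert!(CString::new(rust).is_err());

    // Without interior nuls the conversion succeeds and appends a terminator.
    let owned = CString::new("hello").expect("no interior nul");
    assert_eq!(owned.as_bytes_with_nul(), b"hello\0");

    // From Rust to C: `as_ptr()` yields a pointer suitable for a C function.
    let ptr: *const c_char = owned.as_ptr();
    let len = unsafe { c_strlen(ptr) };
    assert_eq!(len, 5); // terminator excluded; the buffer is len + 1 bytes

    // From C to Rust: wrap the raw pointer in a borrowed CStr.
    let borrowed: &CStr = unsafe { CStr::from_ptr(ptr) };
    assert_eq!(borrowed.to_str(), Ok("hello"));

    // Bytes that are not valid UTF-8 can still be converted lossily.
    let bad = CString::new(b"f\xffo".to_vec()).expect("no interior nul");
    assert_eq!(bad.to_string_lossy(), "f\u{FFFD}o");

    println!("ok");
}
```

Note that `owned` must outlive every use of `ptr`; the pointer borrows the `CString`'s buffer and does not keep it alive.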
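The `OsString` flow described for [`env::var_os()`] (an `Option<OsString>` that you *then* try to convert, getting a `Result`) might look like the following minimal sketch. `describe_var` is a hypothetical helper, and the variable names it is called with are arbitrary; what it prints depends on the environment the program runs in.

```rust
use std::env;
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: look up an environment variable and try to turn it
/// into a Rust `String`, keeping the three possible outcomes distinct.
fn describe_var(name: &str) -> String {
    match env::var_os(name) {
        // The variable is not set at all.
        None => format!("{name} is not set"),
        // The variable exists; `into_string()` yields a `Result`.
        Some(os_string) => match os_string.into_string() {
            Ok(s) => format!("{name} = {s}"),
            // Not valid Unicode: the original OsString is handed back intact.
            Err(original) => format!("{name} is not valid Unicode: {original:?}"),
        },
    }
}

fn main() {
    // OsStr is the borrowed counterpart; `to_str()` returns an Option.
    let borrowed: &OsStr = OsStr::new("plain ascii");
    assert_eq!(borrowed.to_str(), Some("plain ascii"));

    // `to_string_lossy()` never fails; it substitutes replacement characters.
    let owned: OsString = OsString::from("also fine");
    assert_eq!(owned.to_string_lossy(), "also fine");

    println!("{}", describe_var("PATH"));
    println!("{}", describe_var("SURELY_NOT_SET_12345"));
}
```

Because `into_string()` returns the original `OsString` on failure, nothing is lost when the data is not valid Unicode: the caller can still pass it back to the operating system unchanged.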