diff --git a/README.md b/README.md index 5255c63..3c9d725 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,8 @@ # OsStr Bytes -This crate allows interacting with the data stored by [`OsStr`] and -[`OsString`], without resorting to panics or corruption for invalid UTF-8. -Thus, methods can be used that are already defined on [`[u8]`][slice] and -[`Vec<u8>`]. - -Typically, the only way to losslessly construct [`OsStr`] or [`OsString`] from -a byte sequence is to use `OsStr::new(str::from_utf8(bytes)?)`, which requires -the bytes to be valid in UTF-8. However, since this crate makes conversions -directly between the platform encoding and raw bytes, even some strings invalid -in UTF-8 can be converted. +This crate provides additional functionality for [`OsStr`] and [`OsString`], +without resorting to panics or corruption for invalid UTF-8. Thus, familiar +methods from [`str`] and [`String`] can be used. [![GitHub Build Status](https://github.com/dylni/os_str_bytes/workflows/build/badge.svg?branch=master)](https://github.com/dylni/os_str_bytes/actions?query=branch%3Amaster) @@ -97,7 +90,7 @@ in this crate, as defined in [LICENSE-APACHE], shall be licensed according to [COPYRIGHT]: https://github.com/dylni/os_str_bytes/blob/master/COPYRIGHT [documentation]: https://docs.rs/os_str_bytes [LICENSE-APACHE]: https://github.com/dylni/os_str_bytes/blob/master/LICENSE-APACHE -[slice]: https://doc.rust-lang.org/std/primitive.slice.html [`OsStr`]: https://doc.rust-lang.org/std/ffi/struct.OsStr.html [`OsString`]: https://doc.rust-lang.org/std/ffi/struct.OsString.html -[`Vec<u8>`]: https://doc.rust-lang.org/std/vec/struct.Vec.html +[`str`]: https://doc.rust-lang.org/std/primitive.str.html +[`String`]: https://doc.rust-lang.org/std/string/struct.String.html diff --git a/src/lib.rs b/src/lib.rs index 122785c..66c077b 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -1,47 +1,19 @@ -//! This crate allows interacting with the data stored by [`OsStr`] and +//! 
This crate provides additional functionality for [`OsStr`] and //! [`OsString`], without resorting to panics or corruption for invalid UTF-8. -//! Thus, methods can be used that are already defined on [`[u8]`][slice] and -//! [`Vec<u8>`]. +//! Thus, familiar methods from [`str`] and [`String`] can be used. //! -//! Typically, the only way to losslessly construct [`OsStr`] or [`OsString`] -//! from a byte sequence is to use `OsStr::new(str::from_utf8(bytes)?)`, which -//! requires the bytes to be valid in UTF-8. However, since this crate makes -//! conversions directly between the platform encoding and raw bytes, even some -//! strings invalid in UTF-8 can be converted. +//! # Usage //! -//! In most cases, [`RawOsStr`] and [`RawOsString`] should be used. -//! [`OsStrBytes`] and [`OsStringBytes`] provide lower-level APIs that are -//! easier to misuse. +//! The most important trait included is [`OsStrBytesExt`], which provides +//! methods analagous to those of [`str`] but for [`OsStr`]. These methods will +//! never panic for invalid UTF-8 in a platform string, so they can be used to +//! manipulate [`OsStr`] values with the same simplicity possible for [`str`]. //! -//! # Encoding -//! -//! The encoding of bytes returned or accepted by methods of this crate is -//! intentionally left unspecified. It may vary for different platforms, so -//! defining it would run contrary to the goal of generic string handling. -//! However, the following invariants will always be upheld: -//! -//! - The encoding will be compatible with UTF-8. In particular, splitting an -//! encoded byte sequence by a UTF-8–encoded character always produces -//! other valid byte sequences. They can be re-encoded without error using -//! [`RawOsString::into_os_string`] and similar methods. -//! -//! - All characters valid in platform strings are representable. [`OsStr`] and -//! [`OsString`] can always be losslessly reconstructed from extracted bytes. -//! -//! 
Note that the chosen encoding may not match how Rust stores these strings -//! internally, which is undocumented. For instance, the result of calling -//! [`OsStr::len`] will not necessarily match the number of bytes this crate -//! uses to represent the same string. -//! -//! Additionally, concatenation may yield unexpected results without a UTF-8 -//! separator. If two platform strings need to be concatenated, the only safe -//! way to do so is using [`OsString::push`]. This limitation also makes it -//! undesirable to use the bytes in interchange. -//! -//! Since this encoding can change between versions and platforms, it should -//! not be used for storage. The standard library provides implementations of -//! [`OsStrExt`] and [`OsStringExt`] for various platforms, which should be -//! preferred for that use case. +//! Additionally, the following wrappers are provided. They are primarily +//! legacy types from when this crate needed to perform more frequent encoding +//! conversions. However, they may be useful for their trait implementations. +//! - [`RawOsStr`] is a wrapper for [`OsStr`]. +//! - [`RawOsString`] is a wrapper for [`OsString`]. //! //! # User Input //! @@ -68,7 +40,7 @@ //! //! - **memchr** - //! Changes the implementation to use crate [memchr] for better performance. -//! This feature is useless when "raw\_os\_str" is disabled. +//! This feature is useless when the "raw\_os\_str" feature is disabled. //! //! For more information, see [`RawOsStr`][memchr complexity]. //! @@ -108,6 +80,8 @@ //! - [`OsStrBytes`] //! - [`OsStringBytes`] //! +//! For more information, see [Encoding Conversions]. +//! //! - **print\_bytes** - //! Provides implementations of [`print_bytes::ToBytes`] for [`RawOsStr`] and //! [`RawOsString`]. @@ -127,12 +101,40 @@ //! crate. Otherwise, backward compatibility would be more difficult to //! maintain for new features. //! -//! # Complexity +//! # Encoding Conversions +//! +//! 
Methods provided by the "conversions" feature use an intentionally +//! unspecified encoding. It may vary for different platforms, so defining it +//! would run contrary to the goal of generic string handling. However, the +//! following invariants will always be upheld: +//! +//! - The encoding will be compatible with UTF-8. In particular, splitting an +//! encoded byte sequence by a UTF-8–encoded character always produces +//! other valid byte sequences. They can be re-encoded without error using +//! [`RawOsString::into_os_string`] and similar methods. +//! +//! - All characters valid in platform strings are representable. [`OsStr`] and +//! [`OsString`] can always be losslessly reconstructed from extracted bytes. +//! +//! Note that the chosen encoding may not match how [`OsStr`] stores these +//! strings internally, which is undocumented. For instance, the result of +//! calling [`OsStr::len`] will not necessarily match the number of bytes this +//! crate uses to represent the same string. However, unlike the encoding used +//! by [`OsStr`], the encoding used by this crate can be validated safely using +//! the following methods: +//! - [`OsStrBytes::assert_from_raw_bytes`] +//! - [`RawOsStr::assert_cow_from_raw_bytes`] +//! - [`RawOsString::assert_from_raw_vec`] +//! +//! Concatenation may yield unexpected results without a UTF-8 separator. If +//! two platform strings need to be concatenated, the only safe way to do so is +//! using [`OsString::push`]. This limitation also makes it undesirable to use +//! the bytes in interchange. //! -//! Conversion method complexities will vary based on what functionality is -//! available for the platform. At worst, they will all be linear, but some can -//! take constant time. For example, [`RawOsString::into_os_string`] might be -//! able to reuse its allocation. +//! Since this encoding can change between versions and platforms, it should +//! not be used for storage. 
The standard library provides implementations of +//! [`OsStrExt`] and [`OsStringExt`] for various platforms, which should be +//! preferred for that use case. //! //! # Examples //! @@ -173,6 +175,7 @@ //! [bstr]: https://crates.io/crates/bstr //! [`ByteSlice::to_os_str`]: https://docs.rs/bstr/0.2.12/bstr/trait.ByteSlice.html#method.to_os_str //! [`ByteVec::into_os_string`]: https://docs.rs/bstr/0.2.12/bstr/trait.ByteVec.html#method.into_os_string +//! [Encoding Conversions]: #encoding-conversions //! [memchr complexity]: RawOsStr#complexity //! [memchr]: https://crates.io/crates/memchr //! [`OsStrExt`]: ::std::os::unix::ffi::OsStrExt @@ -313,7 +316,7 @@ if_checked_conversions! { /// On Unix, this error is never returned, but [`OsStrExt`] or /// [`OsStringExt`] should be used instead if that needs to be guaranteed. /// - /// [encoding]: self#encoding + /// [encoding]: self#encoding-conversions /// [`OsStrExt`]: ::std::os::unix::ffi::OsStrExt /// [`OsStringExt`]: ::std::os::unix::ffi::OsStringExt /// [`Result::unwrap`]: ::std::result::Result::unwrap @@ -393,7 +396,7 @@ if_conversions! { /// # Ok::<_, io::Error>(()) /// ``` /// - /// [unspecified encoding]: self#encoding + /// [unspecified encoding]: self#encoding-conversions #[must_use = "method should not be used for validation"] #[track_caller] fn assert_from_raw_bytes<'a, S>(string: S) -> Cow<'a, Self> @@ -453,7 +456,7 @@ if_conversions! { /// assert_eq!(string.as_bytes(), &*os_string.to_raw_bytes()); /// ``` /// - /// [unspecified encoding]: self#encoding + /// [unspecified encoding]: self#encoding-conversions #[must_use] fn to_raw_bytes(&self) -> Cow<'_, [u8]>; } @@ -985,7 +988,7 @@ if_conversions! { /// # Ok::<_, io::Error>(()) /// ``` /// - /// [unspecified encoding]: self#encoding + /// [unspecified encoding]: self#encoding-conversions #[must_use = "method should not be used for validation"] #[track_caller] fn assert_from_raw_vec(string: Vec<u8>) -> Self; @@ -1044,7 +1047,7 @@ if_conversions! 
{ /// assert_eq!(string.into_bytes(), os_string.into_raw_vec()); /// ``` /// - /// [unspecified encoding]: self#encoding + /// [unspecified encoding]: self#encoding-conversions #[must_use] fn into_raw_vec(self) -> Vec<u8>; } diff --git a/src/raw_str.rs b/src/raw_str.rs index 6086fdc..09c0112 100644 --- a/src/raw_str.rs +++ b/src/raw_str.rs @@ -85,10 +85,6 @@ unsafe impl TransmuteBox for [u8] {} /// implementation and are generally not necessary. However, all indices /// returned by this struct can be used for slicing. /// -/// On Unix, all indices are permitted, to avoid false positives. However, -/// relying on this implementation detail is discouraged. Platform-specific -/// indices are error-prone. -/// /// # Complexity /// /// All searching methods have worst-case multiplicative time complexity (i.e., @@ -102,10 +98,6 @@ unsafe impl TransmuteBox for [u8] {} /// representation is not stable. Transmuting between this type and any other /// causes immediate undefined behavior. /// -/// # Nightly Notes -/// -/// Indices are validated on all platforms. -/// /// [memchr complexity]: memchr::memmem::find#complexity /// [unspecified encoding]: super#encoding #[derive(Eq, Hash, Ord, PartialEq, PartialOrd)] @@ -142,9 +134,6 @@ impl RawOsStr { /// Wraps a string, without copying or encoding conversion. /// - /// This method is much more efficient than [`RawOsStr::new`], since the - /// [encoding] used by this crate is compatible with UTF-8. 
- /// /// # Examples /// /// ``` @@ -154,8 +143,6 @@ impl RawOsStr { /// let raw = RawOsStr::from_str(string); /// assert_eq!(string, raw); /// ``` - /// - /// [encoding]: super#encoding #[allow(clippy::should_implement_trait)] #[inline] #[must_use] @@ -183,7 +170,7 @@ impl RawOsStr { /// # Ok::<_, io::Error>(()) /// ``` /// - /// [unspecified encoding]: super#encoding + /// [unspecified encoding]: super#encoding-conversions #[allow(clippy::missing_safety_doc)] #[inline] #[must_use] @@ -225,7 +212,7 @@ impl RawOsStr { /// # Ok::<_, io::Error>(()) /// ``` /// - /// [unspecified encoding]: super#encoding + /// [unspecified encoding]: super#encoding-conversions #[cfg_attr(os_str_bytes_docs_rs, doc(cfg(feature = "conversions")))] #[inline] #[must_use = "method should not be used for validation"] @@ -292,7 +279,7 @@ impl RawOsStr { /// ``` /// /// [`from_encoded_bytes_unchecked`]: Self::from_encoded_bytes_unchecked - /// [unspecified encoding]: super#encoding + /// [unspecified encoding]: super#encoding-conversions #[inline] #[must_use] pub fn as_encoded_bytes(&self) -> &[u8] { @@ -738,7 +725,7 @@ impl RawOsStr { /// assert_eq!(string.as_bytes(), &*raw.to_raw_bytes()); /// ``` /// - /// [unspecified encoding]: super#encoding + /// [unspecified encoding]: super#encoding-conversions #[cfg_attr(os_str_bytes_docs_rs, doc(cfg(feature = "conversions")))] #[inline] #[must_use] @@ -963,11 +950,8 @@ impl ToOwned for RawOsStr { /// [`Cow`]: Cow #[cfg_attr(os_str_bytes_docs_rs, doc(cfg(feature = "raw_os_str")))] pub trait RawOsStrCow<'a>: private::Sealed { - /// Converts a platform-native string back to this representation. - /// - /// # Nightly Notes - /// - /// This method does not require copying or encoding conversion. + /// Converts a platform-native string back to this representation, without + /// copying or encoding conversion. 
/// /// # Examples /// @@ -987,11 +971,8 @@ pub trait RawOsStrCow<'a>: private::Sealed { #[must_use] fn from_os_str(string: Cow<'a, OsStr>) -> Self; - /// Converts this representation back to a platform-native string. - /// - /// # Nightly Notes - /// - /// This method does not require copying or encoding conversion. + /// Converts this representation back to a platform-native string, without + /// copying or encoding conversion. /// /// # Examples /// @@ -1041,14 +1022,7 @@ impl<'a> RawOsStrCow<'a> for Cow<'a, RawOsStr> { pub struct RawOsString(Vec<u8>); impl RawOsString { - /// Converts a platform-native string into a representation that can be - /// more easily manipulated. - /// - /// For more information, see [`RawOsStr::new`]. - /// - /// # Nightly Notes - /// - /// This method does not require copying or encoding conversion. + /// Wraps a platform-native string, without copying or encoding conversion. /// /// # Examples /// @@ -1071,9 +1045,6 @@ impl RawOsString { /// Wraps a string, without copying or encoding conversion. /// - /// This method is much more efficient than [`RawOsString::new`], since the - /// [encoding] used by this crate is compatible with UTF-8. 
- /// /// # Examples /// /// ``` @@ -1083,8 +1054,6 @@ impl RawOsString { /// let raw = RawOsString::from_string(string.clone()); /// assert_eq!(string, raw); /// ``` - /// - /// [encoding]: super#encoding #[inline] #[must_use] pub fn from_string(string: String) -> Self { @@ -1147,7 +1116,7 @@ impl RawOsString { /// # Ok::<_, io::Error>(()) /// ``` /// - /// [unspecified encoding]: super#encoding + /// [unspecified encoding]: super#encoding-conversions #[cfg_attr(os_str_bytes_docs_rs, doc(cfg(feature = "conversions")))] #[inline] #[must_use = "method should not be used for validation"] @@ -1250,18 +1219,15 @@ impl RawOsString { /// ``` /// /// [`from_encoded_vec_unchecked`]: Self::from_encoded_vec_unchecked - /// [unspecified encoding]: super#encoding + /// [unspecified encoding]: super#encoding-conversions #[inline] #[must_use] pub fn into_encoded_vec(self) -> Vec<u8> { self.0 } - /// Converts this representation back to a platform-native string. - /// - /// # Nightly Notes - /// - /// This method does not require copying or encoding conversion. + /// Converts this representation back to a platform-native string, without + /// copying or encoding conversion. /// /// # Examples /// @@ -1300,7 +1266,7 @@ impl RawOsString { /// assert_eq!(string.into_bytes(), raw.into_raw_vec()); /// ``` /// - /// [unspecified encoding]: super#encoding + /// [unspecified encoding]: super#encoding-conversions #[cfg_attr(os_str_bytes_docs_rs, doc(cfg(feature = "conversions")))] #[inline] #[must_use]