Update documentation
dylni committed Nov 4, 2023
1 parent 6870f75 commit 81ac84b
Showing 3 changed files with 73 additions and 111 deletions.
17 changes: 5 additions & 12 deletions README.md
@@ -1,15 +1,8 @@
# OsStr Bytes

This crate allows interacting with the data stored by [`OsStr`] and
[`OsString`], without resorting to panics or corruption for invalid UTF-8.
Thus, methods can be used that are already defined on [`[u8]`][slice] and
[`Vec<u8>`].

Typically, the only way to losslessly construct [`OsStr`] or [`OsString`] from
a byte sequence is to use `OsStr::new(str::from_utf8(bytes)?)`, which requires
the bytes to be valid in UTF-8. However, since this crate makes conversions
directly between the platform encoding and raw bytes, even some strings invalid
in UTF-8 can be converted.
This crate provides additional functionality for [`OsStr`] and [`OsString`],
without resorting to panics or corruption for invalid UTF-8. Thus, familiar
methods from [`str`] and [`String`] can be used.

[![GitHub Build Status](https://github.com/dylni/os_str_bytes/workflows/build/badge.svg?branch=master)](https://github.com/dylni/os_str_bytes/actions?query=branch%3Amaster)

@@ -97,7 +90,7 @@ in this crate, as defined in [LICENSE-APACHE], shall be licensed according to
[COPYRIGHT]: https://github.com/dylni/os_str_bytes/blob/master/COPYRIGHT
[documentation]: https://docs.rs/os_str_bytes
[LICENSE-APACHE]: https://github.com/dylni/os_str_bytes/blob/master/LICENSE-APACHE
[slice]: https://doc.rust-lang.org/std/primitive.slice.html
[`OsStr`]: https://doc.rust-lang.org/std/ffi/struct.OsStr.html
[`OsString`]: https://doc.rust-lang.org/std/ffi/struct.OsString.html
[`Vec<u8>`]: https://doc.rust-lang.org/std/vec/struct.Vec.html
[`str`]: https://doc.rust-lang.org/std/primitive.str.html
[`String`]: https://doc.rust-lang.org/std/string/struct.String.html
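For context on the paragraph removed from the README: the standard-library route it mentions, `OsStr::new(str::from_utf8(bytes)?)`, only succeeds when the bytes are valid UTF-8. The following is a minimal sketch using only `std`; the helper name `os_str_from_utf8` is hypothetical and is not part of this crate.

```rust
use std::ffi::OsStr;
use std::str;

/// Hypothetical helper: builds an `OsStr` from bytes through the UTF-8
/// route, so it fails for any byte sequence that is not valid UTF-8.
fn os_str_from_utf8(bytes: &[u8]) -> Result<&OsStr, str::Utf8Error> {
    Ok(OsStr::new(str::from_utf8(bytes)?))
}

fn main() {
    // Valid UTF-8 converts without copying.
    assert_eq!(os_str_from_utf8(b"foo.txt").unwrap(), OsStr::new("foo.txt"));

    // The byte 0xFF never appears in valid UTF-8, so this input is rejected,
    // even though some platforms would accept it in a file name.
    assert!(os_str_from_utf8(b"foo\xFF.txt").is_err());
}
```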
105 changes: 54 additions & 51 deletions src/lib.rs
@@ -1,47 +1,19 @@
//! This crate allows interacting with the data stored by [`OsStr`] and
//! This crate provides additional functionality for [`OsStr`] and
//! [`OsString`], without resorting to panics or corruption for invalid UTF-8.
//! Thus, methods can be used that are already defined on [`[u8]`][slice] and
//! [`Vec<u8>`].
//! Thus, familiar methods from [`str`] and [`String`] can be used.
//!
//! Typically, the only way to losslessly construct [`OsStr`] or [`OsString`]
//! from a byte sequence is to use `OsStr::new(str::from_utf8(bytes)?)`, which
//! requires the bytes to be valid in UTF-8. However, since this crate makes
//! conversions directly between the platform encoding and raw bytes, even some
//! strings invalid in UTF-8 can be converted.
//! # Usage
//!
//! In most cases, [`RawOsStr`] and [`RawOsString`] should be used.
//! [`OsStrBytes`] and [`OsStringBytes`] provide lower-level APIs that are
//! easier to misuse.
//! The most important trait included is [`OsStrBytesExt`], which provides
//! methods analogous to those of [`str`] but for [`OsStr`]. These methods will
//! never panic for invalid UTF-8 in a platform string, so they can be used to
//! manipulate [`OsStr`] values with the same simplicity possible for [`str`].
//!
//! # Encoding
//!
//! The encoding of bytes returned or accepted by methods of this crate is
//! intentionally left unspecified. It may vary for different platforms, so
//! defining it would run contrary to the goal of generic string handling.
//! However, the following invariants will always be upheld:
//!
//! - The encoding will be compatible with UTF-8. In particular, splitting an
//! encoded byte sequence by a UTF-8&ndash;encoded character always produces
//! other valid byte sequences. They can be re-encoded without error using
//! [`RawOsString::into_os_string`] and similar methods.
//!
//! - All characters valid in platform strings are representable. [`OsStr`] and
//! [`OsString`] can always be losslessly reconstructed from extracted bytes.
//!
//! Note that the chosen encoding may not match how Rust stores these strings
//! internally, which is undocumented. For instance, the result of calling
//! [`OsStr::len`] will not necessarily match the number of bytes this crate
//! uses to represent the same string.
//!
//! Additionally, concatenation may yield unexpected results without a UTF-8
//! separator. If two platform strings need to be concatenated, the only safe
//! way to do so is using [`OsString::push`]. This limitation also makes it
//! undesirable to use the bytes in interchange.
//!
//! Since this encoding can change between versions and platforms, it should
//! not be used for storage. The standard library provides implementations of
//! [`OsStrExt`] and [`OsStringExt`] for various platforms, which should be
//! preferred for that use case.
//! Additionally, the following wrappers are provided. They are primarily
//! legacy types from when this crate needed to perform more frequent encoding
//! conversions. However, they may be useful for their trait implementations.
//! - [`RawOsStr`] is a wrapper for [`OsStr`].
//! - [`RawOsString`] is a wrapper for [`OsString`].
//!
//! # User Input
//!
@@ -68,7 +40,7 @@
//!
//! - **memchr** -
//! Changes the implementation to use crate [memchr] for better performance.
//! This feature is useless when "raw\_os\_str" is disabled.
//! This feature is useless when the "raw\_os\_str" feature is disabled.
//!
//! For more information, see [`RawOsStr`][memchr complexity].
//!
@@ -108,6 +80,8 @@
//! - [`OsStrBytes`]
//! - [`OsStringBytes`]
//!
//! For more information, see [Encoding Conversions].
//!
//! - **print\_bytes** -
//! Provides implementations of [`print_bytes::ToBytes`] for [`RawOsStr`] and
//! [`RawOsString`].
@@ -127,12 +101,40 @@
//! crate. Otherwise, backward compatibility would be more difficult to
//! maintain for new features.
//!
//! # Complexity
//! # Encoding Conversions
//!
//! Methods provided by the "conversions" feature use an intentionally
//! unspecified encoding. It may vary for different platforms, so defining it
//! would run contrary to the goal of generic string handling. However, the
//! following invariants will always be upheld:
//!
//! - The encoding will be compatible with UTF-8. In particular, splitting an
//! encoded byte sequence by a UTF-8&ndash;encoded character always produces
//! other valid byte sequences. They can be re-encoded without error using
//! [`RawOsString::into_os_string`] and similar methods.
//!
//! - All characters valid in platform strings are representable. [`OsStr`] and
//! [`OsString`] can always be losslessly reconstructed from extracted bytes.
//!
//! Note that the chosen encoding may not match how [`OsStr`] stores these
//! strings internally, which is undocumented. For instance, the result of
//! calling [`OsStr::len`] will not necessarily match the number of bytes this
//! crate uses to represent the same string. However, unlike the encoding used
//! by [`OsStr`], the encoding used by this crate can be validated safely using
//! the following methods:
//! - [`OsStrBytes::assert_from_raw_bytes`]
//! - [`RawOsStr::assert_cow_from_raw_bytes`]
//! - [`RawOsString::assert_from_raw_vec`]
//!
//! Concatenation may yield unexpected results without a UTF-8 separator. If
//! two platform strings need to be concatenated, the only safe way to do so is
//! using [`OsString::push`]. This limitation also makes it undesirable to use
//! the bytes in interchange.
//!
//! Conversion method complexities will vary based on what functionality is
//! available for the platform. At worst, they will all be linear, but some can
//! take constant time. For example, [`RawOsString::into_os_string`] might be
//! able to reuse its allocation.
//! Since this encoding can change between versions and platforms, it should
//! not be used for storage. The standard library provides implementations of
//! [`OsStrExt`] and [`OsStringExt`] for various platforms, which should be
//! preferred for that use case.
//!
//! # Examples
//!
@@ -173,6 +175,7 @@
//! [bstr]: https://crates.io/crates/bstr
//! [`ByteSlice::to_os_str`]: https://docs.rs/bstr/0.2.12/bstr/trait.ByteSlice.html#method.to_os_str
//! [`ByteVec::into_os_string`]: https://docs.rs/bstr/0.2.12/bstr/trait.ByteVec.html#method.into_os_string
//! [Encoding Conversions]: #encoding-conversions
//! [memchr complexity]: RawOsStr#complexity
//! [memchr]: https://crates.io/crates/memchr
//! [`OsStrExt`]: ::std::os::unix::ffi::OsStrExt
@@ -313,7 +316,7 @@ if_checked_conversions! {
/// On Unix, this error is never returned, but [`OsStrExt`] or
/// [`OsStringExt`] should be used instead if that needs to be guaranteed.
///
/// [encoding]: self#encoding
/// [encoding]: self#encoding-conversions
/// [`OsStrExt`]: ::std::os::unix::ffi::OsStrExt
/// [`OsStringExt`]: ::std::os::unix::ffi::OsStringExt
/// [`Result::unwrap`]: ::std::result::Result::unwrap
@@ -393,7 +396,7 @@ if_conversions! {
/// # Ok::<_, io::Error>(())
/// ```
///
/// [unspecified encoding]: self#encoding
/// [unspecified encoding]: self#encoding-conversions
#[must_use = "method should not be used for validation"]
#[track_caller]
fn assert_from_raw_bytes<'a, S>(string: S) -> Cow<'a, Self>
@@ -453,7 +456,7 @@ if_conversions! {
/// assert_eq!(string.as_bytes(), &*os_string.to_raw_bytes());
/// ```
///
/// [unspecified encoding]: self#encoding
/// [unspecified encoding]: self#encoding-conversions
#[must_use]
fn to_raw_bytes(&self) -> Cow<'_, [u8]>;
}
@@ -985,7 +988,7 @@ if_conversions! {
/// # Ok::<_, io::Error>(())
/// ```
///
/// [unspecified encoding]: self#encoding
/// [unspecified encoding]: self#encoding-conversions
#[must_use = "method should not be used for validation"]
#[track_caller]
fn assert_from_raw_vec(string: Vec<u8>) -> Self;
@@ -1044,7 +1047,7 @@ if_conversions! {
/// assert_eq!(string.into_bytes(), os_string.into_raw_vec());
/// ```
///
/// [unspecified encoding]: self#encoding
/// [unspecified encoding]: self#encoding-conversions
#[must_use]
fn into_raw_vec(self) -> Vec<u8>;
}
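The relocated "Encoding Conversions" section states that the only safe way to concatenate two platform strings is [`OsString::push`], rather than joining their raw bytes. A minimal sketch of that advice using only `std` follows; the function name `concat` is hypothetical and is not part of this crate.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: concatenates two platform strings with
/// `OsString::push` instead of joining their byte representations.
fn concat(first: &OsStr, second: &OsStr) -> OsString {
    // `OsStr::len` reports the length of the internal storage, which is
    // only meaningful as a capacity hint here.
    let mut joined = OsString::with_capacity(first.len() + second.len());
    joined.push(first);
    joined.push(second);
    joined
}

fn main() {
    let joined = concat(OsStr::new("foo"), OsStr::new(".txt"));
    assert_eq!(joined, OsString::from("foo.txt"));
}
```

Because the pieces stay as `OsStr` values throughout, the standard library handles any platform-specific details of joining them.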
62 changes: 14 additions & 48 deletions src/raw_str.rs
@@ -85,10 +85,6 @@ unsafe impl TransmuteBox for [u8] {}
/// implementation and are generally not necessary. However, all indices
/// returned by this struct can be used for slicing.
///
/// On Unix, all indices are permitted, to avoid false positives. However,
/// relying on this implementation detail is discouraged. Platform-specific
/// indices are error-prone.
///
/// # Complexity
///
/// All searching methods have worst-case multiplicative time complexity (i.e.,
@@ -102,10 +98,6 @@ unsafe impl TransmuteBox for [u8] {}
/// representation is not stable. Transmuting between this type and any other
/// causes immediate undefined behavior.
///
/// # Nightly Notes
///
/// Indices are validated on all platforms.
///
/// [memchr complexity]: memchr::memmem::find#complexity
/// [unspecified encoding]: super#encoding
#[derive(Eq, Hash, Ord, PartialEq, PartialOrd)]
@@ -142,9 +134,6 @@ impl RawOsStr {

/// Wraps a string, without copying or encoding conversion.
///
/// This method is much more efficient than [`RawOsStr::new`], since the
/// [encoding] used by this crate is compatible with UTF-8.
///
/// # Examples
///
/// ```
@@ -154,8 +143,6 @@ impl RawOsStr {
/// let raw = RawOsStr::from_str(string);
/// assert_eq!(string, raw);
/// ```
///
/// [encoding]: super#encoding
#[allow(clippy::should_implement_trait)]
#[inline]
#[must_use]
@@ -183,7 +170,7 @@ impl RawOsStr {
/// # Ok::<_, io::Error>(())
/// ```
///
/// [unspecified encoding]: super#encoding
/// [unspecified encoding]: super#encoding-conversions
#[allow(clippy::missing_safety_doc)]
#[inline]
#[must_use]
@@ -225,7 +212,7 @@ impl RawOsStr {
/// # Ok::<_, io::Error>(())
/// ```
///
/// [unspecified encoding]: super#encoding
/// [unspecified encoding]: super#encoding-conversions
#[cfg_attr(os_str_bytes_docs_rs, doc(cfg(feature = "conversions")))]
#[inline]
#[must_use = "method should not be used for validation"]
@@ -292,7 +279,7 @@ impl RawOsStr {
/// ```
///
/// [`from_encoded_bytes_unchecked`]: Self::from_encoded_bytes_unchecked
/// [unspecified encoding]: super#encoding
/// [unspecified encoding]: super#encoding-conversions
#[inline]
#[must_use]
pub fn as_encoded_bytes(&self) -> &[u8] {
@@ -738,7 +725,7 @@ impl RawOsStr {
/// assert_eq!(string.as_bytes(), &*raw.to_raw_bytes());
/// ```
///
/// [unspecified encoding]: super#encoding
/// [unspecified encoding]: super#encoding-conversions
#[cfg_attr(os_str_bytes_docs_rs, doc(cfg(feature = "conversions")))]
#[inline]
#[must_use]
@@ -963,11 +950,8 @@ impl ToOwned for RawOsStr {
/// [`Cow<RawOsStr>`]: Cow
#[cfg_attr(os_str_bytes_docs_rs, doc(cfg(feature = "raw_os_str")))]
pub trait RawOsStrCow<'a>: private::Sealed {
/// Converts a platform-native string back to this representation.
///
/// # Nightly Notes
///
/// This method does not require copying or encoding conversion.
/// Converts a platform-native string back to this representation, without
/// copying or encoding conversion.
///
/// # Examples
///
@@ -987,11 +971,8 @@ pub trait RawOsStrCow<'a>: private::Sealed {
#[must_use]
fn from_os_str(string: Cow<'a, OsStr>) -> Self;

/// Converts this representation back to a platform-native string.
///
/// # Nightly Notes
///
/// This method does not require copying or encoding conversion.
/// Converts this representation back to a platform-native string, without
/// copying or encoding conversion.
///
/// # Examples
///
@@ -1041,14 +1022,7 @@ impl<'a> RawOsStrCow<'a> for Cow<'a, RawOsStr> {
pub struct RawOsString(Vec<u8>);

impl RawOsString {
/// Converts a platform-native string into a representation that can be
/// more easily manipulated.
///
/// For more information, see [`RawOsStr::new`].
///
/// # Nightly Notes
///
/// This method does not require copying or encoding conversion.
/// Wraps a platform-native string, without copying or encoding conversion.
///
/// # Examples
///
@@ -1071,9 +1045,6 @@ impl RawOsString {

/// Wraps a string, without copying or encoding conversion.
///
/// This method is much more efficient than [`RawOsString::new`], since the
/// [encoding] used by this crate is compatible with UTF-8.
///
/// # Examples
///
/// ```
@@ -1083,8 +1054,6 @@ impl RawOsString {
/// let raw = RawOsString::from_string(string.clone());
/// assert_eq!(string, raw);
/// ```
///
/// [encoding]: super#encoding
#[inline]
#[must_use]
pub fn from_string(string: String) -> Self {
@@ -1147,7 +1116,7 @@ impl RawOsString {
/// # Ok::<_, io::Error>(())
/// ```
///
/// [unspecified encoding]: super#encoding
/// [unspecified encoding]: super#encoding-conversions
#[cfg_attr(os_str_bytes_docs_rs, doc(cfg(feature = "conversions")))]
#[inline]
#[must_use = "method should not be used for validation"]
@@ -1250,18 +1219,15 @@ impl RawOsString {
/// ```
///
/// [`from_encoded_vec_unchecked`]: Self::from_encoded_vec_unchecked
/// [unspecified encoding]: super#encoding
/// [unspecified encoding]: super#encoding-conversions
#[inline]
#[must_use]
pub fn into_encoded_vec(self) -> Vec<u8> {
self.0
}

/// Converts this representation back to a platform-native string.
///
/// # Nightly Notes
///
/// This method does not require copying or encoding conversion.
/// Converts this representation back to a platform-native string, without
/// copying or encoding conversion.
///
/// # Examples
///
@@ -1300,7 +1266,7 @@ impl RawOsString {
/// assert_eq!(string.into_bytes(), raw.into_raw_vec());
/// ```
///
/// [unspecified encoding]: super#encoding
/// [unspecified encoding]: super#encoding-conversions
#[cfg_attr(os_str_bytes_docs_rs, doc(cfg(feature = "conversions")))]
#[inline]
#[must_use]