Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transmute_ptr_to_ptr doesn’t justify its existence #6372

Open
wchargin opened this issue Nov 24, 2020 · 7 comments
Open

transmute_ptr_to_ptr doesn’t justify its existence #6372

wchargin opened this issue Nov 24, 2020 · 7 comments
Labels
A-documentation Area: Adding or improving documentation C-enhancement Category: Enhancement of lints, like adding more cases or adding help messages

Comments

@wchargin
Copy link
Contributor

wchargin commented Nov 24, 2020

The transmute_ptr_to_ptr lint fires on transmute<T, U> when
both T and U are reference types. The docs say:

Why this is bad

Transmutes are dangerous, and these can instead be written as casts.

But this doesn’t explain anything. These casts are just as unsafe as the
transmute. In fact, transmute can be written in terms of such
casts:

pub unsafe fn transmute<T, U>(t: T) -> U {
    std::ptr::read(&t as *const T as *const U)
}

…so clearly they don’t bring any safety benefit.

I am aware that the docs for transmute itself also mention that you
can use as-casts in place of transmute
. But those docs also
offer no explanation. They were added in rust-lang/rust#34609, where
@sfackler voiced the same concern but received no response.
(@sfackler also points out that transmute::<&T, &U> at least checks
statically that &T and &U have the same size, which pointer casts do
not. See size-checking example.)

Some veteran programmers on my team have suggested using transmute in
these cases for clarity, and I have to agree that it’s clearer:

unsafe { std::mem::transmute::<&str, &Tag>(s.as_ref()) }
// versus
unsafe { &*(s.as_ref() as *const str as *const Tag) }

The former says what it’s doing right on the tin. In the latter, you
have to make sure that your &s and *s are all matching, and it’s
less self-documenting.

Does this lint exist for a reason other than safety theater or cargo
culting? If so, could you please add an explanation to the docs?

@wchargin
Copy link
Contributor Author

cc @taralx

@llogiq
Copy link
Contributor

llogiq commented Nov 28, 2020

I really hope that we get some safe transmutes soon-ish.

@flip1995 flip1995 added A-documentation Area: Adding or improving documentation C-enhancement Category: Enhancement of lints, like adding more cases or adding help messages labels Nov 28, 2020
@taralx
Copy link
Contributor

taralx commented Dec 1, 2020

Maybe, but in the mean time, what's the right option here? Maybe this should be disabled by default?

@ratijas
Copy link

ratijas commented Apr 15, 2021

Half year later, there are still no explanations. So, clearly, no one really knows why it even exists. I believe it was supposed to be the other way around, but someone somewhere just messed up the cause and effect kind of stuff.

@taralx
Copy link
Contributor

taralx commented Apr 16, 2021

Well, let's try a PR and see what happens.

bors added a commit that referenced this issue Apr 22, 2021
Switch transmute_ptr_to_ptr to "pedantic" class.

Per discussion in #6372, this lint has significant false positives.

changelog: transmute_ptr_to_ptr defaults to "allow".
@KisaragiEffective
Copy link
Contributor

KisaragiEffective commented Feb 3, 2023

Old issue, but I'd like to propose the following lint documentation (maybe less of this):

Warning

std::mem::transmute is very unsafe: you can create any value (or bit-pattern), including an invalid bit-pattern. Creating an invalid bit-pattern is clearly UB. But the compiler cannot verify whether it is a valid bit-pattern after transmuted, so you are responsible for your usage. Avoid using std::mem::transmute if you do not truly need it. You should consider that the same effect might be achievable by safer alternatives:

Slice-to-slice transmutation is extremely unlikely to be what you want

TL;DR: There are pitfalls

Note in particular that the internal length remains the same (see fig1). This may not what you expect:

  • may cause UB due to incorrect alignment. see fig3.
  • bit-pattern may get truncated. see fig2.

You might want to consider using the following alternatives:

  • align_to - keeps the bit-pattern, and re-spans according to the size of the destination type. But since it keeps the bit patterns: &[u8]::align_to::<NonZeroU8>() may still cause UB, because an all-zero bit pattern for a NonZeroU8 is invalid and is defined as an UB.
  • bytemuck crate: contains useful and general operations for reinterpreting bytes

fig1. general memory layout between slices:

 &[T] |  *T  | length: usize |
   |  |      |               |
   |  | core::mem::transmute |
   v  |      |               |
 &[U] |  *U  | length: usize |
         ||          /\
address  ||          || and the length 
remains  ||          || is also unchanged
the same ||          || 
         \/
         | T[0] | T[1] | ...
         |
         | U[0] ...

fig2. Intended to have the same effect as it.iter().map(|x| x.1).collect::<Vec<_>>().as_slice(), even though (u16, u32) does not have paddings (this is a wrong assumption on most platforms. e.g. on x86_64, it takes 8 bytes). slice have 4 elements:

  &[(u16, u32)]  | *(u16, u32) | 4: usize |
       |         |             |          |
       | core::mem::transmute  |          |
       v         |             |          |
    &[u32]       |     *u32    | 4: usize | 
                        |
                        | remains the same address
/-----------------------/
|
v
+-----------------+-----------------+-----------------+-----------------+-----------------+
|     src[0]      |     src[1]      |     src[2]      |     src[3]      |     src[4]      |
+-----+-----------+-----+-----------+-----+-----------+-----+-----------+-----+-----------+
| [0] |    [1]    | [0] |    [1]    | [0] |    [1]    | [0] |    [1]    | [0] |    [1]    |
| u16 |    u32    | u16 |    u32    | u16 |    u32    | u16 |    u32    | u16 |    u32    |
+-----+-----------+-----+-----------+-----+-----------+-----+-----------+-----+-----------+
|    u32    |    u32    |    u32    |    u32    |    u32    | CANT READ . CAN NOT BE READ .
+-----------+-----------+-----------+-----------+-----------+...........*.................*
|  dst[0]   |  dst[1]   |  dst[2]   |  dst[3]   |  dst[4]   | CANT READ . CAN NOT BE READ .
+-----------+-----------+-----------+-----------+-----------+...........*.................*

fig3. The reverse case of fig2: write to dst[3] or dst[4] may cause UB, and even creating this reference is UB and can also cause segfault (because you don't own after src[4]) for most modern OS:

     &[u32]      |     *u32    | 4: usize |
        |        |      |      |          |
        | core::mem::transmute |          |
        v        |      v      |          |
  &[(u32, u16)]  | *(u32, u16) | 4: usize | 
                        |
                        | remains the same address
/-----------------------/
|
v
+-----------+-----------+-----------+-----------+-----------+...........*.................*
|  src[0]   |  src[1]   |  src[2]   |  src[3]   |  src[4]   | DON'T OWN . YOU DO  NOT OWN .
+-----------+-----------+-----------+-----------+-----------+...........*.................*
|    u32    |    u32    |    u32    |    u32    |    u32    | DON'T OWN . YOU DO  NOT OWN .
+-----+-----------+-----+-----------+-----+-----------+-----+-----------+-----+-----------+
| [0] |    [1]    | [0] |    [1]    | [0] |    [1]    | [0] |    [1]    | [0] |    [1]    |
| u16 |    u32    | u16 |    u32    | u16 |    u32    | u16 |    u32    | u16 |    u32    |
+-----+-----------+-----+-----------+-----+-----------+-----+-----------+-----+-----------+
|     dst[0]      |     dst[1]      |     dst[2]      |     dst[3]      |     dst[4]      |
+-----------------+-----------------+-----------------+-----------------+-----------------+

Case study

Consider transmuting a &[u8] to a &str2. These types have the same size (both are fat-pointers) and has the same layout (internally, a pointer and length of usize), but &str has an additional requirement to &[u8]:

String slices are always a valid UTF-8.

So you can always convert a &str to a &[u8] without UB (see implementation of core::str::as_bytes), but the reverse is not always true.
Example: b"\xFF" is not valid UTF-8, so transmuting is leading to UB.

Genelify

In general, transmuting T to U may lead to UB in the following cases (so you shouldn't use if you're not sure what you're doing):

  • U holds T and requires additional constraints about T: there may *_unchecked method that is safer (may use transmute internally, but the implementor can add additional assertions)
    • &[u8] to &str (safe alternative: core::str::from_utf8, unsafe alternative: core::str::from_utf8_unchecked) - this is covered by clippy::transmute_bytes_to_str
    • numeric types to NonZero-types
      • safe alternative: TryFrom traits or new function
      • unsafe alternative: new_unchecked function
  • U is larger than T and not the same size as T
  • U requires stricter constraints about the bit-pattern than T
    • unsigned integer to bool where its value is larger than 1
  • &T to &mut U is UB. Definitely.
    • Note: the direct transmute is covered compiler itself: mutable_transmutes does.
    • &T -> *const T -> *mut T -> &mut T is not (and is hard to detect by linting-approach), however miri detects UB.
  • *T to *mut U then derefence to write, if the pointer was actually not-writable.
  • ... above case is true for references:
    • &T to &U
    • &mut T to &U
    • *const T to *const U
    • *mut T to *mut U

Personally, I don't believe it is pedantic entirely; I believe some cases fall under suspicious and/or correctness.

(thanks @sozysozbot for correcting my paragraphs)

Footnotes

  1. You are responsible in both way in those case.

  2. https://doc.rust-lang.org/core/primitive.str.html#representation

@smoelius
Copy link
Contributor

Adding to the problems listed above, transmute_ptr_to_ptr's suggestions tigger ptr_as_ptr (playground).

bors added a commit that referenced this issue Jul 23, 2024
Suggest `.cast`/`.cast_const`/`.cast_mut` in `transmute_ptr_as_ptr`

Essentially pre-applies `ptr_as_ptr` - #6372 (comment)

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-documentation Area: Adding or improving documentation C-enhancement Category: Enhancement of lints, like adding more cases or adding help messages
Projects
None yet
Development

No branches or pull requests

7 participants