Derive macros are unsound #281

mahkoh · 2024-10-20T19:11:10Z

I came across google/zerocopy#388 (comment) by chance. The same applies to this crate:

type S = &'static str;

#[derive(Copy, Clone, Debug, AnyBitPattern)]
#[insert_field(x = "S")]
struct X {
}

fn main() {
    let x: &X = bytemuck::cast_ref(&[0usize, 1usize]);
    println!("{}", x.x);
}

Segfaults. Solutions might involve generating a function that asserts (by compiling) that the types seen by the proc macro are the same as the types in the final type and that the final type contains no additional fields.

Lokathor · 2024-10-20T20:45:09Z

I'm not remembering what Derive proc macros are allowed to emit. Can they create code other code besides the trait impl they're for?

Otherwise, we might have to do the other option mentioned in the zerocopy thread and bail out on all unknown attributes. BUT that would be extremely uncool.

zachs18 · 2024-10-20T21:18:36Z

Yes, derive macros can emit arbitrary items, essentially. (IIRC this is already used in bytemuck_derive, where some derive macros emit some const _: () = { some static assertion stuff }; in addition to the trait impl.)

That said, Is it documented anywhere that it is intentional for derive macros to see the "original" tokens when an item has been modified by an attribute macro? My first instinct would be to call it a Rust bug that the derive macro does not get the correct TokenStream that defines the item it is intended to be applied to.

Lokathor · 2024-10-20T22:29:50Z

i believe the exact ordering of the attributes matters, unfortunately

zachs18 · 2024-10-21T02:26:19Z

use the_macros::*;

fn main() {
    {
        #[derive(PrintTokens)]
        #[add_fields({y: u32})]
        struct Foo {
            x: u32,
        }
        print_tokens()
    }

    {
        #[add_fields({y: u32})]
        #[derive(PrintTokens)]
        struct Foo {
            x: u32,
        }
        print_tokens()
    }
}

#[add_fields({ y: u32 })] struct Foo { x: u32, }
struct Foo { x : u32, y : u32 }

Okay, the original bug report made it sound like Rust never applied attribute macros before giving the item to derive macros, but yeah it does if it is before the derive (since it applies macros outside-in) and otherwise the attribute macro is still in the input. So erroring out if there are any unknown attributes in the input (the first case here) would be one way to make this sound, perhaps with an error message telling the user to move the attribute macro above the derive. (see reply)

the_macros source

# Cargo.toml
[package]
name = "the-macros"
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true

[dependencies]
proc-macro2 = "1.0.88"
quote = "1.0.37"
syn = { version = "2.0.82", features = ["full"] }

// lib.rs
use proc_macro::{TokenStream};
use proc_macro2::{Literal, Ident, Group, Punct, Span, Spacing, Delimiter};

#[proc_macro_derive(PrintTokens)]
pub fn derive_print_tokens(item: TokenStream) -> TokenStream {
    let literal = Literal::string(&item.to_string());
    quote::quote!{
        fn print_tokens() {
            println!("{}", #literal);
        }
    }.into()
}

#[proc_macro_attribute]
pub fn add_fields(attr: TokenStream, item: TokenStream) -> TokenStream {
    let syn::Item::Struct(mut item) = syn::parse::<syn::Item>(item).unwrap() else { panic!() };
    let new_fields = syn::parse::<syn::FieldsNamed>(attr).unwrap();
    let syn::Fields::Named(ref mut fields) = item.fields else { panic!() };
    fields.named.extend(new_fields.named);
    quote::quote! { #item }.into()
}

zachs18 · 2024-10-24T01:11:46Z

So erroring out if there are any unknown attributes in the input (the first case here) would be one way to make this sound, perhaps with an error message telling the user to move the attribute macro above the derive.

Actually after some testing, this won't work in general. Even ignoring all the false positives it would have, there are also false negatives: there's no way in general to differentiate between our own helper attributes for other traits (like #[zeroable(bound = "")]) and actual unknown attribute proc-macros.

false negative example

use adversarial_macros::transparent; // inserts a `y: Infallible` field, e.g., or any 1-ZST with a nontrivial safety invariant
use bytemuck::{Zeroable};

#[derive(Clone, Copy, Zeroable, TransparentWrapper)]
#[transparent(u32)] // fine, made inert by `derive(TransparentWrapper)`
#[repr(transparent)]
struct Foo {
  x: u32
}

#[derive(Clone, Copy, Zeroable)]
#[transparent(u32)] // unsound false negative, adds a field after `Zeroable` derive macro runs
#[repr(transparent)]
struct Foo {
  x: u32
}

From inside the Zeroable derive-macro, we cannot know if derive(TransparentWrapper) is also being applied, so we don't know if #[transparent(u32)] is a helper attribute for derive(TransparentWrapper) or a call to the procmacro adversarial_macros::transparent.

Edit: ~~I suppose we could get around this by only allowing the helper attributes for the specific trait being derived, which would require writing the derives separately with the helpers between them,~~ actually no that won't work either, because the helper attributes are "inert", but are not actually stripped from the input for later derives, e.g.

example

use adversarial_macros::transparent;

#[derive(Debug, Clone, Copy)]
#[derive(TransparentWrapper)] // comment this out for a different error
#[transparent(u32)]
#[derive(Zeroable)]
#[repr(transparent)]
struct Foo {
    x: u32,
}

derive(Zeroable) still sees transparent(u32) in the input if it was used as an inert helper attribute in derive(TransparentWrapper) instead of being an "active" procmacro.

zachs18 · 2024-10-24T01:18:25Z

Another wrinkle: For fields, we can check the types using some const _: () = { /* do some stuff */ };checks to make sure nothing changed the field types, but for enum discriminants, I don't know if there is any way to be sure that a procmacro did not change any variant discriminants, e.g.

#[derive(Zeroable)]
#[repr(u32)]
#[adversarial_macro] // changes discriminant so none are 0
enum Foo {
  X(i32) = 0,
}

Is there even any way to detect this at compile time?

Edit: For a fieldless enum you can const _: () = assert!(Foo::X as int_ty == 0);, but for fieldful enums I don't know if there is a way to do it at compile time.

workingjubilee · 2024-10-24T01:52:15Z

That said, Is it documented anywhere that it is intentional for derive macros to see the "original" tokens when an item has been modified by an attribute macro? My first instinct would be to call it a Rust bug that the derive macro does not get the correct TokenStream that defines the item it is intended to be applied to.

It's sort of the inverse: I don't believe we have fixed the order of macro evaluation for proc macros in a way that is particularly useful or provides any real guarantees to macro authors, in any direction.

Lokathor added the The Language Could Change I Guess A change in the language could always potentially invalidate old code label Oct 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Derive macros are unsound #281

Derive macros are unsound #281

mahkoh commented Oct 20, 2024

Lokathor commented Oct 20, 2024

zachs18 commented Oct 20, 2024

Lokathor commented Oct 20, 2024

zachs18 commented Oct 21, 2024 •

edited

Loading

zachs18 commented Oct 24, 2024 •

edited

Loading

zachs18 commented Oct 24, 2024 •

edited

Loading

workingjubilee commented Oct 24, 2024

Derive macros are unsound #281

Derive macros are unsound #281

Comments

mahkoh commented Oct 20, 2024

Lokathor commented Oct 20, 2024

zachs18 commented Oct 20, 2024

Lokathor commented Oct 20, 2024

zachs18 commented Oct 21, 2024 • edited Loading

zachs18 commented Oct 24, 2024 • edited Loading

zachs18 commented Oct 24, 2024 • edited Loading

workingjubilee commented Oct 24, 2024

zachs18 commented Oct 21, 2024 •

edited

Loading

zachs18 commented Oct 24, 2024 •

edited

Loading

zachs18 commented Oct 24, 2024 •

edited

Loading