Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Derive macros are unsound #281

Open
mahkoh opened this issue Oct 20, 2024 · 7 comments
Open

Derive macros are unsound #281

mahkoh opened this issue Oct 20, 2024 · 7 comments
Labels
The Language Could Change I Guess A change in the language could always potentially invalidate old code

Comments

@mahkoh
Copy link

mahkoh commented Oct 20, 2024

I came across google/zerocopy#388 (comment) by chance. The same applies to this crate:

type S = &'static str;

#[derive(Copy, Clone, Debug, AnyBitPattern)]
#[insert_field(x = "S")]
struct X {
}

fn main() {
    let x: &X = bytemuck::cast_ref(&[0usize, 1usize]);
    println!("{}", x.x);
}

Segfaults. Solutions might involve generating a function that asserts (by compiling) that the types seen by the proc macro are the same as the types in the final type and that the final type contains no additional fields.

@Lokathor
Copy link
Owner

I'm not remembering what Derive proc macros are allowed to emit. Can they create code other code besides the trait impl they're for?

Otherwise, we might have to do the other option mentioned in the zerocopy thread and bail out on all unknown attributes. BUT that would be extremely uncool.

@zachs18
Copy link
Contributor

zachs18 commented Oct 20, 2024

Yes, derive macros can emit arbitrary items, essentially. (IIRC this is already used in bytemuck_derive, where some derive macros emit some const _: () = { some static assertion stuff }; in addition to the trait impl.)

That said, Is it documented anywhere that it is intentional for derive macros to see the "original" tokens when an item has been modified by an attribute macro? My first instinct would be to call it a Rust bug that the derive macro does not get the correct TokenStream that defines the item it is intended to be applied to.

@Lokathor
Copy link
Owner

i believe the exact ordering of the attributes matters, unfortunately

@zachs18
Copy link
Contributor

zachs18 commented Oct 21, 2024

use the_macros::*;

fn main() {
    {
        #[derive(PrintTokens)]
        #[add_fields({y: u32})]
        struct Foo {
            x: u32,
        }
        print_tokens()
    }

    {
        #[add_fields({y: u32})]
        #[derive(PrintTokens)]
        struct Foo {
            x: u32,
        }
        print_tokens()
    }
}
#[add_fields({ y: u32 })] struct Foo { x: u32, }
struct Foo { x : u32, y : u32 }

Okay, the original bug report made it sound like Rust never applied attribute macros before giving the item to derive macros, but yeah it does if it is before the derive (since it applies macros outside-in) and otherwise the attribute macro is still in the input. So erroring out if there are any unknown attributes in the input (the first case here) would be one way to make this sound, perhaps with an error message telling the user to move the attribute macro above the derive. (see reply)

the_macros source
# Cargo.toml
[package]
name = "the-macros"
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true

[dependencies]
proc-macro2 = "1.0.88"
quote = "1.0.37"
syn = { version = "2.0.82", features = ["full"] }
// lib.rs
use proc_macro::{TokenStream};
use proc_macro2::{Literal, Ident, Group, Punct, Span, Spacing, Delimiter};

#[proc_macro_derive(PrintTokens)]
pub fn derive_print_tokens(item: TokenStream) -> TokenStream {
    let literal = Literal::string(&item.to_string());
    quote::quote!{
        fn print_tokens() {
            println!("{}", #literal);
        }
    }.into()
}

#[proc_macro_attribute]
pub fn add_fields(attr: TokenStream, item: TokenStream) -> TokenStream {
    let syn::Item::Struct(mut item) = syn::parse::<syn::Item>(item).unwrap() else { panic!() };
    let new_fields = syn::parse::<syn::FieldsNamed>(attr).unwrap();
    let syn::Fields::Named(ref mut fields) = item.fields else { panic!() };
    fields.named.extend(new_fields.named);
    quote::quote! { #item }.into()
}

@zachs18
Copy link
Contributor

zachs18 commented Oct 24, 2024

So erroring out if there are any unknown attributes in the input (the first case here) would be one way to make this sound, perhaps with an error message telling the user to move the attribute macro above the derive.

Actually after some testing, this won't work in general. Even ignoring all the false positives it would have, there are also false negatives: there's no way in general to differentiate between our own helper attributes for other traits (like #[zeroable(bound = "")]) and actual unknown attribute proc-macros.

false negative example
use adversarial_macros::transparent; // inserts a `y: Infallible` field, e.g., or any 1-ZST with a nontrivial safety invariant
use bytemuck::{Zeroable};

#[derive(Clone, Copy, Zeroable, TransparentWrapper)]
#[transparent(u32)] // fine, made inert by `derive(TransparentWrapper)`
#[repr(transparent)]
struct Foo {
  x: u32
}

#[derive(Clone, Copy, Zeroable)]
#[transparent(u32)] // unsound false negative, adds a field after `Zeroable` derive macro runs
#[repr(transparent)]
struct Foo {
  x: u32
}

From inside the Zeroable derive-macro, we cannot know if derive(TransparentWrapper) is also being applied, so we don't know if #[transparent(u32)] is a helper attribute for derive(TransparentWrapper) or a call to the procmacro adversarial_macros::transparent.

Edit: I suppose we could get around this by only allowing the helper attributes for the specific trait being derived, which would require writing the derives separately with the helpers between them, actually no that won't work either, because the helper attributes are "inert", but are not actually stripped from the input for later derives, e.g.

example
use adversarial_macros::transparent;

#[derive(Debug, Clone, Copy)]
#[derive(TransparentWrapper)] // comment this out for a different error
#[transparent(u32)]
#[derive(Zeroable)]
#[repr(transparent)]
struct Foo {
    x: u32,
}

derive(Zeroable) still sees transparent(u32) in the input if it was used as an inert helper attribute in derive(TransparentWrapper) instead of being an "active" procmacro.

@zachs18
Copy link
Contributor

zachs18 commented Oct 24, 2024

Another wrinkle: For fields, we can check the types using some const _: () = { /* do some stuff */ };checks to make sure nothing changed the field types, but for enum discriminants, I don't know if there is any way to be sure that a procmacro did not change any variant discriminants, e.g.

#[derive(Zeroable)]
#[repr(u32)]
#[adversarial_macro] // changes discriminant so none are 0
enum Foo {
  X(i32) = 0,
}

Is there even any way to detect this at compile time?

Edit: For a fieldless enum you can const _: () = assert!(Foo::X as int_ty == 0);, but for fieldful enums I don't know if there is a way to do it at compile time.

@workingjubilee
Copy link

That said, Is it documented anywhere that it is intentional for derive macros to see the "original" tokens when an item has been modified by an attribute macro? My first instinct would be to call it a Rust bug that the derive macro does not get the correct TokenStream that defines the item it is intended to be applied to.

It's sort of the inverse: I don't believe we have fixed the order of macro evaluation for proc macros in a way that is particularly useful or provides any real guarantees to macro authors, in any direction.

@Lokathor Lokathor added the The Language Could Change I Guess A change in the language could always potentially invalidate old code label Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
The Language Could Change I Guess A change in the language could always potentially invalidate old code
Projects
None yet
Development

No branches or pull requests

4 participants