Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build-time assertion #257

Merged
merged 3 commits into from
May 19, 2021
Merged

Build-time assertion #257

merged 3 commits into from
May 19, 2021

Conversation

nbdd0121
Copy link
Member

@nbdd0121 nbdd0121 commented May 9, 2021

Add link_time_panic and link_time_assert!. link_time_panic is defined as a const but #[inline(never)] function, so it panics when used in const context, but will not be inlined. This crate is not used when linking, so if optimiser cannot optimize away a callsite a link error is generated.

@bjorn3
Copy link
Member

bjorn3 commented May 9, 2021

This crate is not used when linking, so if optimiser cannot optimize away a callsite a link error is generated.

Usage of this macro will very likely break compiling without optimizations. In addition certain (future) versions of rustc/LLVM may not be able to optimize the call sites away.

Comment on lines 22 to 24
// Could also be panic!(msg) to avoid using unstable feature `core_panic`,
// but it is not allowed in Rust 2021, while panic!("{}", msg) could not
// yet be used in const context.
Copy link
Member

@bjorn3 bjorn3 May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could downgrade this crate to the 2015 edition and silence the warning for panic!(msg). Past editions are forever supported by rustc. Rust-analyzer barely has support for it, but this crate is simple enough that I don't think rust-analyzer has any problems with it. Maybe it resolves panic!() to the 2021 edition version and gives an error, but it doesn't show errors for closed files anyway, so it isn't very distracting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently all crates have editions specified by a global rustc_flags, and --edition flag couldn't be specified more than once.

rust/Makefile Outdated
@@ -147,9 +148,17 @@ $(objtree)/rust/alloc.o: $$(RUST_LIB_SRC)/alloc/src/lib.rs \
$(objtree)/rust/compiler_builtins.o FORCE
$(call if_changed_dep,rustc_library)

# link_time_panic.o is deliberately only compiled as a dependency but not
# included in obj-y.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does look like the .o is linked into kernel.o, is the idea that it's linked there, but not into the final .ko?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kernel.o has link_time_panic.o as a Make dependency, but it will not link against it.

@ojeda
Copy link
Member

ojeda commented May 9, 2021

Isn't this static_assert!?

static_assert! cannot be used in your relax_bound in the other PR, though, because of the painful error[E0401]: can't use generic parameters from outer function. However, I would simply make it a panic! since we already have const_panic.

@nbdd0121
Copy link
Member Author

nbdd0121 commented May 9, 2021

Isn't this static_assert!?

static_assert! cannot be used in your relax_bound in the other PR, though, because of the painful error[E0401]: can't use generic parameters from outer function. However, I would simply make it a panic! since we already have const_panic.

static_assert!'s argument is in const context, i.e. it cannot refer to generic parameters. Using const panic! however cannot guarantee that it will not leak through into the final binary. More about const context vs const fn here: https://doc.rust-lang.org/reference/const_eval.html. link_time_panic can be used anywhere panic! can be used, const context, const fn and even non-const fns, so it is more versatile. When used in const contexts, it behaves just like a panic!.

The reason to have link_time_panic arises from Linus Torvalds's opinion that anything that could be statically checked in compile time should not be checked in the runtime, so that a bug is catched as early as possible. The idea of having a link error to replace a runtime error came from @wedsonaf's email to Linus Walleij in the mailing list, https://lore.kernel.org/rust-for-linux/YIbQ3dHOpyD%[email protected]/. But instead of a non-existing extern symbol I put it in a never-linked crate, and make it a const fn, so it could be used in const contexts and const fns.

Usage of this macro will very likely break compiling without optimizations.

I am not worried about turning off optimisations. Using non-existing symbol and rely on dead code elimination to remove it is a common hack widely used in the kernel's C code already AFAIK (trying to compile the kernel with O0 will give you link errors), so I think it's okay to do the same thing in the Rust side as well. Plus, you'll never want to turn off optimisation for kernel or any large project in the first place, as it will be painfully slow and large.

In addition certain (future) versions of rustc/LLVM may not be able to optimize the call sites away.

I'll consider it as a P-critical I-slow regression for rustc/LLVM if it cannot fold away constants, if it ever happens, so I'm not particularly worried about it. Even if it happens, we can simply link link_time_panic.o temporarily while the regression is being fixed. Even if link_time_panic.o is indeed being linked it gives a way to tell apart which panic is really not supposed to happen when analysing the binary.

@bjorn3
Copy link
Member

bjorn3 commented May 9, 2021

I am not worried about turning off optimisations. Using non-existing symbol and rely on dead code elimination to remove it is a common hack widely used in the kernel's C code already AFAIK (trying to compile the kernel with O0 will give you link errors), so I think it's okay to do the same thing in the Rust side as well. Plus, you'll never want to turn off optimisation for kernel or any large project in the first place, as it will be painfully slow and large.

You may want to for better debuginfo. It can be hard to wade through heaps of inline functions. These are much more common in Rust than C due to favoring zero-cost abstractions over no abstractions at all. In addition you don't necessarily have to disable optimizations for everything. Disabling it just for the component you are debugging makes sense.

I'll consider it as a P-critical I-slow regression for rustc/LLVM if it cannot fold away constants, if it ever happens, so I'm not particularly worried about it.

The conditional that should be folded away may not be trivial in which case it is pretty much luck if it is being folded away at all. Any small change to a completely unrelated optimization pass may prevent folding away the conditional. It may even be caused by another optimization getting smarter and optimizing the program better overall. Or maybe a function in the standard library or elsewhere stops being marked as #[inline] because inlining it has been shown to reduce performance. In that case LTO would be required for there being any possibility of folding the conditional away. It may also be that enabling two rust kernel modules at the same time when using LTO would cause LLVM to consider inlining a particular function that would be favorable to inline when only one of those kernel modules is compiled to no longer be favorable when both kernel modules are compiled, as said function for example now gets called twice instead of once.

Even if it happens, we can simply link link_time_panic.o temporarily while the regression is being fixed.

I guess so.

Using non-existing symbol and rely on dead code elimination to remove it is a common hack widely used in the kernel's C code already AFAIK (trying to compile the kernel with O0 will give you link errors), so I think it's okay to do the same thing in the Rust side as well.

If it is already commonly used by C in the kernel I guess it is fine to do the same for rust. 🤷

@ojeda
Copy link
Member

ojeda commented May 9, 2021

static_assert!'s argument is in const context, i.e. it cannot refer to generic parameters.

Currently, yes. But conceptually, this is intended for the same purpose as static_assert!. I do not care how it is implemented, but I do not understand why you want to have two different implementations.

The new name is also wrong, because it leaks implementation details.

Using const panic! however cannot guarantee that it will not leak through into the final binary.

const_panic or otherwise, it does not change how this works. The approach of this PR uses a simple conditional that is expected to be optimized out, the same as I suggested.

The difference is that a simple panic!() would still work for non-optimized builds, while yours would fail to build. This is important, because we "support" -O0 and -O1 so far (not for production, just under the hacking menu, but nevertheless, it is there).

We can discuss removing support for those builds, that is fine and it would not be the first time, but it is not the same discussion.

More about const context vs const fn here: https://doc.rust-lang.org/reference/const_eval.html. link_time_panic can be used anywhere panic! can be used, const context, const fn and even non-const fns, so it is more versatile. When used in const contexts, it behaves just like a panic!.

Yes, that is understood -- it just uses a simple conditional. But it is not what I was talking about.

The reason to have link_time_panic arises from Linus Torvalds's opinion that anything that could be statically checked in compile time should not be checked in the runtime, so that a bug is catched as early as possible.

Not sure who you are trying to convince here -- I added static_assert! to the project...

In fact, it is not really an "opinion", it is almost an engineering fact at this point. And it is not news either, in C++ testing conditions like this for similar cases (like templates with NTTPs) is everyday business.

The idea of having a link error to replace a runtime error came from @wedsonaf's email to Linus Walleij in the mailing list, https://lore.kernel.org/rust-for-linux/YIbQ3dHOpyD%[email protected]/. But instead of a non-existing extern symbol I put it in a never-linked crate, and make it a const fn, so it could be used in const contexts and const fns.

I am not really sure what your point is with this paragraph either, but the link error suggestion was mentioned by Torvalds many days before that: https://lore.kernel.org/rust-for-linux/CAHk-=wgJvJJtd2mpYpx5+zn_hPrSOqGqi-Pxb7e+h+anhsLnQg@mail.gmail.com/ (and it is not a new idea).

@ojeda
Copy link
Member

ojeda commented May 9, 2021

I am not worried about turning off optimisations. Using non-existing symbol and rely on dead code elimination to remove it is a common hack widely used in the kernel's C code already AFAIK (trying to compile the kernel with O0 will give you link errors), so I think it's okay to do the same thing in the Rust side as well.

Yes, but that change needs to be discussed, because as I mentioned above, this is currently "supported".

Plus, you'll never want to turn off optimisation for kernel or any large project in the first place, as it will be painfully slow and large.

Not true: under QEMU, a pass over -O0 samples take a couple tenths of a second. From boot to reboot, it takes less than 2 seconds in my machine. Remember that we do not need to compile the entire kernel like that. And there are use cases for compiling without optimizations (or less optimizations).

@nbdd0121
Copy link
Member Author

nbdd0121 commented May 9, 2021

Currently, yes. But conceptually, this is intended for the same purpose as static_assert!. I do not care how it is implemented, but I do not understand why you want to have two different implementations.

It is not. It you take a look at the snippet I mentioned in the mailing list:

impl<const SIZE: usize> IoMemBlock<SIZE> {
    pub fn write<T>(&self, value: T, offset: usize) {
        if let Some(end) = offset.checked_add(size_of::<T>()) {
            if end <= SIZE {
                // SAFETY: We just checked above that offset was within bounds.
                let ptr = unsafe { self.ptr.add(offset) } as *mut T;
                // SAFETY: We just checked that the offset+size was within bounds.
                unsafe { ptr.write_volatile(value) };
                return;
            }
        }
        // SAFETY: Unimplemented function to cause compilation error.
        unsafe { bad_write() };
    }
}

You could use link_time_assert! in this scenario:

impl<const SIZE: usize> IoMemBlock<SIZE> {
    pub fn write<T>(&self, value: T, offset: usize) {
        if let Some(end) = offset.checked_add(size_of::<T>()) {
            if end <= SIZE {
                // SAFETY: We just checked above that offset was within bounds.
                let ptr = unsafe { self.ptr.add(offset) } as *mut T;
                // SAFETY: We just checked that the offset+size was within bounds.
                unsafe { ptr.write_volatile(value) };
                return;
            }
        }
        link_time_panic("write out of bound");
    }
}

Or you could panic!, but you cannot use a static_assert! here because this is not const. It relies on knowledge that LLVM knows but not rustc.

static_assert! works only in compilation time, panic! works in compilation time in const context, but in runtime if not. There is currently no middle ground. link_time_assert! works in compilation time in const context and rely on LLVM+linker to prevent the panic from being carried to runtime.

The new name is also wrong, because it leaks implementation details.

I think it's good to be explicit about implementation detail here. You don't really have to hide it because everything is tweakable in the kernel, it's not like this is going to be used by external library.

The difference is that a simple panic!() would still work for non-optimized builds, while yours would fail to build. This is important, because we "support" -O0 and -O1 so far (not for production, just under the hacking menu, but nevertheless, it is there).

We can discuss removing support for those builds, that is fine and it would not be the first time, but it is not the same discussion.

I don't think it is necessary to support -O0, kernel C code does not support -O0 anyway. O1 can fold away constants but don't inline non-#[inline(always)] functions, so I think it is good enough. Ideally we'd have something similar to -Og in GCC, but 🤷.

Not sure who you are trying to convince here -- I added static_assert! to the project...

In fact, it is not really an "opinion", it is almost an engineering fact at this point. And it is not news either, in C++ testing conditions like this for similar cases (like templates with NTTPs) is everyday business.

link_time_assert! cannot replace static_assert!. link_time_assert! will cause an error in link time even if condition can be evaluated in const context. link_time_assert! creates a compile time error only if itself is evaluated in const context.

fn foo() {
    static_assert!(1 > 1); // Compile-time error
    link_time_assert!(1 > 1); // Link-time error
    assert!(1 > 1); // Run-time error
}

const fn bar1<const N: usize>() {
    // static_assert!(N > 1); is not allowed
    link_time_assert!(N > 1);
}

const fn bar2<const N: usize>() {
    assert!(N > 1);
}

const A1: () = bar1::<1>(); // Compile-time error
const A2: () = bar2::<1>(); // Compile-time error

fn call_bar() {
    bar(1); // Link-time error
    bar(2); // Run-time error
}

Obviously the goal is to catch error as early as possible. So static_assert if preferred if possible, link_time_assert second, and assert is only used as the last resort.

The idea of having a link error to replace a runtime error came from @wedsonaf's email to Linus Walleij in the mailing list, https://lore.kernel.org/rust-for-linux/YIbQ3dHOpyD%[email protected]/. But instead of a non-existing extern symbol I put it in a never-linked crate, and make it a const fn, so it could be used in const contexts and const fns.

I am not really sure what your point is with this paragraph either, but the link error suggestion was mentioned by Torvalds many days before that: https://lore.kernel.org/rust-for-linux/CAHk-=wgJvJJtd2mpYpx5+zn_hPrSOqGqi-Pxb7e+h+anhsLnQg@mail.gmail.com/ (and it is not a new idea).

I quoted this email precisely because it's a use case where only link_time_panic can offer.

@ojeda
Copy link
Member

ojeda commented May 9, 2021

Currently, yes. But conceptually, this is intended for the same purpose as static_assert!. I do not care how it is implemented, but I do not understand why you want to have two different implementations.

It is not. It you take a look at the snippet I mentioned in the mailing list:

It is. Let me quote your docs:

/// Asserts that a boolean expression is `true` at compile time.

and the current ones:

/// Static assert (i.e. compile-time assert).

Please tell me what is the difference.

What we want is to fail the build under a given compile-time condition. Both macros aim to do that. The fact that the current implementation does not work in some contexts does not change that.

[...]

You could use link_time_assert! in this scenario:

[...]

Or you could panic!, but you cannot use a static_assert! here because this is not const. It relies on knowledge that LLVM knows but not rustc.

static_assert! works only in compilation time, panic! works in compilation time in const context, but in runtime if not. There is currently no middle ground. link_time_assert! works in compilation time in const context and rely on LLVM+linker to prevent the panic from being carried to runtime.

This is (again) an explanation on why static_assert! cannot be used in all contexts. And, again, it does not explain why we would want both.

The new name is also wrong, because it leaks implementation details.

I think it's good to be explicit about implementation detail here.

Why would we want to be explicit? It is rarely a good idea to expose implementation details for no reason.

You don't really have to hide it because everything is tweakable in the kernel

Sure, but that is not an argument for exposing implementation details, just that we could potentially do it.

it's not like this is going to be used by external library.

rust/ code is pretty much the definition of a "library" in our context -- it is going to be used by everyone else in the kernel, same as include/ bits are.

The difference is that a simple panic!() would still work for non-optimized builds, while yours would fail to build. This is important, because we "support" -O0 and -O1 so far (not for production, just under the hacking menu, but nevertheless, it is there).
We can discuss removing support for those builds, that is fine and it would not be the first time, but it is not the same discussion.

I don't think it is necessary to support -O0, kernel C code does not support -O0 anyway. O1 can fold away constants but don't inline non-#[inline(always)] functions, so I think it is good enough. Ideally we'd have something similar to -Og in GCC, but .

It is not necessary, but that is not an argument for removing it.

Supporting this implementation is an argument, but before restricting what we can do optimization-flags-wise just for this, I would prefer having conditional compilation on this, so that for optimized builds the link error approach is used, and for others, the panic()! one. That would get us the best of both worlds.

Not sure who you are trying to convince here -- I added static_assert! to the project...
In fact, it is not really an "opinion", it is almost an engineering fact at this point. And it is not news either, in C++ testing conditions like this for similar cases (like templates with NTTPs) is everyday business.

link_time_assert! cannot replace static_assert!. link_time_assert! will cause an error in link time even if condition can be evaluated in const context. link_time_assert! creates a compile time error only if itself is evaluated in const context.

I am not sure why you are replying with this here: in the part you quote, I was saying that you do not need to convince me about compile-time errors being better than runtime ones -- I have been convinced for many years and I am an avid user of static_assert in C and C++ (and I would like Rust to have built-in support for it too).

Anyway: the code you wrote does not show it cannot replace it. In fact, it shows the opposite: what you cannot do is replace link_time_assert! with static_assert!.

Obviously the goal is to catch error as early as possible. So static_assert if preferred if possible, link_time_assert second, and assert is only used as the last resort.

This is the argument I was asking for.

So, if I understand your point of view, you are saying it is better to have two macros so that we can have compile-time errors instead of link-time errors (in a subset of cases), no?

If yes: what are the advantages? Saving a few seconds building invalid code? Not worth it at all. Better diagnostics for invalid code (for the subset of cases where both macros would work)? Worth it, but not sure if worth enough to keep two macros.

A few questions:

  • Is there an actual case where link_time_panic! cannot prevent the build from finishing where static_assert! could?

  • Is it possible to build the macro in a way that chooses the right one (e.g. to detect the context where it is called)? I assume not, but given C++ has introduced things like is_constant_evaluated...

  • Is Rust going to lift the restriction on using generic parameters in const contexts?

  • Are there any plans in Rust to reintroduce static_assert back? (e.g. Feature Request static asserts rust-lang/rfcs#2790).

I quoted this email precisely because it's a use case where only link_time_panic can offer.

The use case is clear: as I mentioned in the first message, you have one in the other PR (relax_bounds)...

@jeyhun1991

This comment was marked as spam.

@nbdd0121
Copy link
Member Author

Why would we want to be explicit? It is rarely a good idea to expose implementation details for no reason.

That's just my personal preference, I am okay with any reasonable name anyway, so feel free to give it a new name.

Supporting this implementation is an argument, but before restricting what we can do optimization-flags-wise just for this, I would prefer having conditional compilation on this, so that for optimized builds the link error approach is used, and for others, the panic()! one. That would get us the best of both worlds.

That sounds reasonable.

So, if I understand your point of view, you are saying it is better to have two macros so that we can have compile-time errors instead of link-time errors (in a subset of cases), no?

Okay, I see the root of confusion here. When I mention 'compile-time' in the discussion, I was referring strictly about the invocation of rustc, not including the linking part. I should be more clear about that.

If yes: what are the advantages? Saving a few seconds building invalid code? Not worth it at all. Better diagnostics for invalid code (for the subset of cases where both macros would work)? Worth it, but not sure if worth enough to keep two macros.

We might want to introduce similar things to cargo check (aka rustc -Z no-codegen) to speed up development process. static_assert! in the current form can catch that, but link_time_assert! would not. Also, as you mentioned, better diagnostics (static_assert! produced way better diagnostics with stack traces, but link_time_assert! can only show you the caller).

Is there an actual case where link_time_panic! cannot prevent the build from finishing where static_assert! could?

Yes. If link_time_assert! is itself dead code, then it would not prevent the build while static_assert! still would. I believe this case will be rare though. Also static_assert can appear anywhere an item can (e.g. inside a module), but link_time_assert! is an expression so it can appear in fewer places.

Is it possible to build the macro in a way that chooses the right one (e.g. to detect the context where it is called)? I assume not, but given C++ has introduced things like is_constant_evaluated...

No, at least not yet.

Is Rust going to lift the restriction on using generic parameters in const contexts?

I doubt this restriction will be lifted, at least not in the near future. Unlike C++ templates, generics are type-checked ahead of time, not when they're monomorphized.

The use case is clear: as I mentioned in the first message, you have one in the other PR (relax_bounds)...

It's different. The relax_bound use case can be expressed with C++'s static_assert as the condition , but that use case couldn't.

@nbdd0121
Copy link
Member Author

Would it be possible to have LD warn about all usages of a function but still proceed to link it?

@ojeda
Copy link
Member

ojeda commented May 10, 2021

(Gary and I talked about this offline because it seems we misunderstood each other a bit)

If link_time_assert! is itself dead code, then it would not prevent the build while static_assert! still would. I believe this case will be rare though. Also static_assert can appear anywhere an item can (e.g. inside a module), but link_time_assert! is an expression so it can appear in fewer places.

These two are very good points and I think it seals the deal taken together with the diagnostics.

I am okay with any reasonable name anyway, so feel free to give it a new name.

Initially I thought about static_assert_generics! or something like that, but Gary suggested build_assert/error! or similar, which sounds fine to me (i.e. what the person writing the code wants is to fail the build, somehow, linker error or otherwise).

@wedsonaf
Copy link

Initially I thought about static_assert_generics! or something like that, but Gary suggested build_assert/error! or similar, which sounds fine to me (i.e. what the person writing the code wants is to fail the build, somehow, linker error or otherwise).

FWIW, I like the build_ prefix as it gives as intuitive indication of what this is meant for.

@nbdd0121
Copy link
Member Author

nbdd0121 commented May 10, 2021

Tried adjusting CI configs and adding exports_build_assert_generated.h, but still getting Inconsistent kallsyms data which I could not reproduce locally.

@nbdd0121
Copy link
Member Author

I've added a choice to Kconfig.debug to control the behaviour of build_panic and build_assert.

  • RUST_BUILD_ASSERT_ALLOW will link build_assert.o, so the check will be run-time.
  • RUST_BUILD_ASSERT_WARN is similar to ALLOW, but it will also add a .gnu.warning section so that GNU ld would generate a warning.
  • RUST_BUILD_ASSERT_DENY will not link build_assert.o, so the linker will reject.

Meanwhile CI still gives me Inconsistent kallsyms data.

rust/build_assert.rs Outdated Show resolved Hide resolved
//! be enforced in Rust (e.g. perform some checks in const functions, but those
//! functions could still be called in the runtime).
//!
//! This crate provides a method `build_error`, which will panic in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to mention build_error (here and everywhere else)? Should a user ever call it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do. It might be called after using checked math ops to unwrap Option.

rust/build_assert.rs Outdated Show resolved Hide resolved
rust/build_assert.rs Outdated Show resolved Hide resolved
@nbdd0121
Copy link
Member Author

nbdd0121 commented May 17, 2021

Several changes:

  • Now depends on rust: Run rustdoc for the target triple #272.
  • build_error crate now only have a single build_error function, the rest is moved to kernel crate.
  • Make build_error a hidden function, and then add a build_error! to mimick panic.
  • Re-export build_assert! in prelude
  • Examples added.
  • Make print.rs use build_assert!

`local_apic` is a packed type and rustc will complain that it
transitively contain a `#[repr(align)]` type.

Signed-off-by: Gary Guo <[email protected]>
@nbdd0121
Copy link
Member Author

Seems to be a transient failure, retry CI

@nbdd0121 nbdd0121 changed the title Link time panic Build-time assertion May 17, 2021
rust/kernel/build_assert.rs Outdated Show resolved Hide resolved
rust/kernel/build_assert.rs Outdated Show resolved Hide resolved
rust/kernel/build_assert.rs Outdated Show resolved Hide resolved
lib/Kconfig.debug Outdated Show resolved Hide resolved
lib/Kconfig.debug Outdated Show resolved Hide resolved
@@ -77,7 +82,7 @@ bindgen_c_flags = $(filter-out $(bindgen_skip_c_flags), $(c_flags)) \
$(bindgen_extra_c_flags)
endif

bindgen_opaque_types := xregs_state desc_struct arch_lbr_state
bindgen_opaque_types := xregs_state desc_struct arch_lbr_state local_apic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one related or is another cleanup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in a separate commit. I couldn't build on my machine without this change, but it is trivial enough that I think it might not worth a separate PR.

rust/build_error.rs Outdated Show resolved Hide resolved
rust/build_error.rs Outdated Show resolved Hide resolved
@ojeda
Copy link
Member

ojeda commented May 17, 2021

  • Make print.rs use build_assert!

This does not seem to be here, is it intentional or you wanted to move it to another PR?

@ojeda
Copy link
Member

ojeda commented May 17, 2021

It looks pretty good!

@nbdd0121
Copy link
Member Author

  • Make print.rs use build_assert!

This does not seem to be here, is it intentional or you wanted to move it to another PR?

I am doing some other work on print.rs so I removed it from this PR.

@nbdd0121
Copy link
Member Author

Thanks for the grammar fixes, changes incorporated and squashed

@ksquirrel
Copy link
Member

Review of 5043400eeca3:

  • ✔️ Commit 478bc95: Looks fine!
  • ✔️ Commit b8b9cab: Looks fine!
  • ✔️ Commit 5043400: Looks fine!

@ojeda
Copy link
Member

ojeda commented May 19, 2021

Let's get this in!

@ojeda ojeda merged commit da31291 into Rust-for-Linux:rust May 19, 2021
@nbdd0121 nbdd0121 deleted the link-time-panic branch May 19, 2021 19:41
ojeda pushed a commit that referenced this pull request Jul 25, 2022
…one()

The subpage we calculate is an invalid pointer for device private pages,
because device private pages are mapped via non-present device private
entries, not ordinary present PTEs.

Let's just not compute broken pointers and fixup later.  Move the proper
assignment of the correct subpage to the beginning of the function and
assert that we really only have a single page in our folio.

This currently results in a BUG when tying to compute anon_exclusive,
because:

[  528.727237] BUG: unable to handle page fault for address: ffffea1fffffffc0
[  528.739585] #PF: supervisor read access in kernel mode
[  528.745324] #PF: error_code(0x0000) - not-present page
[  528.751062] PGD 44eaf2067 P4D 44eaf2067 PUD 0
[  528.756026] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  528.760890] CPU: 120 PID: 18275 Comm: hmm-tests Not tainted 5.19.0-rc3-kfd-alex #257
[  528.769542] Hardware name: AMD Corporation BardPeak/BardPeak, BIOS RTY1002BDS 09/17/2021
[  528.778579] RIP: 0010:try_to_migrate_one+0x21a/0x1000
[  528.784225] Code: f6 48 89 c8 48 2b 05 45 d1 6a 01 48 c1 f8 06 48 29
c3 48 8b 45 a8 48 c1 e3 06 48 01 cb f6 41 18 01 48 89 85 50 ff ff ff 74
0b <4c> 8b 33 49 c1 ee 11 41 83 e6 01 48 8b bd 48 ff ff ff e8 3f 99 02
[  528.805194] RSP: 0000:ffffc90003cdfaa0 EFLAGS: 00010202
[  528.811027] RAX: 00007ffff7ff4000 RBX: ffffea1fffffffc0 RCX: ffffeaffffffffc0
[  528.818995] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc90003cdfaf8
[  528.826962] RBP: ffffc90003cdfb70 R08: 0000000000000000 R09: 0000000000000000
[  528.834930] R10: ffffc90003cdf910 R11: 0000000000000002 R12: ffff888194450540
[  528.842899] R13: ffff888160d057c0 R14: 0000000000000000 R15: 03ffffffffffffff
[  528.850865] FS:  00007ffff7fdb740(0000) GS:ffff8883b0600000(0000) knlGS:0000000000000000
[  528.859891] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  528.866308] CR2: ffffea1fffffffc0 CR3: 00000001562b4003 CR4: 0000000000770ee0
[  528.874275] PKRU: 55555554
[  528.877286] Call Trace:
[  528.880016]  <TASK>
[  528.882356]  ? lock_is_held_type+0xdf/0x130
[  528.887033]  rmap_walk_anon+0x167/0x410
[  528.891316]  try_to_migrate+0x90/0xd0
[  528.895405]  ? try_to_unmap_one+0xe10/0xe10
[  528.900074]  ? anon_vma_ctor+0x50/0x50
[  528.904260]  ? put_anon_vma+0x10/0x10
[  528.908347]  ? invalid_mkclean_vma+0x20/0x20
[  528.913114]  migrate_vma_setup+0x5f4/0x750
[  528.917691]  dmirror_devmem_fault+0x8c/0x250 [test_hmm]
[  528.923532]  do_swap_page+0xac0/0xe50
[  528.927623]  ? __lock_acquire+0x4b2/0x1ac0
[  528.932199]  __handle_mm_fault+0x949/0x1440
[  528.936876]  handle_mm_fault+0x13f/0x3e0
[  528.941256]  do_user_addr_fault+0x215/0x740
[  528.945928]  exc_page_fault+0x75/0x280
[  528.950115]  asm_exc_page_fault+0x27/0x30
[  528.954593] RIP: 0033:0x40366b
...

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6c28760 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: "Sierra Guiza, Alejandro (Alex)" <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Tested-by: Alistair Popple <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Darksonn pushed a commit to Darksonn/linux that referenced this pull request Dec 5, 2023
hid_debug_events_release releases resources bound to the HID device instance.
hid_device_release releases the underlying HID device instance potentially
before hid_debug_events_release has completed releasing debug resources bound
to the same HID device instance.

Reference count to prevent the HID device instance from being torn down
preemptively when HID debugging support is used. When count reaches zero,
release core resources of HID device instance using hiddev_free.

The crash:

[  120.728477][ T4396] kernel BUG at lib/list_debug.c:53!
[  120.728505][ T4396] Internal error: Oops - BUG: 0 [Rust-for-Linux#1] PREEMPT SMP
[  120.739806][ T4396] Modules linked in: bcmdhd dhd_static_buf 8822cu pcie_mhi r8168
[  120.747386][ T4396] CPU: 1 PID: 4396 Comm: hidt_bridge Not tainted 5.10.110 Rust-for-Linux#257
[  120.754771][ T4396] Hardware name: Rockchip RK3588 EVB4 LP4 V10 Board (DT)
[  120.761643][ T4396] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
[  120.768338][ T4396] pc : __list_del_entry_valid+0x98/0xac
[  120.773730][ T4396] lr : __list_del_entry_valid+0x98/0xac
[  120.779120][ T4396] sp : ffffffc01e62bb60
[  120.783126][ T4396] x29: ffffffc01e62bb60 x28: ffffff818ce3a200
[  120.789126][ T4396] x27: 0000000000000009 x26: 0000000000980000
[  120.795126][ T4396] x25: ffffffc012431000 x24: ffffff802c6d4e00
[  120.801125][ T4396] x23: ffffff8005c66f00 x22: ffffffc01183b5b8
[  120.807125][ T4396] x21: ffffff819df2f100 x20: 0000000000000000
[  120.813124][ T4396] x19: ffffff802c3f0700 x18: ffffffc01d2cd058
[  120.819124][ T4396] x17: 0000000000000000 x16: 0000000000000000
[  120.825124][ T4396] x15: 0000000000000004 x14: 0000000000003fff
[  120.831123][ T4396] x13: ffffffc012085588 x12: 0000000000000003
[  120.837123][ T4396] x11: 00000000ffffbfff x10: 0000000000000003
[  120.843123][ T4396] x9 : 455103d46b329300 x8 : 455103d46b329300
[  120.849124][ T4396] x7 : 74707572726f6320 x6 : ffffffc0124b8cb5
[  120.855124][ T4396] x5 : ffffffffffffffff x4 : 0000000000000000
[  120.861123][ T4396] x3 : ffffffc011cf4f90 x2 : ffffff81fee7b948
[  120.867122][ T4396] x1 : ffffffc011cf4f90 x0 : 0000000000000054
[  120.873122][ T4396] Call trace:
[  120.876259][ T4396]  __list_del_entry_valid+0x98/0xac
[  120.881304][ T4396]  hid_debug_events_release+0x48/0x12c
[  120.886617][ T4396]  full_proxy_release+0x50/0xbc
[  120.891323][ T4396]  __fput+0xdc/0x238
[  120.895075][ T4396]  ____fput+0x14/0x24
[  120.898911][ T4396]  task_work_run+0x90/0x148
[  120.903268][ T4396]  do_exit+0x1bc/0x8a4
[  120.907193][ T4396]  do_group_exit+0x8c/0xa4
[  120.911458][ T4396]  get_signal+0x468/0x744
[  120.915643][ T4396]  do_signal+0x84/0x280
[  120.919650][ T4396]  do_notify_resume+0xd0/0x218
[  120.924262][ T4396]  work_pending+0xc/0x3f0

[ Rahul Rameshbabu <[email protected]>: rework changelog ]
Fixes: cd667ce ("HID: use debugfs for events/reports dumping")
Signed-off-by: Charles Yi <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
metaspace pushed a commit that referenced this pull request Feb 7, 2024
Currently when packet is shrunk via bpf_xdp_adjust_tail() and memory
type is set to MEM_TYPE_XSK_BUFF_POOL, null ptr dereference happens:

[1136314.192256] BUG: kernel NULL pointer dereference, address:
0000000000000034
[1136314.203943] #PF: supervisor read access in kernel mode
[1136314.213768] #PF: error_code(0x0000) - not-present page
[1136314.223550] PGD 0 P4D 0
[1136314.230684] Oops: 0000 [#1] PREEMPT SMP NOPTI
[1136314.239621] CPU: 8 PID: 54203 Comm: xdpsock Not tainted 6.6.0+ #257
[1136314.250469] Hardware name: Intel Corporation S2600WFT/S2600WFT,
BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[1136314.265615] RIP: 0010:__xdp_return+0x6c/0x210
[1136314.274653] Code: ad 00 48 8b 47 08 49 89 f8 a8 01 0f 85 9b 01 00 00 0f 1f 44 00 00 f0 41 ff 48 34 75 32 4c 89 c7 e9 79 cd 80 ff 83 fe 03 75 17 <f6> 41 34 01 0f 85 02 01 00 00 48 89 cf e9 22 cc 1e 00 e9 3d d2 86
[1136314.302907] RSP: 0018:ffffc900089f8db0 EFLAGS: 00010246
[1136314.312967] RAX: ffffc9003168aed0 RBX: ffff8881c3300000 RCX:
0000000000000000
[1136314.324953] RDX: 0000000000000000 RSI: 0000000000000003 RDI:
ffffc9003168c000
[1136314.336929] RBP: 0000000000000ae0 R08: 0000000000000002 R09:
0000000000010000
[1136314.348844] R10: ffffc9000e495000 R11: 0000000000000040 R12:
0000000000000001
[1136314.360706] R13: 0000000000000524 R14: ffffc9003168aec0 R15:
0000000000000001
[1136314.373298] FS:  00007f8df8bbcb80(0000) GS:ffff8897e0e00000(0000)
knlGS:0000000000000000
[1136314.386105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1136314.396532] CR2: 0000000000000034 CR3: 00000001aa912002 CR4:
00000000007706f0
[1136314.408377] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[1136314.420173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[1136314.431890] PKRU: 55555554
[1136314.439143] Call Trace:
[1136314.446058]  <IRQ>
[1136314.452465]  ? __die+0x20/0x70
[1136314.459881]  ? page_fault_oops+0x15b/0x440
[1136314.468305]  ? exc_page_fault+0x6a/0x150
[1136314.476491]  ? asm_exc_page_fault+0x22/0x30
[1136314.484927]  ? __xdp_return+0x6c/0x210
[1136314.492863]  bpf_xdp_adjust_tail+0x155/0x1d0
[1136314.501269]  bpf_prog_ccc47ae29d3b6570_xdp_sock_prog+0x15/0x60
[1136314.511263]  ice_clean_rx_irq_zc+0x206/0xc60 [ice]
[1136314.520222]  ? ice_xmit_zc+0x6e/0x150 [ice]
[1136314.528506]  ice_napi_poll+0x467/0x670 [ice]
[1136314.536858]  ? ttwu_do_activate.constprop.0+0x8f/0x1a0
[1136314.546010]  __napi_poll+0x29/0x1b0
[1136314.553462]  net_rx_action+0x133/0x270
[1136314.561619]  __do_softirq+0xbe/0x28e
[1136314.569303]  do_softirq+0x3f/0x60

This comes from __xdp_return() call with xdp_buff argument passed as
NULL which is supposed to be consumed by xsk_buff_free() call.

To address this properly, in ZC case, a node that represents the frag
being removed has to be pulled out of xskb_list. Introduce
appropriate xsk helpers to do such node operation and use them
accordingly within bpf_xdp_adjust_tail().

Fixes: 24ea501 ("xsk: support mbuf on ZC RX")
Acked-by: Magnus Karlsson <[email protected]> # For the xsk header part
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
ojeda pushed a commit that referenced this pull request Sep 30, 2024
While using the IOMMU DMA path, the dma_addressing_limited() function
checks ops struct which doesn't exist in the IOMMU case. This causes
to the kernel panic while loading ADMGPU driver.

BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 10 UID: 0 PID: 611 Comm: (udev-worker) Tainted: G                T  6.11.0-clang-07154-g726e2d0cf2bb #257
Tainted: [T]=RANDSTRUCT
Hardware name: ASUS System Product Name/ROG STRIX Z690-G GAMING WIFI, BIOS 3701 07/03/2024
RIP: 0010:dma_addressing_limited+0x53/0xa0
Code: 8b 93 48 02 00 00 48 39 d1 49 89 d6 4c 0f 42 f1 48 85 d2 4c 0f 44 f1 f6 83 fc 02 00 00 40 75 0a 48 89 df e8 1f 09 00 00 eb 24 <4c> 8b 1c 25 a0 00 00 00 4d 85 db 74 17 48 89 df 41 ba 8b 84 2d 55
RSP: 0018:ffffa8d2c12cf740 EFLAGS: 00010202
RAX: 00000000ffffffff RBX: ffff8948820220c8 RCX: 000000ffffffffff
RDX: 0000000000000000 RSI: ffffffffc124dc6d RDI: ffff8948820220c8
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff894883c3f040
R13: ffff89488dac8828 R14: 000000ffffffffff R15: ffff8948820220c8
FS:  00007fe6ba881900(0000) GS:ffff894fdf700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 0000000111984000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die_body+0x65/0xc0
 ? page_fault_oops+0x3b9/0x450
 ? _prb_read_valid+0x212/0x390
 ? do_user_addr_fault+0x608/0x680
 ? exc_page_fault+0x4e/0xa0
 ? asm_exc_page_fault+0x26/0x30
 ? dma_addressing_limited+0x53/0xa0
 amdgpu_ttm_init+0x56/0x4b0 [amdgpu]
 gmc_v8_0_sw_init+0x561/0x670 [amdgpu]
 amdgpu_device_ip_init+0xf5/0x570 [amdgpu]
 amdgpu_device_init+0x1a57/0x1ea0 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x1a/0x40
 ? pci_conf1_read+0xc0/0xe0
 ? pci_bus_read_config_word+0x52/0xa0
 amdgpu_driver_load_kms+0x15/0xa0 [amdgpu]
 amdgpu_pci_probe+0x1b7/0x4c0 [amdgpu]
 pci_device_probe+0x1c5/0x260
 really_probe+0x130/0x470
 __driver_probe_device+0x77/0x150
 driver_probe_device+0x19/0x120
 __driver_attach+0xb1/0x1e0
 ? __cfi___driver_attach+0x10/0x10
 bus_for_each_dev+0x115/0x170
 bus_add_driver+0x192/0x2d0
 driver_register+0x5c/0xf0
 ? __cfi_init_module+0x10/0x10 [amdgpu]
 do_one_initcall+0x128/0x380
 ? idr_alloc_cyclic+0x139/0x1d0
 ? security_kernfs_init_security+0x42/0x140
 ? __kernfs_new_node+0x1be/0x250
 ? sysvec_apic_timer_interrupt+0xb6/0xc0
 ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
 ? _raw_spin_unlock+0x11/0x30
 ? free_unref_page+0x283/0x650
 ? kfree+0x274/0x3a0
 ? kfree+0x274/0x3a0
 ? kfree+0x274/0x3a0
 ? load_module+0xf2e/0x1130
 ? __kmalloc_cache_noprof+0x12a/0x2e0
 do_init_module+0x7d/0x240
 __se_sys_init_module+0x19e/0x220
 do_syscall_64+0x8a/0x150
 ? __irq_exit_rcu+0x5e/0x100
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fe6bb5980ee
Code: 48 8b 0d 3d ed 12 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0a ed 12 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd462219d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 0000556caf0d0670 RCX: 00007fe6bb5980ee
RDX: 0000556caf0d3080 RSI: 0000000002893458 RDI: 00007fe6b3400010
RBP: 0000000000020000 R08: 0000000000020010 R09: 0000000000000080
R10: c26073c166186e00 R11: 0000000000000206 R12: 0000556caf0d3430
R13: 0000556caf0d0670 R14: 0000556caf0d3080 R15: 0000556caf0ce700
 </TASK>
Modules linked in: amdgpu(+) i915(+) drm_suballoc_helper intel_gtt drm_exec drm_buddy iTCO_wdt i2c_algo_bit intel_pmc_bxt drm_display_helper iTCO_vendor_support gpu_sched drm_ttm_helper cec ttm amdxcp video backlight pinctrl_alderlake nct6775 hwmon_vid nct6775_core coretemp
CR2: 00000000000000a0
---[ end trace 0000000000000000 ]---
RIP: 0010:dma_addressing_limited+0x53/0xa0
Code: 8b 93 48 02 00 00 48 39 d1 49 89 d6 4c 0f 42 f1 48 85 d2 4c 0f 44 f1 f6 83 fc 02 00 00 40 75 0a 48 89 df e8 1f 09 00 00 eb 24 <4c> 8b 1c 25 a0 00 00 00 4d 85 db 74 17 48 89 df 41 ba 8b 84 2d 55
RSP: 0018:ffffa8d2c12cf740 EFLAGS: 00010202
RAX: 00000000ffffffff RBX: ffff8948820220c8 RCX: 000000ffffffffff
RDX: 0000000000000000 RSI: ffffffffc124dc6d RDI: ffff8948820220c8
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff894883c3f040
R13: ffff89488dac8828 R14: 000000ffffffffff R15: ffff8948820220c8
FS:  00007fe6ba881900(0000) GS:ffff894fdf700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 0000000111984000 CR4: 0000000000f50ef0
PKRU: 55555554

Fixes: b5c58b2 ("dma-mapping: direct calls for dma-iommu")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219292
Reported-by: Niklāvs Koļesņikovs <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Niklāvs Koļesņikovs <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

8 participants