Skip to content

Latest commit

 

History

History
245 lines (184 loc) · 8.5 KB

2386-used.md

File metadata and controls

245 lines (184 loc) · 8.5 KB

Summary

Stabilize the #[used] attribute which is used to force the compiler to keep static variables, even if not referenced by any other part of the program, in the output object file.

Motivation

Bare metal applications, like kernels, bootloaders and other firmware, usually need precise control over the memory layout of the program. These programs usually need to place data structures like vector (interrupt) tables in certain memory locations for the system to operate properly.

The final memory layout of the program is decided by the linker; bare metal applications make use of linker scripts to control the placement of (linker) sections in memory. But for all this to work the vector table must be present in the object files passed to the linker. That's where the #[used] attribute comes in: without it the compiler will optimize away the vector table, as it's not directly used by the program, and it will never reach the linker.

It's possible to work around the lack of the #[used] attribute by declaring the vector table as public:

// public items are exposed in the object file
#[link_section = ".vector_table.exceptions"]
pub static EXCEPTIONS: [extern "C" fn(); 14] = [/* .. */];

But this is brittle because the compiler can still optimize the symbol away when compiling with LTO enabled -- with LTO the compiler has global knowledge about the program, and will see that EXCEPTIONS is unused by the program and discard it.

Yet another workaround is to force a volatile load of the vector table in some part of the program, usually before main. The compiler will always keep the vector table in this case but this alternative incurs in the cost of a load operation that will never be optimized away by the compiler.

#[link_section = ".vector_table.exceptions"]
static EXCEPTIONS: [extern "C" fn(); 14] = [/* .. */];

// entry point of the firmware
fn reset() -> ! {
    extern "C" {
        // user entry point
        fn main() -> !;
    }

    // this operation will never be optimized away
    unsafe { ptr::read_volatile(&EXCEPTIONS[0]) };

    main()
}

The proper solution to keeping the vector table is to mark the vector table as a used variable to force the compiler to keep in one of the emitted object files.

#[used] // will be present in the object file
#[link_section = ".vector_table.exceptions"]
static EXCEPTIONS: [extern "C" fn(); 14] = [/* .. */];

Guide-level explanation

We can think of the compilation process performed by rustc as a two stage process. First, rustc compiles a crate (source code) into object files, then rustc invokes the linker on those object files to produce a single executable, or shared library (e.g. .so) if the crate type was set to "cdylib".

The #[used] attribute can be applied to static variables to keep them in the object files produced by rustc, even in the presence of LTO. Note that this does not mean that the static variable will make its way into the binary file emitted by the linker as the linker is free to drop symbols that it deems unused. In other words, the #[used] attribute does not affect the behavior of the linker.

Consider the following program:

#[used]
static FOO: u32 = 0;
static BAR: u32 = 0;

fn main() {}

The variable FOO marked with the #[used] attribute will be kept in the emitted object file regardless of the optimization level. On the other hand, the unused variable BAR is always optimized away.

$ cargo rustc -- --emit=obj  # for simplicity incr. comp. has been disabled
$ nm -C $(find target -name '*.o')
(..)
0000000000000000 r foo::FOO
0000000000000000 t foo::main
0000000000000000 T std::rt::lang_start
(..)

$ cargo clean; cargo rustc --release --
$ nm -C $(find target -name '*.o')
0000000000000000 T main
0000000000000000 r foo::FOO
0000000000000000 t foo::main
0000000000000000 T std::rt::lang_start
(..)

$ cargo clean; cargo rustc --release -- --emit=obj -C lto
$ nm -C $(find target -name '*.o')
(..)
0000000000000000 r foo::FOO
0000000000000000 t foo::main
(..)

FOO never makes it to the final executable because the linker sees that the call graph that stems from the user entry point main never makes use of FOO and discards it.

$ cargo clean; cargo build
$ nm -C target/debug/foo | grep FOO || echo not found
not found

To keep FOO in the final binary assistance from the linker is required; this usually means writing a linker script.

Consider the following program:

#[used]
#[link_section = ".init_array"]
static FOO: extern "C" fn() = before_main;

extern "C" fn before_main() {
    println!("Hello")
}

fn main() {
    println!("World")
}

When dealing with ELF files the .init_array section will usually be kept in the final binary by the default linker script. If the system supports it, all function pointers stored in the .init_array section will be called before entering main. Thus, the above program prints "Hello" and then "World" to the console when run on a *nix system.

$ cargo run --release
Hello
World

$ nm -C target/release/foo | grep FOO
000000000026b620 t foo::FOO

If the #[used] attribute is removed from the source code then only "World" is printed to the console as the FOO variable will get optimized away by the compiler.

Reference-level explanation

The #[used] attribute can only be used on static variables. Static variables marked with this attribute will be appended to the special @llvm.used global variable when lowered to LLVM IR.

#[used]
static FOO: u32 = 0;

fn main() {}
$ cargo clean; cargo rustc -- --emit=llvm-ir
$ grep llvm.used $(find -name '*.ll')
@llvm.used = appending global [1 x i8*] [i8* getelementptr inbounds (<{ [4 x i8] }>, <{ [4 x i8] }>* @_ZN3foo3FOO17hf0af6b03a826c578E, i32 0, i32 0, i32 0)], section "llvm.metadata"

The semantics of this operation are (quoting LLVM reference):

If a symbol appears in the @llvm.used list, then the compiler, assembler, and linker are required to treat the symbol as if there is a reference to the symbol that it cannot see (which is why they have to be named). For example, if a variable has internal linkage and no references other than that from the @llvm.used list, it cannot be deleted. This is commonly used to represent references from inline asms and other things the compiler cannot “see”, and corresponds to “attribute((used))” in GNU C.

strikethrough added by the author

The part about the linker is not true (*): from the point of view of the linker static variables marked with #[used] look exactly the same as variables that have not been marked with that attribute -- those are the implemented LLVM semantics. Also ELF object files have no mechanism to prevent the linker from dropping its symbols if they are not referenced by other object files.

(*) unless "linker" is actually referring to llvm-link (?)

Drawbacks

This is yet another low level feature that alternative rustc implementations would have to implement to be 100% compatible with the official LLVM based rustc implementation. Also see #[repr(align = "*")], #[repr(*)], #[link_section], etc.

Rationale and alternatives

Chosen design

This design pretty much matches how C compilers implement this feature. See prior art section below.

Not doing this

Not doing this means that people will continue to use the brittle workarounds presented in the motivation section.

Prior art

Most compilers provide a feature with the exact same semantics: usually in the form of a "used" attribute (e.g. __attribute(used)__) that can be applied to static variables.

The following C code is an example from the KEIL toolchain documentation:

static int lose_this = 1;
static int keep_this __attribute__((used)) = 2;     // retained in object file
static int keep_this_too __attribute__((used)) = 3; // retained in object file

Unresolved questions

None so far.