Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accessing large static global behaves incorrectly #121868

Closed
zyedidia opened this issue Mar 1, 2024 · 13 comments
Closed

Accessing large static global behaves incorrectly #121868

zyedidia opened this issue Mar 1, 2024 · 13 comments
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@zyedidia
Copy link
Contributor

zyedidia commented Mar 1, 2024

I tried this code:

static VALS: [u8; 1024 * 1024 * 1024 * 4] = [0u8; 1024 * 1024 * 1024 * 4];

fn main() {
    for i in 0..2048 {
        println!("{}", VALS[i]);
        if VALS[i] != 0 {
            println!("NON-ZERO: {}", VALS[i]);
            return
        }
    }
}

I expected to see this happen: it should print 2048 zeroes, since the array is statically initialized to zero.

Instead, this happened: it prints random values of memory. In debug mode, it also exits at the first non-zero value and prints "NON-ZERO". In release mode, it never prints "NON-ZERO", but the values that it displays are in fact non-zero (I believe this is cause it is optimizing out the if statement).

It appears that the linker is not including the data for the VALS array in the binary, but a symbol is still included, which causes the loads to access other variables stored at the location of the symbol.

Meta

rustc --version --verbose:

rustc 1.78.0-nightly (878c8a2a6 2024-02-29)
binary: rustc
commit-hash: 878c8a2a62d49ca5c454547ad67290a1df746cb5
commit-date: 2024-02-29
host: x86_64-unknown-linux-gnu
release: 1.78.0-nightly
LLVM version: 18.1.0
@zyedidia zyedidia added the C-bug Category: This is a bug. label Mar 1, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 1, 2024
@jieyouxu jieyouxu added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 1, 2024
@Noratrieb Noratrieb added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Mar 1, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 1, 2024
@bjorn3
Copy link
Member

bjorn3 commented Mar 1, 2024

In the playground trying to get the llvm ir or output assembly results in a crash. Maybe there is an overflow at the llvm ir level?

@clubby789
Copy link
Contributor

Emitting asm/llvm works fine for me on CLI, I think it's just a playground issue

@erer1243
Copy link
Contributor

erer1243 commented Mar 3, 2024

@rustbot claim

@erer1243
Copy link
Contributor

erer1243 commented Mar 4, 2024

This issue is caused by a known problem in LLVM, which is the use of unsigned instead of size_t in this group of functions: https://llvm.org/doxygen/group__LLVMCCoreValueConstantComposite.html. On x86_64, unsigned is 32 bits, giving us an overflow at a length 2^32 array. This explains the inclusion of a symbol but no data - the linker is "correctly" including a zero-sized array.

This was encountered by Zig at ziglang/zig#1424, and one of their folks created an updated version of LLVMConstArray which was accepted into LLVM 17. This was also acknowledged in rustc https://github.com/rust-lang/rust/blame/89b78304e82dc5114e3b2faa0fbec747a28a2b37/compiler/rustc_codegen_llvm/src/llvm/ffi.rs#L951.

Unfortunately this doesn't help because the problem call is actually to LLVMConstStringInContext, made here

llvm::LLVMConstStringInContext(llcx, ptr, bytes.len() as c_uint, True)

The best fix would be to update all of these functions in LLVM (why didn't the first PR just update them all? :P), then eventually integrate them here.

In the meantime, we can add correct implementations of these functions to rustc_llvm/llvm-wrapper/RustWrapper.cpp. This is not a complete fix because introducing huge statics requires a larger memory model when linking (https://www.smcm.iqfr.csic.es/docs/intel/compiler_f/main_for/bldaps_for/lin/bldaps_spec_memmods.htm). After doing this fix, the given program won't link on my end for this reason. I don't know how easy it is to change the memory model that rustc invokes.

Alternatively we can just add overflow checks to all of the callsites and disallow 2^32-length constant arrays & strings. GCC & Clang also choke on const char VALS[1024l * 1024l * 1024l * 4l] = {0}; on my machine, fwiw.

If the goal is just to have a huge static block of zeros at runtime, you can also just make VALS mutable. This moves it into .bss, sidestepping the entire issue. Also, the reason the playground doesn't like this code is because the const evaluator must generate and store those 2^32 zeros in memory.

@zyedidia
Copy link
Contributor Author

zyedidia commented Mar 4, 2024

Thanks for the info. I actually initially ran into this problem while trying to make a large mutable array, but I used the immutable example in the report because it didn't require unsafe to access (I forgot this would cause it not to go in the BSS). The following example still breaks, even though it should be possible to put the data in the BSS (I think).

static mut VALS: [u8; 1024 * 1024 * 1024 * 4] = [0u8; 1024 * 1024 * 1024 * 4];

fn main() {
    unsafe {
        for i in 0..2048 {
            println!("{}", VALS[i]);
            if VALS[i] != 0 {
                println!("NON-ZERO: {}", VALS[i]);
                return
            }
        }
    }
}

@erer1243
Copy link
Contributor

erer1243 commented Mar 4, 2024

Oh that's my mistake - I had the accesses commented out and it optimized VALS out :P. It does work for C, though. You could use that fact if this is something you really wanted.

@apiraino
Copy link
Contributor

apiraino commented Mar 5, 2024

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-critical

@rustbot rustbot added P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Mar 5, 2024
@apiraino
Copy link
Contributor

apiraino commented Mar 7, 2024

Priority downgraded (Zulip discussion) as it's not a release blocker and in out there since some time.

@rustbot label -I-prioritize +P-high -P-critical

@rustbot rustbot added P-high High priority and removed P-critical Critical priority labels Mar 7, 2024
@erer1243
Copy link
Contributor

erer1243 commented Mar 9, 2024

This will be fixed by #122000

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 11, 2024
Fix 32-bit overflows in LLVM composite constants

Inspired by rust-lang#121868. Fixes unsoundness created when constructing constant arrays, strings, and structs with 2^32 or more elements on x86_64. This introduces copies of a few LLVM functions that have their signatures updated to use size_t in place of unsigned int. Alternatively we could just add overflow checks and just disallow huge composite constants. That introduces less code, but maybe a huge static block of memory is useful in embedded/no-os situations?
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 12, 2024
Fix 32-bit overflows in LLVM composite constants

Inspired by rust-lang#121868. Fixes unsoundness created when constructing constant arrays, strings, and structs with 2^32 or more elements on x86_64. This introduces copies of a few LLVM functions that have their signatures updated to use size_t in place of unsigned int. Alternatively we could just add overflow checks and just disallow huge composite constants. That introduces less code, but maybe a huge static block of memory is useful in embedded/no-os situations?
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Mar 12, 2024
Rollup merge of rust-lang#122000 - erer1243:issue-121868, r=nikic

Fix 32-bit overflows in LLVM composite constants

Inspired by rust-lang#121868. Fixes unsoundness created when constructing constant arrays, strings, and structs with 2^32 or more elements on x86_64. This introduces copies of a few LLVM functions that have their signatures updated to use size_t in place of unsigned int. Alternatively we could just add overflow checks and just disallow huge composite constants. That introduces less code, but maybe a huge static block of memory is useful in embedded/no-os situations?
@riking
Copy link

riking commented Mar 26, 2024

I don't know how easy it is to change the memory model that rustc invokes.

It's -C code-model=large, and can be set using target triple rustflags in Cargo.toml. https://doc.rust-lang.org/cargo/reference/config.html?highlight=rustflags#targettriplerustflags

Also, please don't use static mut! Instead: static VALS: SyncUnsafeCell<[u8; 1024 * 1024 * 1024 * 4]> = ...;

@RalfJung
Copy link
Member

This will be fixed by #122000

That PR landed, so should this be closed?

Also, please don't use static mut! Instead: static VALS: SyncUnsafeCell<[u8; 1024 * 1024 * 1024 * 4]> = ...;

FWIW I don't agree with this advice. Anyway that's kind of off-topic here. :)

@erer1243
Copy link
Contributor

@RalfJung yes I believe it should be closed

@RalfJung
Copy link
Member

Great, thanks to everyone who contributed to the fix. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants