
cargo publish "memory allocation of .. bytes failed" with large, ignored test assets #7543

Closed
FauxFaux opened this issue Oct 27, 2019 · 6 comments

@FauxFaux

Problem

cargo publish fails with:

memory allocation of 1099511627782 bytes failed
zsh: abort (core dumped) cargo publish --dry-run

...if you have large files in your build directory, even if they are excluded.

I expect this to complete in all cases, i.e. a crate can be published regardless of how much memory I have (within reason!). But I especially expect it to pass in this case, where the file causing the error is ignored and will not be published.

Steps

  1. Have a test data directory which is ignored.
mkdir -p tests/generated
echo '*' > tests/generated/.gitignore
git add -f tests/generated/.gitignore
  2. Have a build.rs which creates a large file in this test data directory.
use std::fs;
use std::io::{Seek, SeekFrom, Write};

fn main() {
    // Create a sparse file: seek 1 TiB past the start, then write a few
    // bytes. It occupies ~4 KiB on disk but has a ~1.1 TB apparent size.
    let mut f = fs::File::create("tests/generated/large.txt").unwrap();
    f.seek(SeekFrom::Start(1024 * 1024 * 1024 * 1024)).unwrap();
    f.write_all(b"hello").unwrap();
}
% du -h tests/generated/large.txt    
4.0K	tests/generated/large.txt

% du --apparent-size -h tests/generated/large.txt
1.1T	tests/generated/large.txt
  3. Run cargo package or cargo publish.
% cargo +stable publish --dry-run --allow-dirty
    Updating crates.io index
warning: manifest has no description, license, license-file, documentation, homepage or repository.
See https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata for more info.
   Packaging dozen v0.1.0 (foo/dozen)
   Verifying dozen v0.1.0 (foo/dozen)
   Compiling dozen v0.1.0 (foo/dozen/target/package/dozen-0.1.0)
    Finished dev [unoptimized + debuginfo] target(s) in 0.53s
memory allocation of 1099511627782 bytes failed
zsh: abort (core dumped)  cargo +stable publish --dry-run --allow-dirty

Possible Solution(s)

I believe this was introduced by 78a60bc, which intentionally hashes all of build.rs's output to detect changes it has made. That change is arguably correct even in this case: if build.rs changed the test data, then maybe the tests will perform differently, so it's effectively a different test?

The allocation is failing here, which you can see with gdb, but not with logging or backtraces:

let contents = fs::read(entry.path())?;
let hash = util::hex::hash_u64(&contents);

i.e. read the entire file into memory, then hash the contents. Chunked hashing should not require unbounded memory?

Maybe:

diff --git a/src/cargo/ops/cargo_package.rs b/src/cargo/ops/cargo_package.rs
index a1b9a5f6..804ddb26 100644
--- a/src/cargo/ops/cargo_package.rs
+++ b/src/cargo/ops/cargo_package.rs
@@ -690,6 +690,7 @@ fn hash_all(path: &Path) -> CargoResult<HashMap<PathBuf, u64>> {
             let entry = entry?;
             let file_type = entry.file_type();
             if file_type.is_file() {
+                debug!("hashing {:?}", entry.path());
                 let contents = fs::read(entry.path())?;
                 let hash = util::hex::hash_u64(&contents);
                 result.insert(entry.path().to_path_buf(), hash);
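
For illustration, here is a minimal sketch of the chunked approach. The function name, the 64 KiB buffer size, and the choice of `DefaultHasher` are all invented for this sketch, not cargo's actual internals:

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::Path;

// Hypothetical replacement for the fs::read + hash_u64 pair above:
// read through a fixed-size buffer and feed each chunk to the hasher,
// so peak memory stays at the buffer size no matter how large the
// file's apparent size is.
fn hash_u64_chunked(path: &Path) -> io::Result<u64> {
    let mut file = File::open(path)?;
    let mut hasher = DefaultHasher::new();
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.write(&buf[..n]); // Hasher::write accepts bytes incrementally
    }
    Ok(hasher.finish())
}
```

(This would not be bit-for-bit identical to hashing the whole `Vec<u8>` through the `Hash` trait, which also mixes in the length; the point is only that a fixed-size buffer bounds peak memory.)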

I saw this in a real project, https://github.com/FauxFaux/ext4-rs , which extracts some sparse (i.e. tiny but apparently large) disc images from a tiny tar file during its tests. It publishes fine on e.g. 1.34.2, but not on stable.

Notes

Output of cargo version:

% cargo +stable --version                      
cargo 1.38.0 (23ef9a4ef 2019-08-20)

I'm on amd64 Ubuntu with ~30 GB of memory available. If you have over 1 TB of virtual memory available, then the above testcase might pass (lucky you).

FauxFaux added the C-bug (Category: bug) label Oct 27, 2019
@alexcrichton
Member

Thanks for the detailed bug report!

I think, yeah, the right solution here is to avoid reading everything into memory and to hash the file as a stream instead. It'll still probably take forever to read and hash a 1 TB file, though, so processing exclude rules in a reasonable way would also be reasonable here!

@hbina
Contributor

hbina commented Feb 26, 2020

Hi, since this has been marked easy I thought I could help with the issue. I have read through the description, but I can't come up with a solution without knowing some facts about the application.

Does cargo have any information on which files are ignored by git? I find this unlikely, as there is no reason that it should. Additionally, any arbitrary fix that checks the size of the file will backfire some time in the future.

So any ideas?

@ehuss
Contributor

ehuss commented Feb 27, 2020

@hbina Thanks for taking a look! I'm not sure I understand the question about git. Cargo does know which files are ignored. However, this code path is used to explicitly check if any file is modified (ignored or not).

The solution is to stop using fs::read inside hash_all, and instead use something that streams the file into the hasher. We already have this implemented for Sha256. We need it implemented for the hasher used here. If possible, it would be great to change the Sha256 implementation to be generic over the hasher type, and reuse the code.
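
A hypothetical sketch of that shape, wrapping a std `Hasher` in an `io::Write` adapter so the same streaming code can serve either hasher (`HashWriter`, `hash_u64_file`, and the use of `DefaultHasher` are illustrative, not cargo's actual code):

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::Hasher;
use std::io::{self, Write};
use std::path::Path;

// Adapter: any std Hasher becomes an io::Write sink.
struct HashWriter<H: Hasher>(H);

impl<H: Hasher> Write for HashWriter<H> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.write(buf); // Hasher::write consumes bytes incrementally
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn hash_u64_file(path: &Path) -> io::Result<u64> {
    let mut sink = HashWriter(DefaultHasher::new());
    // io::copy streams through a fixed-size internal buffer, so memory
    // stays bounded no matter how large the file is.
    io::copy(&mut File::open(path)?, &mut sink)?;
    Ok(sink.0.finish())
}
```

Any other `io::Write`-based hasher, such as a Sha256 context that implements `Write`, could be dropped into the same `io::copy` call.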

@hbina
Contributor

hbina commented Feb 27, 2020

I see, I will take a look again after school today.

Edit: Having skimmed through it a bit, it seems like SipHasher does not implement Write, so it can't be used like a stream. Do I just replace it with the Hasher you linked, except using Algorithm::MD5, which I assume is cheaper?

But this seems like it will break other code.

Edit: Nevermind...

hbina added a commit to hbina/cargo that referenced this issue Feb 27, 2020
Added a helper function for util::hex::hash_64 that streams the content instead of reading the entire content in one go.
bors added a commit that referenced this issue Feb 28, 2020
Fixes issue #7543

Added a helper function for `util::hex::hash_64` that streams the content instead of reading the entire content in one go.

Edit: Should I add test cases for this?

Edit2: Per #7543
bors added a commit to rust-lang/rust that referenced this issue Mar 2, 2020
Update cargo, clippy

Among the 16 cargo commits pulled in (e57bd029..bda50510), the one relevant to this issue:

- Fixes issue rust-lang/cargo#7543 (rust-lang/cargo#7946)
@ehuss
Contributor

ehuss commented Mar 4, 2020

This should be fixed by #7946.

ehuss closed this as completed Mar 4, 2020
@jrcharney

> I'm on amd64 Ubuntu with ~30 GB of memory available. If you have over 1 TB of virtual memory available, then the above testcase might pass (lucky you).

I can't believe this is the trend with software management. (snapd is much worse. Putting that snap directory in the home directory. This is why I stopped using Ubuntu. The Canonical guys don't know what hidden directories are. But that's for another place to complain about.)

It shouldn't take large amounts of memory to compile a simple program. Once modules are compiled, they should sit in a cache until all the pieces are ready to be put together. (I wanted to say "assembled" but I didn't want someone to think I was implying assembly language.)

It shouldn't have taken an $80 purchase at the local tech shop to figure out that cargo's way of doing things is to hold all the compiled pieces in RAM until things are built. I know this because I used a 2GB Pi 4 to try to build a cargo app from source and was getting kernel warnings out of left field, while on the 4GB Pi 4 that wasn't an issue, though I was worried about things slowing down like they did on the 2GB device.

The point being: when cargo compiles components and sets them aside, they should be stored on the hard drive (where space is plentiful and there's room for virtual memory if that's how cargo wants to do things), not in the memory the rest of the system still needs to function.

I really like the Rust programming language, and I am excited to try out some of its features when I write code for my own project in the future. But if Rust relies on consuming large amounts of the computer's more precious resources (especially the RAM), then Rust might not be the ideal programming language if its package manager is flawed.

(I know this issue has been closed for a few months now. I just wanted to get these thoughts off my chest, considering the amount of effort I spent this week trying to understand why even something with minimal resources would have trouble handling an otherwise simple task that an older computer with lesser capabilities could handle.)

OS: Manjaro Linux ARM 20.06
Kernel: aarch64
