Add hash-checking support to `install` and `sync` #2945
},
Self::NoBinary => false,
Self::MismatchedHash => false,
Self::MissingHash => false,
@zanieb - I could use your help with this part. I'm not sure if I did the comparisons correctly here.
I think this is actually wrong, although I'm not sure it matters. You're supposed to be enforcing an ordering, but here you're saying that `MissingHash` and `MismatchedHash` are never more compatible than the other value. This means that if we see both `MissingHash` and `MismatchedHash`, we would arbitrarily display whichever one we saw first instead of consistently preferring one of them when presenting the error to the user.
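One way to make the choice deterministic is to give each variant a fixed rank and compare ranks strictly, so two distinct variants always compare one way or the other. This is a minimal sketch assuming a hypothetical `Incompatibility` enum with only the three variants from the snippet above; the rank order and the `rank`/`best_reason` helpers are illustrative, not what the PR settled on.

```rust
/// Hypothetical enum holding only the three variants from the snippet above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Incompatibility {
    NoBinary,
    MismatchedHash,
    MissingHash,
}

impl Incompatibility {
    /// Assumed priority: a higher rank means "closer to compatible".
    /// The specific order is illustrative only.
    fn rank(self) -> u8 {
        match self {
            Self::MismatchedHash => 0,
            Self::NoBinary => 1,
            Self::MissingHash => 2,
        }
    }

    /// Strict comparison: `true` only if `self` ranks above `other`.
    pub fn is_more_compatible(self, other: Self) -> bool {
        self.rank() > other.rank()
    }
}

/// Pick the reason to show the user. Because every variant has a distinct
/// rank, the result does not depend on the order the reasons were observed.
pub fn best_reason(reasons: &[Incompatibility]) -> Option<Incompatibility> {
    reasons.iter().copied().max_by_key(|reason| reason.rank())
}
```

With a strict ordering like this, seeing both `MissingHash` and `MismatchedHash` surfaces the same reason regardless of which one arrived first.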
I decided to change up the strategy in #2949, such that we treat distributions without hashes as compatible (but lower-priority). I can merge that into this PR if you agree with the change.
other => other,
}
}
}
@BurntSushi - Do you mind reviewing this component? The basic idea is a wrapper around any reader that can compute a set of hashes as we go. It's then used to wrap the async streams as we unzip.
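A minimal synchronous sketch of that wrapper idea, using only `std::io::Read`. The PR's version wraps async streams; here the `HashReader::new`/`finish` names mirror the diff, but the bodies are assumptions, and `Fnv1a64` is a hypothetical stand-in hasher chosen only to keep the example dependency-free. FNV-1a is not cryptographic; real hash checking would use digests such as SHA-256.

```rust
use std::io::{self, Read};

/// Stand-in digest (FNV-1a, 64-bit). NOT cryptographic; illustration only.
#[derive(Debug, Clone)]
pub struct Fnv1a64(u64);

impl Fnv1a64 {
    pub fn new() -> Self {
        Fnv1a64(0xcbf2_9ce4_8422_2325)
    }
    /// Streaming update: feeding bytes in any chunking gives the same digest.
    pub fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    pub fn digest(&self) -> u64 {
        self.0
    }
}

/// Wraps any reader and feeds every byte it yields into each hasher.
pub struct HashReader<'a, R> {
    reader: R,
    hashers: &'a mut [Fnv1a64],
}

impl<'a, R: Read> HashReader<'a, R> {
    pub fn new(reader: R, hashers: &'a mut [Fnv1a64]) -> Self {
        Self { reader, hashers }
    }

    /// Exhaust the underlying reader so the digests cover the whole input,
    /// even if the consumer (e.g. an unzipper) stopped reading early.
    pub fn finish(&mut self) -> io::Result<()> {
        let mut buf = vec![0u8; 8192]; // heap buffer, not an 8 KiB stack array
        while self.read(&mut buf)? > 0 {}
        Ok(())
    }
}

impl<R: Read> Read for HashReader<'_, R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.reader.read(buf)?;
        // Only the first `n` bytes of `buf` were filled by this call.
        for hasher in self.hashers.iter_mut() {
            hasher.update(&buf[..n]);
        }
        Ok(n)
    }
}
```

Because the wrapper only observes bytes as they pass through, hashing adds no extra pass over the archive; `finish` covers trailing bytes the consumer never requested.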
Aye. I think I have just one concern but otherwise this looks pretty reasonable to me.
Tyvm!
Open to advice on how best to test that, if a registry tells us the wrong hashes, we still validate them at install time. I need a registry or …
A few TODOs:
I'd also like to refactor the dataflow from requirements file to …
);

// Third, request the correct hash, that the registry _thinks_ is correct, but without the
// cache. We _should_ accept it, but we currently don't.
This is a TODO. We reject distributions without hashes when `--require-hashes` is provided, whereas in reality we should compute them.
I solved this in #2949.
Not exactly this PR (more #2909), but do we have an explanation for `.http`/`.rev` files in the docstrings?
Nice work!
@@ -93,6 +93,7 @@ indoc = { version = "2.0.4" }
itertools = { version = "0.12.1" }
junction = { version = "1.0.0" }
mailparse = { version = "0.14.0" }
md-5 = { version = "0.10.6" }
Do we want to support MD5? It's cryptographically broken.
/// The path to the archive entry in the wheel's archive bucket.
pub path: PathBuf,
/// The computed hashes of the archive.
pub hashes: Vec<HashDigest>,
nit: we have a constructor and getters
-/// The path to the archive entry in the wheel's archive bucket.
-pub path: PathBuf,
-/// The computed hashes of the archive.
-pub hashes: Vec<HashDigest>,
+/// The path to the archive entry in the wheel's archive bucket.
+path: PathBuf,
+/// The computed hashes of the archive.
+hashes: Vec<HashDigest>,
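The constructor-and-getters pattern the suggestion points to looks roughly like this. The struct's real name isn't shown in the diff, so `Archive` is a hypothetical name, and `HashDigest` here is a simplified stand-in whose fields are assumptions.

```rust
use std::path::{Path, PathBuf};

/// Simplified stand-in for the PR's `HashDigest`; these fields are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HashDigest {
    pub algorithm: String,
    pub digest: String,
}

/// Hypothetical name for the struct under review. Private fields force callers
/// through `new` and the getters, so the invariants live in one place.
#[derive(Debug, Clone)]
pub struct Archive {
    /// The path to the archive entry in the wheel's archive bucket.
    path: PathBuf,
    /// The computed hashes of the archive.
    hashes: Vec<HashDigest>,
}

impl Archive {
    pub fn new(path: PathBuf, hashes: Vec<HashDigest>) -> Self {
        Self { path, hashes }
    }

    /// Borrow the path without exposing the owned `PathBuf`.
    pub fn path(&self) -> &Path {
        &self.path
    }

    /// Read-only view of the hashes; callers can't push or clear them.
    pub fn hashes(&self) -> &[HashDigest] {
        &self.hashes
    }
}
```

Returning `&Path` and `&[HashDigest]` rather than the owned types keeps the getters flexible if the internal storage changes later.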
};
let mut hashers = algorithms.into_iter().map(Hasher::from).collect::<Vec<_>>();
let mut hasher = uv_extract::hash::HashReader::new(file, &mut hashers);
uv_extract::stream::unzip(&mut hasher, temp_dir.path()).await?;
Why don't we spawn blocking here too?
Because it's async whereas the other version is sync. Does that make sense?
yeah
crates/uv-extract/src/hash.rs
/// Exhaust the underlying reader.
pub async fn finish(&mut self) -> Result<(), std::io::Error> {
    while self.read(&mut [0; 8192]).await? > 0 {}
This is potentially putting 8KB on the stack. Maybe not an issue, but it's big enough to stand out to me.

I think I'd probably just use `vec![0; 8192]` here instead. You could put it in a `thread_local!` to amortize the alloc if you're concerned about perf.
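A minimal synchronous sketch of the `thread_local!` suggestion: one heap-allocated 8 KiB scratch buffer per thread, reused to drain a reader. `drain` and `DRAIN_BUF` are hypothetical names. One assumption worth flagging: the closure passed to `LocalKey::with` is synchronous, so this pattern doesn't carry over directly to an async `finish` that must `.await` each read; there, a plain `vec![0; 8192]` is the simpler fit.

```rust
use std::cell::RefCell;
use std::io::{self, Read};

thread_local! {
    // One 8 KiB heap buffer per thread, allocated on first use and then reused,
    // so draining neither puts a large array on the stack nor allocates per call.
    static DRAIN_BUF: RefCell<Vec<u8>> = RefCell::new(vec![0; 8192]);
}

/// Read `reader` to EOF, discarding the bytes; returns how many were consumed.
///
/// Caveat: if `reader.read` itself calls `drain` on the same thread, the
/// nested `borrow_mut` panics.
pub fn drain<R: Read>(reader: &mut R) -> io::Result<u64> {
    DRAIN_BUF.with(|cell| {
        let mut buf = cell.borrow_mut();
        let mut total = 0u64;
        loop {
            match reader.read(&mut buf[..]) {
                Ok(0) => return Ok(total),
                Ok(n) => total += n as u64,
                // Retry on EINTR-style interruptions, as `std::io::copy` does.
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
    })
}
```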
This lets us remove circular dependencies (in the future, e.g., #2945) that arise from `FlatIndex` needing a bunch of resolver-specific abstractions (like incompatibilities, required hashes, etc.) that aren't necessary to _fetch_ the flat index entries.
Summary
This PR adds support for hash-checking mode in `pip install` and `pip sync`. It's a large change, both in terms of the size of the diff and the modifications in behavior, but it's also one that's hard to merge in pieces (at least, with any test coverage) since it needs to work end-to-end to be useful and testable.

Here are some of the most important highlights:
- … `archives` directory, we now store pointers with a set of known hashes. So every pointer to an unzipped wheel also includes its known hashes.
- … `--require-hashes`, and the cache doesn't contain those hashes, we invalidate the cache, redownload the wheel, and compute the hashes as we go. For users that don't run with `--require-hashes`, there will be no change in performance. For users that do, the only change is that, if they don't run with `--generate-hashes`, they may see some repeated work between resolution and installation when they use `pip compile` followed by `pip sync`.
- … `hashes` field, like `CachedDist` and `LocalWheel`.
- … `--require-hashes` is provided, we require that all distributions are pinned with either `==` or a direct URL. We also require that all distributions have hashes.

There are a few notable TODOs:
- … `pip compile` never outputs unnamed requirements. I can fix this; it's just some additional work.
- … `--require-hashes` with a hash exists in the requirements file. We require `--require-hashes`.

Closes #474.
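The `--require-hashes` rules and the cache behavior described above can be sketched as two small checks. Everything here is hypothetical: `Pin`, `Requirement`, `check_require_hashes`, and `cache_entry_satisfies` are illustrative names, hashes are plain `algorithm:digest` strings, and the cache check assumes pip-style semantics in which matching any one of a requirement's listed hashes is enough.

```rust
/// Hypothetical, simplified view of how a requirement is pinned.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pin {
    /// `name==version`
    Exact,
    /// A direct URL requirement.
    Url,
    /// Anything else: ranges, bare names, and so on.
    Unpinned,
}

/// Hypothetical parsed requirement line.
#[derive(Debug)]
pub struct Requirement {
    pub name: String,
    pub pin: Pin,
    /// Hashes from `--hash` options, as `algorithm:digest` strings.
    pub hashes: Vec<String>,
}

/// Under `--require-hashes`, every requirement must be pinned with `==` or a
/// direct URL, and must carry at least one hash.
pub fn check_require_hashes(reqs: &[Requirement]) -> Result<(), String> {
    for req in reqs {
        if req.pin == Pin::Unpinned {
            return Err(format!("`{}` must be pinned with `==` or a direct URL", req.name));
        }
        if req.hashes.is_empty() {
            return Err(format!("`{}` is missing a hash", req.name));
        }
    }
    Ok(())
}

/// Can a cached, unzipped wheel be reused? With no required hashes (no
/// `--require-hashes`), always yes, so performance is unchanged. Otherwise the
/// cache pointer must already record one of the required hashes; if not, the
/// caller would invalidate the entry, redownload, and hash while unzipping.
pub fn cache_entry_satisfies(cached: &[String], required: &[String]) -> bool {
    required.is_empty() || cached.iter().any(|hash| required.contains(hash))
}
```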
Test Plan
I'd like to add some tests for registries that report incorrect hashes, but otherwise:
cargo test