Support uncompressed/reconstructed payloads #861

Closed
wgwoods opened this issue Sep 24, 2019 · 10 comments

@wgwoods

wgwoods commented Sep 24, 2019

#163 / commit 91aa078 added RPMTAG_PAYLOADDIGEST and RPMTAG_PAYLOADDIGESTALGO, so RPM now verifies the integrity of the payload. But there are tools (e.g. deltarpm) that reconstruct RPM payloads from individual parts. Given an RPM header and the individual file contents, the original (uncompressed) payload can be easily reconstructed by adding the appropriate CPIO headers, but there's no way to verify the integrity of the reconstructed payload other than re-compressing it and letting RPM verify PAYLOADDIGEST, which wastes a bunch of CPU & disk i/o and then sometimes fails randomly because of minor, unpredictable differences in compressor output.
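To illustrate the problem described above, here is a minimal sketch (not RPM code). It uses `zlib` from the Python standard library as a stand-in for the payload compressor, and the `payload` bytes are a made-up placeholder for a real cpio archive. The point is only that a digest over compressed bytes depends on the compressor, while a digest over the decompressed bytes depends only on the content.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for an uncompressed cpio payload.
payload = b"example file contents\n" * 1000

# The same payload compressed with two different settings, standing in for
# "a slightly different compressor version or configuration".
fast = zlib.compress(payload, 1)
best = zlib.compress(payload, 9)

# A digest over the compressed stream is only reproducible if the compressor
# output is byte-for-byte reproducible, which is not guaranteed.
compressed_digests_match = sha256_hex(fast) == sha256_hex(best)

# A digest over the decompressed stream is the same no matter how the data
# was compressed.
uncompressed_digests_match = (
    sha256_hex(zlib.decompress(fast))
    == sha256_hex(zlib.decompress(best))
    == sha256_hex(payload)
)
```

Whether `compressed_digests_match` is true depends on the compressor and its settings; `uncompressed_digests_match` always holds, which is the property the proposed second digest would rely on.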

To fix this I propose adding a second digest (RPMTAG_PAYLOADDIGEST_UNCOMPRESSED?) for the uncompressed payload, and then either:

  1. Fall back to uncompressing the payload and checking the uncompressed digest if the original verification fails (unsafe, slow)
  2. Add another tag (maybe SIGTAG_PAYLOAD_UNCOMPRESSED?) which directs RPM to assume the payload is already uncompressed; external programs could manually set that flag when reconstructing an RPM, or
  3. Add a new tag (RPMTAG_PAYLOAD_MAGIC?) that gives magic bytes (e.g. the first 4 bytes) for the compressed and uncompressed payload, so RPM can identify uncompressed/reconstructed payloads.

Either way, RPM would also need to override/ignore RPMTAG_PAYLOADCOMPRESSOR when the "uncompressed payload" flag is set. But that only happens in 3 places that I can see, so that's doable.

@pmatilai
Member

Rpm supports uncompressed payload as it is (see commit 81acadb and 9666e8c), but that doesn't help the deltarpm case as this is a build-time thing.

I did consider adding a separate digest for the uncompressed data back when adding RPMTAG_PAYLOADDIGEST (there are various uses for the data), but to sanely do so would require teaching rpmio to calculate digests both pre- and/or post-(de)compression per request.

Somehow permitting the payload bits to differ from what was created at build-time would be a pretty significant change for rpm, but I agree there's a point in it: ultimately it's the authenticity of delivered bits that matters, not the delivery box (of compression). I'll need to think about this.

@pmatilai
Member

One possible idea that doesn't require any additional package tags (which present their own problems) would be to always calculate both the pre- and post-decompression digests, and choose the tag to verify against based on whether they're equal or not. This would require the payload open to fall back to uncompressed mode if opening in compressed mode fails (might even already happen, dunno).
There might be some challenges fitting that into the verification system though.
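One way to read the idea above, as a rough sketch only: digest the bytes as delivered and again after attempting decompression, then pick which stored digest to compare against. Everything here is an assumption for illustration: `verify_payload` and its parameters are hypothetical names, `zlib` stands in for the real payload compressor, and this is not how rpmio is structured.

```python
import hashlib
import zlib
from typing import Optional


def verify_payload(payload: bytes,
                   compressed_digest: str,
                   uncompressed_digest: Optional[str]) -> bool:
    """Verify payload against whichever stored digest applies.

    compressed_digest / uncompressed_digest are hypothetical stand-ins for
    the digests stored in the package header.
    """
    delivered = hashlib.sha256(payload).hexdigest()
    try:
        unpacked = zlib.decompress(payload)
    except zlib.error:
        # Opening as compressed failed: fall back to uncompressed mode.
        unpacked = payload
    content = hashlib.sha256(unpacked).hexdigest()

    if delivered == content:
        # Both digests are equal, so the payload arrived uncompressed and
        # only the digest of the uncompressed data can apply.
        return uncompressed_digest is not None and content == uncompressed_digest
    # The payload arrived compressed: check it as built.
    return delivered == compressed_digest
```

A reconstructed (uncompressed) payload and the original compressed payload would both pass, while a payload whose content differs fails either way.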

@mlschroe
Contributor

If the reconstructed rpm is getting installed right away, we can also add an option that tells rpm that the payload is uncompressed. Regarding the magic, we already have the cpio magic to identify an uncompressed payload.
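As a small sketch of identifying the payload by its leading bytes: the table below uses the well-known magic numbers of the cpio "newc" format and of a few common compressors. The function name `sniff_payload` is hypothetical, the table is not exhaustive (other cpio variants are not covered), and it makes no claim about how rpm itself does the detection.

```python
from typing import Optional

# Leading bytes of common payload formats.
MAGIC = (
    (b"070701", "cpio"),            # cpio "newc" header magic (ASCII)
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
)


def sniff_payload(head: bytes) -> Optional[str]:
    """Guess the payload format from its first bytes, or None if unknown."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None
```

Because the cpio magic is plain ASCII and none of the compressor magics start with it, an uncompressed payload can be told apart without any extra tag.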

I've also considered adding something like this. deltarpm would also need to be changed to strip away all header+payload digests/signatures from the signature header.

@pmatilai
Member

pmatilai commented Sep 25, 2019

> deltarpm would also need to be changed to strip away all header+payload digests/signatures from the signature header.

On that note... the grand plan is to drop header+payload digests/signatures from rpm, except as a legacy compatibility option. Adding a strong (and signed if package is signed) digest for the payload alone was a pre-requisite for that, and it's also the reason the verify code is written the way it is: for packages built on rpm >= 4.14 you can already disable header+payload digests/signatures without sacrificing security at all. And in this setting, whether the payload is compressed or not is ultimately totally uninteresting.

@pmatilai
Member

On yet another related note, there is a case or two for having a global hash for the individual files, with no payload-induced foobar in it. One case is having a true "source id" calculated from all the files in an src.rpm, which would also be planted on any binaries built from these sources. This differs from the existing sourcepkgid (which is build specific) in that it would be arch-independent for properly written specs. This could be done as a hash of the existing hashes, which is far cheaper than a hash of the actual contents, but then it'd be dependent on build configuration, which makes it less universal than a hardwired hash of algorithm X.

Nothing says such a thing would have to be limited to source rpms, which is where we get back to the deltarpm case: I have no clue about deltarpm's inner workings, would such a thing be helpful/harmful/moo to the case?

@mlschroe
Contributor

I don't think deltarpm would be able to make use of this global source file hash.

@wgwoods
Author

wgwoods commented Sep 28, 2019

(The source hash is an interesting and not-unrelated problem, but there's some other stuff to figure out before that can get done correctly - like how to make a sufficiently-unique but also human-meaningful identifier for the environment the package was built in, or the flags used to build it, etc.)

> Somehow permitting the payload bits to differ from what was created at build-time would be a pretty significant change for rpm, but I agree there's a point in it: ultimately it's the authenticity of delivered bits that matters, not the delivery box (of compression).

Yeah, that's it exactly - as a general principle, the digest/signature should cover some canonical form of the data. That way other people can transform/store/etc. the data however they want, as long as there's a way to reliably transform it back to the canonical form it can always be verified as authentic.

In other words: if you sign uncompressed data, it's possible to verify any compressed data, because every lossless compression algorithm guarantees that it can recover the original data.

Similarly: if you sign the data with a well-known ordering - like, say, signing a hash of the sorted list of individual file digests - then you can verify the data regardless of what order you examine the individual parts. (Which is kind of like the source file hash you're talking about, but connecting source and binaries is tricky, so let's not get into that yet...)
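A minimal sketch of the "hash of the sorted list of individual file digests" idea, to make the order-independence concrete. The function name `content_id`, the choice of SHA-256, and the newline separator are all assumptions for illustration; nothing like this is defined by RPM.

```python
import hashlib
from typing import Iterable


def content_id(file_digests: Iterable[str]) -> str:
    """Hash a set of per-file hex digests in a canonical (sorted) order."""
    h = hashlib.sha256()
    for digest in sorted(file_digests):
        h.update(digest.encode("ascii"))
        # Separator so that adjacent digests cannot run together.
        h.update(b"\n")
    return h.hexdigest()
```

Since the digests are sorted before hashing, the result is the same whatever order the files were examined in, and it changes if any file digest is added, removed, or altered.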

As an optimization, it definitely makes sense to include a digest (or multiple digests!) of uncompressed and compressed/archived data, so you can more quickly verify the integrity of common payload formats. (Solaris IPS does something like this - they have "hash" and "chash" keys/tags in their package metadata, so you can check the chash if you have compressed data, or uncompress it and check the hash.)

Anyway, my interest here is more about compatibility between RPM and external tools, especially new repo/package formats. Generally: if I have some data store that contains (or can rebuild) a complete, intact RPM header, and I have all the files listed in that RPM header - and all their digests match, so I know I have the right files - what's the easiest way for me to get RPM to install/update/verify that package header and those files?

It turns out that given just the RPM header and the file contents you can reconstruct the original RPM lead and uncompressed payload of nearly all modern RPMs - all the data in the lead and CPIO headers is also in the RPM headers, and the payload file ordering almost always matches the header order. But I still can't convince RPM that I have authentic data, because I'm using a slightly different version of xz, or I didn't store the original timestamp, or the payload file order sometimes doesn't match the RPM header file ordering, or whatever.
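For a sense of what "adding the CPIO headers" involves, here is a rough sketch of building one entry in the cpio "newc" layout: a 6-byte ASCII magic, thirteen 8-digit hex fields, the NUL-terminated name, then the data, with name and data each padded to a 4-byte boundary. The function `newc_entry` and its defaults are hypothetical, and the sketch does not claim to reproduce the exact field values (inode numbers, timestamps and so on) that rpm writes.

```python
def _pad4(length: int) -> bytes:
    """NUL padding needed to bring length up to a multiple of 4."""
    return b"\0" * (-length % 4)


def newc_entry(name: str, data: bytes, mode: int = 0o100644,
               ino: int = 0, uid: int = 0, gid: int = 0,
               nlink: int = 1, mtime: int = 0) -> bytes:
    """Build one cpio 'newc' archive entry (header, name, data)."""
    name_bytes = name.encode("utf-8") + b"\0"
    fields = (
        ino, mode, uid, gid, nlink, mtime,
        len(data),        # filesize
        0, 0,             # devmajor, devminor
        0, 0,             # rdevmajor, rdevminor
        len(name_bytes),  # namesize, including the trailing NUL
        0,                # check (unused in this format)
    )
    header = b"070701" + b"".join(b"%08X" % f for f in fields)
    out = header + name_bytes
    out += _pad4(len(out))            # pad header + name to 4 bytes
    out += data + _pad4(len(data))    # pad file data to 4 bytes
    return out


# An archive is the entries in order, followed by the conventional trailer.
archive = newc_entry("./hello.txt", b"hello\n") + newc_entry("TRAILER!!!", b"", mode=0)
```

In a real reconstruction, the mode, ownership, timestamps and ordering would have to come from the RPM header, which is exactly where the mismatches mentioned above creep in.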

So: any solution that means that I can get RPM to accept that the uncompressed files are, in fact, the same as the compressed ones, would be pretty great.

pmatilai added a commit to pmatilai/rpm that referenced this issue Oct 2, 2019
First piece of the puzzle to better support reconstructed payloads (rpm-software-management#861)
@pmatilai
Member

pmatilai commented Oct 2, 2019

PR #880 is the first piece of the puzzle; it was easier than (and different from) what I originally thought.

pmatilai added a commit to pmatilai/rpm that referenced this issue Oct 3, 2019
First piece of the puzzle to better support reconstructed payloads (rpm-software-management#861)
@pmatilai
Member

pmatilai commented Oct 3, 2019

Updated #880 to include actual verify support for the alt tag too, so it's now the first two pieces of the puzzle.

@pmatilai
Member

So I think this is actually done to the extent that it's in scope of this ticket. Phasing out the problematic header+payload signatures and digests is a separate, ongoing effort that is better tracked in #1292.
Thanks for the initiative here!
