Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for processing files with prepended bytes before the zip archive #428

Merged
merged 4 commits into from
Jun 4, 2021

Conversation

wagoodman
Copy link
Contributor

Today syft unzips java jars and catalogs all contents, however, jars (which are typically zip files) with shell scripts prepended to the archive are not supported by the stdlib. This PR adds support for processing the files as described in the above golang issue + #413 (comment) .

We could infer the archive length from the end of central directory record by reading from the end of the file, create a new io.ReaderAt via io.SectionReader that is bounded to only the archive based on the length from the ECOD, and create a new zip reader via zip.NewReader() (instead of zip.OpenReader). We'd need to make certain that the underlying reader passed to the SectionReader is closed after each use since there is no Close() operation available from a zip.Reader.

Specifically, this PR adds a new ZipReadCloser that is a drop-in replacement for the stdlib zip.ReadCloser, but adds support for the above mentioned case (ignoring prepended bytes).

This additionally updates the TraverseFilesInZip util function to only use the new ZipReadCloser, and in doing so all of the helper functions that use TraverseFilesInZip also no longer use zip.ReadCloser.

The Springboot test-fixture has been updated to generate the self-executing tar which exhibits this behavior (which should be the same kind of jar as described in #413 ).

Closes #413

@wagoodman wagoodman requested a review from a team June 3, 2021 02:09
@wagoodman wagoodman self-assigned this Jun 3, 2021
@github-actions
Copy link

github-actions bot commented Jun 3, 2021

Benchmark Test Results

Benchmark results from the latest changes vs base branch
name                                                   old time/op    new time/op    delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           859µs ± 1%     753µs ± 1%  -12.34%  (p=0.008 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2        1.15ms ± 1%    1.02ms ± 0%  -11.65%  (p=0.008 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     429µs ± 1%     379µs ± 0%  -11.76%  (p=0.016 n=5+4)
ImagePackageCatalogers/dpkgdb-cataloger-2                 409µs ± 1%     359µs ± 0%  -12.09%  (p=0.008 n=5+5)
ImagePackageCatalogers/rpmdb-cataloger-2                  426µs ± 1%     378µs ± 1%  -11.28%  (p=0.008 n=5+5)
ImagePackageCatalogers/java-cataloger-2                  5.82ms ± 2%    5.09ms ± 1%  -12.40%  (p=0.008 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                  628µs ± 1%     549µs ± 1%  -12.57%  (p=0.008 n=5+5)
ImagePackageCatalogers/go-cataloger-2                     214µs ± 1%     188µs ± 1%  -12.09%  (p=0.008 n=5+5)
ImagePackageCatalogers/rust-cataloger-2                   343µs ± 1%     302µs ± 1%  -11.84%  (p=0.008 n=5+5)

name                                                   old alloc/op   new alloc/op   delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2          97.6kB ± 0%    97.4kB ± 0%     ~     (p=0.087 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2         579kB ± 0%     579kB ± 0%     ~     (p=0.310 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     112kB ± 0%     112kB ± 0%     ~     (p=0.548 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                 115kB ± 0%     115kB ± 0%     ~     (p=0.841 n=5+5)
ImagePackageCatalogers/rpmdb-cataloger-2                  134kB ± 0%     134kB ± 0%     ~     (p=0.413 n=4+5)
ImagePackageCatalogers/java-cataloger-2                  1.78MB ± 0%    1.79MB ± 0%   +0.62%  (p=0.008 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                 1.14MB ± 0%    1.14MB ± 0%     ~     (p=0.984 n=5+5)
ImagePackageCatalogers/go-cataloger-2                    48.3kB ± 0%    48.3kB ± 0%     ~     (p=0.889 n=5+5)
ImagePackageCatalogers/rust-cataloger-2                  88.9kB ± 0%    89.0kB ± 0%   +0.01%  (p=0.016 n=4+5)

name                                                   old allocs/op  new allocs/op  delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           1.96k ± 0%     1.96k ± 0%     ~     (all equal)
ImagePackageCatalogers/python-package-cataloger-2         5.89k ± 0%     5.89k ± 0%     ~     (all equal)
ImagePackageCatalogers/javascript-package-cataloger-2     1.93k ± 0%     1.93k ± 0%     ~     (all equal)
ImagePackageCatalogers/dpkgdb-cataloger-2                 2.37k ± 0%     2.37k ± 0%     ~     (all equal)
ImagePackageCatalogers/rpmdb-cataloger-2                  3.19k ± 0%     3.19k ± 0%     ~     (all equal)
ImagePackageCatalogers/java-cataloger-2                   22.3k ± 0%     22.3k ± 0%   +0.13%  (p=0.008 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                  1.85k ± 0%     1.85k ± 0%     ~     (all equal)
ImagePackageCatalogers/go-cataloger-2                     1.40k ± 0%     1.40k ± 0%     ~     (all equal)
ImagePackageCatalogers/rust-cataloger-2                   2.75k ± 0%     2.75k ± 0%     ~     (all equal)

internal/file/zip_file_helpers_test.go Show resolved Hide resolved
internal/file/zip_file_manifest.go Outdated Show resolved Hide resolved
}

// OpenZip provides a ZipReadCloser for the given filepath.
func OpenZip(filepath string) (*ZipReadCloser, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can OpenZip be unexported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can, but we should be opening all zips with this util now (including future uses of zip.OpenReader should use this instead). It happens to be that no other function internally yet.


t.Logf("running from: %s", cwd)
// create a temp file
tmpFile, err := ioutil.TempFile("", "syft-ziputil-archive-TEST-")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"syft-ziputil-archive-TEST-" — This string might be worth extracting as a constant, partially for consistency, partially for developer ease as we add tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be roughly unique for each test or helper, I'll adjust the name.

internal/file/zip_file_traversal_test.go Outdated Show resolved Hide resolved
internal/file/zip_file_traversal_test.go Outdated Show resolved Hide resolved
internal/file/zip_file_traversal_test.go Outdated Show resolved Hide resolved
internal/file/zip_file_traversal_test.go Outdated Show resolved Hide resolved
return startOfArchive, nil
}

func findSignatureInBlock(b []byte) int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this func also derived from the zip package? it looks terrifying

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is (from the top of the file):

// directoryEndLen, readByf, directoryEnd, and findSignatureInBlock were copied from the golang stdlib, specifically:
// - https://github.com/golang/go/blob/go1.16.4/src/archive/zip/struct.go
// - https://github.com/golang/go/blob/go1.16.4/src/archive/zip/reader.go
// findArchiveStartOffset is derived from the same stdlib utils, specifically the readDirectoryEnd function.

@wagoodman wagoodman force-pushed the self-extracting-jar branch from 4047760 to 452e55b Compare June 4, 2021 16:11
@wagoodman wagoodman force-pushed the self-extracting-jar branch from 452e55b to 6a398f9 Compare June 4, 2021 16:11
@wagoodman wagoodman merged commit 801e662 into main Jun 4, 2021
@wagoodman wagoodman deleted the self-extracting-jar branch June 4, 2021 16:50
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
Add support for processing files with prepended bytes before the zip archive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve discovery of app.jar packaging (self-executing Jars)
3 participants