Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade generic cataloger #1281

Merged
merged 3 commits into from
Oct 24, 2022
Merged

Upgrade generic cataloger #1281

merged 3 commits into from
Oct 24, 2022

Conversation

wagoodman
Copy link
Contributor

@wagoodman wagoodman commented Oct 21, 2022

This PR introduces a second generic.Cataloger that is meant to eventually replace the common.GenericCataloger that exists today. Along with the new cataloger, this PR ports the alpm cataloger to use the new generic cataloger to illustrate intended usage.

Why make a new generic cataloger at all?

While working on #1063 and #572 it became apparent that catalogers should attempt to emit packages that should be immutable in order to simplify assumptions about the underlying objects for relationship objects. In the end package immutability won't be strictly followed, but the goal is to do this for all fields that may affect the ID of the package.

A very common limitation of the generic cataloger today is parser functions not having access to the source.FileResolver that the cataloger has. This comes up in the DPKG cataloger, the javascript cataloger, and others, specifically when the file being parsed needs to parse other files to complete the definition of the package. Allowing access to the source.FileResolver from the parser function solves this case for most (all?) catalogers.

Why put this in a new generic cataloger package instead of the current common package?

This would make it consistent with refinement notes from #558 . The goal would be to remove the common package altogether.

What differences are there for the alpm cataloger (and future catalogers)?

  • migrating pURL processing back to the cataloger
  • migrate CPE generation assignment to occur within the cataloger this will be done as a separate effort
  • the parser function itself decides whether to accept the location being parsed as evidence of the packages being emitted from the parser function.
  • removing pURL-specific behavior from any pkg.*Metadata structs

Why remove the FoundBy attribute from the definition of a package identifier (hash:ignore)?

The ID is meant to represent what makes up a package as well as the context to where it was found. "Who" found the package does not qualify within this criteria. Additionally, as we start moving back the line where a package elements are assigned by the cataloger and identifying elements should never be assigned after the function that creates the package it would be wrong to assign the FoundBy within the generic cataloger. These two things hint at removing FoundBy from the consideration of the package ID.

@github-actions
Copy link

github-actions bot commented Oct 21, 2022

Benchmark Test Results

Benchmark results from the latest changes vs base branch
name                                                       old time/op    new time/op    delta
ImagePackageCatalogers/alpmdb-cataloger-2                    13.7ms ±17%    15.8ms ± 7%     ~     (p=0.095 n=5+5)
ImagePackageCatalogers/ruby-gemspec-cataloger-2              1.47ms ± 2%    1.80ms ± 7%  +22.69%  (p=0.008 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2            3.66ms ± 2%    4.50ms ± 3%  +23.14%  (p=0.008 n=5+5)
ImagePackageCatalogers/php-composer-installed-cataloger-2    1.23ms ± 2%    1.44ms ± 3%  +16.86%  (p=0.008 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2         867µs ± 1%    1006µs ± 4%  +16.00%  (p=0.008 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                    1.01ms ± 1%    1.18ms ± 2%  +16.22%  (p=0.008 n=5+5)
ImagePackageCatalogers/rpm-db-cataloger-2                    1.47ms ± 2%    1.71ms ± 4%  +16.29%  (p=0.008 n=5+5)
ImagePackageCatalogers/java-cataloger-2                      16.7ms ± 3%    20.7ms ± 2%  +23.93%  (p=0.008 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                     1.40ms ± 1%    1.70ms ± 4%  +21.06%  (p=0.008 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2          2.66µs ± 2%    2.98µs ± 2%  +12.32%  (p=0.008 n=5+5)
ImagePackageCatalogers/dotnet-deps-cataloger-2               1.51ms ± 1%    1.87ms ± 2%  +23.69%  (p=0.008 n=5+5)
ImagePackageCatalogers/portage-cataloger-2                    791µs ± 3%     931µs ± 5%  +17.69%  (p=0.008 n=5+5)

name                                                       old alloc/op   new alloc/op   delta
ImagePackageCatalogers/alpmdb-cataloger-2                    5.26MB ± 0%    5.26MB ± 0%     ~     (p=0.841 n=5+5)
ImagePackageCatalogers/ruby-gemspec-cataloger-2               202kB ± 0%     202kB ± 0%     ~     (p=0.310 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2             945kB ± 0%     945kB ± 0%     ~     (p=0.095 n=5+5)
ImagePackageCatalogers/php-composer-installed-cataloger-2     214kB ± 0%     214kB ± 0%   -0.20%  (p=0.008 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2         158kB ± 0%     158kB ± 0%   -0.10%  (p=0.016 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                     203kB ± 0%     203kB ± 0%   -0.16%  (p=0.008 n=5+5)
ImagePackageCatalogers/rpm-db-cataloger-2                     302kB ± 0%     301kB ± 0%   -0.12%  (p=0.016 n=5+5)
ImagePackageCatalogers/java-cataloger-2                      3.44MB ± 0%    3.44MB ± 0%     ~     (p=0.310 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                     1.25MB ± 0%    1.25MB ± 0%     ~     (p=0.222 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2            672B ± 0%      672B ± 0%     ~     (all equal)
ImagePackageCatalogers/dotnet-deps-cataloger-2                369kB ± 0%     368kB ± 0%   -0.46%  (p=0.008 n=5+5)
ImagePackageCatalogers/portage-cataloger-2                    136kB ± 0%     136kB ± 0%   -0.09%  (p=0.008 n=5+5)

name                                                       old allocs/op  new allocs/op  delta
ImagePackageCatalogers/alpmdb-cataloger-2                     85.7k ± 0%     85.7k ± 0%   +0.01%  (p=0.008 n=5+5)
ImagePackageCatalogers/ruby-gemspec-cataloger-2               4.25k ± 0%     4.24k ± 0%   -0.46%  (p=0.000 n=5+4)
ImagePackageCatalogers/python-package-cataloger-2             16.6k ± 0%     16.5k ± 0%   -0.24%  (p=0.008 n=5+5)
ImagePackageCatalogers/php-composer-installed-cataloger-2     5.53k ± 0%     5.50k ± 0%   -0.56%  (p=0.000 n=5+4)
ImagePackageCatalogers/javascript-package-cataloger-2         3.32k ± 0%     3.31k ± 0%   -0.32%  (p=0.000 n=4+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                     4.60k ± 0%     4.57k ± 0%   -0.65%  (p=0.008 n=5+5)
ImagePackageCatalogers/rpm-db-cataloger-2                     8.13k ± 0%     8.11k ± 0%   -0.25%  (p=0.008 n=5+5)
ImagePackageCatalogers/java-cataloger-2                       57.5k ± 0%     57.5k ± 0%   -0.08%  (p=0.008 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                      5.43k ± 0%     5.41k ± 0%   -0.37%  (p=0.008 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2            15.0 ± 0%      15.0 ± 0%     ~     (all equal)
ImagePackageCatalogers/dotnet-deps-cataloger-2                7.27k ± 0%     7.15k ± 0%   -1.65%  (p=0.008 n=5+5)
ImagePackageCatalogers/portage-cataloger-2                    3.59k ± 0%     3.58k ± 0%   -0.28%  (p=0.029 n=4+4)

@wagoodman wagoodman marked this pull request as ready for review October 24, 2022 13:56
@wagoodman wagoodman requested a review from a team October 24, 2022 13:56
@wagoodman wagoodman merged commit b44f441 into main Oct 24, 2022
@wagoodman wagoodman deleted the upgrade-generic-cataloger branch October 24, 2022 15:12
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
* add second generation of generic cataloger

Signed-off-by: Alex Goodman <[email protected]>

* upgrade aplm cataloger to use generic.Cataloger

Signed-off-by: Alex Goodman <[email protected]>

* remove pacakge found-by attribute from the definition of a package ID

Signed-off-by: Alex Goodman <[email protected]>

Signed-off-by: Alex Goodman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants