931: binary cataloger exclusion defaults #1948

Merged 18 commits on Aug 8, 2023

spiffcs (Contributor) commented Jul 20, 2023

Binary cataloger exclusion defaults

Fixes #931

PR #1948 introduces a new implicit exclusion for packages that overlap by file ownership and have certain characteristics:

// 1) the relationship between packages is OwnershipByFileOverlap
// 2) the parent is an "os" package
// 3) the child is a synthetic package generated by the binary cataloger
// 4) the package names are identical

When a package found by one of the following catalogers overlaps a synthetic binary package as described above, the synthetic binary package is dropped:

apkdb
alpm
deb
nix
rpm (file and db)
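
The four conditions above can be sketched as a single predicate. This is a minimal illustration, not the code from the PR: the `Package` and `Relationship` types, the `shouldExclude` name, and the relationship type string are hypothetical stand-ins. The cataloger names are the ones that appear in the benchmark output in this thread; the rpm file cataloger is left out because its name does not appear here.

```go
package main

import "fmt"

// Hypothetical stand-ins for illustration only; these are not syft's real types.
type Package struct {
	Name    string
	FoundBy string // name of the cataloger that produced the package
}

type Relationship struct {
	Type   string
	Parent Package
	Child  Package
}

// Assumed string value for the relationship type.
const ownershipByFileOverlap = "ownership-by-file-overlap"

// Cataloger names as they appear in the benchmark output of this PR.
var osCatalogers = map[string]bool{
	"apkdb-cataloger":     true,
	"alpmdb-cataloger":    true,
	"dpkgdb-cataloger":    true,
	"nix-store-cataloger": true,
	"rpm-db-cataloger":    true,
}

const binaryCataloger = "binary-cataloger"

// shouldExclude reports whether the child package of r should be dropped.
func shouldExclude(r Relationship) bool {
	return r.Type == ownershipByFileOverlap && // 1) ownership by file overlap
		osCatalogers[r.Parent.FoundBy] && // 2) parent is an "os" package
		r.Child.FoundBy == binaryCataloger && // 3) child came from the binary cataloger
		r.Parent.Name == r.Child.Name // 4) names are identical
}

func main() {
	apk := Package{Name: "busybox", FoundBy: "apkdb-cataloger"}
	bin := Package{Name: "busybox", FoundBy: binaryCataloger}
	r := Relationship{Type: ownershipByFileOverlap, Parent: apk, Child: bin}
	fmt.Println(shouldExclude(r)) // true: all four conditions hold

	r.Child.Name = "ssl_client"
	fmt.Println(shouldExclude(r)) // false: the names differ
}
```

All four checks must pass, so a binary package whose name differs from the owning OS package is kept.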

I've added an integration test that captures the new default where scanning an alpine image with busybox goes from:

alpine-baselayout       3.4.3-r1     apk
alpine-baselayout-data  3.4.3-r1     apk
alpine-keys             2.4-r1       apk
apk-tools               2.14.0-r2    apk
busybox                 1.36.1       binary
busybox                 1.36.1-r0    apk
busybox-binsh           1.36.1-r0    apk
ca-certificates-bundle  20230506-r0  apk
libc-utils              0.7.2-r5     apk
libcrypto3              3.1.1-r1     apk
libssl3                 3.1.1-r1     apk
musl                    1.2.4-r0     apk
musl-utils              1.2.4-r0     apk
scanelf                 1.3.7-r1     apk
ssl_client              1.36.1-r0    apk
zlib                    1.2.13-r1    apk

to

alpine-baselayout       3.4.3-r1     apk
alpine-baselayout-data  3.4.3-r1     apk
alpine-keys             2.4-r1       apk
apk-tools               2.14.0-r2    apk
busybox                 1.36.1-r0    apk
busybox-binsh           1.36.1-r0    apk
ca-certificates-bundle  20230506-r0  apk
libc-utils              0.7.2-r5     apk
libcrypto3              3.1.1-r1     apk
libssl3                 3.1.1-r1     apk
musl                    1.2.4-r0     apk
musl-utils              1.2.4-r0     apk
scanelf                 1.3.7-r1     apk
ssl_client              1.36.1-r0    apk
zlib                    1.2.13-r1    apk


github-actions bot commented Jul 20, 2023

Benchmark Test Results

Benchmark results from the latest changes vs base branch
goos: linux
goarch: amd64
pkg: github.com/anchore/syft/test/integration
cpu: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
                                                              │ ./.tmp/benchmark-6cc1b86.txt │
                                                              │            sec/op            │
ImagePackageCatalogers/alpmdb-cataloger-2                                       15.43m ±  8%
ImagePackageCatalogers/apkdb-cataloger-2                                        1.012m ±  6%
ImagePackageCatalogers/binary-cataloger-2                                       262.6µ ±  8%
ImagePackageCatalogers/dpkgdb-cataloger-2                                       771.9µ ±  8%
ImagePackageCatalogers/dotnet-portable-executable-cataloger-2                   28.92µ ±  8%
ImagePackageCatalogers/go-module-binary-cataloger-2                             128.6µ ±  5%
ImagePackageCatalogers/java-cataloger-2                                         16.19m ± 25%
ImagePackageCatalogers/graalvm-native-image-cataloger-2                         134.2µ ±  5%
ImagePackageCatalogers/javascript-package-cataloger-2                           537.6µ ±  5%
ImagePackageCatalogers/nix-store-cataloger-2                                    401.9µ ±  3%
ImagePackageCatalogers/php-composer-installed-cataloger-2                       1.071m ±  7%
ImagePackageCatalogers/portage-cataloger-2                                      685.0µ ±  8%
ImagePackageCatalogers/python-package-cataloger-2                               4.479m ±  6%
ImagePackageCatalogers/r-package-cataloger-2                                    322.7µ ±  7%
ImagePackageCatalogers/rpm-db-cataloger-2                                       738.0µ ±  3%
ImagePackageCatalogers/ruby-gemspec-cataloger-2                                 1.385m ±  7%
ImagePackageCatalogers/sbom-cataloger-2                                         159.3µ ±  2%
geomean                                                                         664.2µ

                                                              │ ./.tmp/benchmark-6cc1b86.txt │
                                                              │             B/op             │
ImagePackageCatalogers/alpmdb-cataloger-2                                       5.144Mi ± 0%
ImagePackageCatalogers/apkdb-cataloger-2                                        205.7Ki ± 0%
ImagePackageCatalogers/binary-cataloger-2                                       30.46Ki ± 0%
ImagePackageCatalogers/dpkgdb-cataloger-2                                       172.6Ki ± 0%
ImagePackageCatalogers/dotnet-portable-executable-cataloger-2                   3.695Ki ± 0%
ImagePackageCatalogers/go-module-binary-cataloger-2                             9.906Ki ± 0%
ImagePackageCatalogers/java-cataloger-2                                         2.843Mi ± 0%
ImagePackageCatalogers/graalvm-native-image-cataloger-2                         8.595Ki ± 0%
ImagePackageCatalogers/javascript-package-cataloger-2                           94.20Ki ± 0%
ImagePackageCatalogers/nix-store-cataloger-2                                    49.33Ki ± 0%
ImagePackageCatalogers/php-composer-installed-cataloger-2                       186.6Ki ± 0%
ImagePackageCatalogers/portage-cataloger-2                                      120.2Ki ± 0%
ImagePackageCatalogers/python-package-cataloger-2                               1.003Mi ± 0%
ImagePackageCatalogers/r-package-cataloger-2                                    53.29Ki ± 0%
ImagePackageCatalogers/rpm-db-cataloger-2                                       181.4Ki ± 0%
ImagePackageCatalogers/ruby-gemspec-cataloger-2                                 144.1Ki ± 0%
ImagePackageCatalogers/sbom-cataloger-2                                         14.20Ki ± 0%
geomean                                                                         100.6Ki

                                                              │ ./.tmp/benchmark-6cc1b86.txt │
                                                              │          allocs/op           │
ImagePackageCatalogers/alpmdb-cataloger-2                                        88.14k ± 0%
ImagePackageCatalogers/apkdb-cataloger-2                                         4.190k ± 0%
ImagePackageCatalogers/binary-cataloger-2                                         848.0 ± 0%
ImagePackageCatalogers/dpkgdb-cataloger-2                                        3.145k ± 0%
ImagePackageCatalogers/dotnet-portable-executable-cataloger-2                     132.0 ± 0%
ImagePackageCatalogers/go-module-binary-cataloger-2                               281.0 ± 0%
ImagePackageCatalogers/java-cataloger-2                                          40.19k ± 0%
ImagePackageCatalogers/graalvm-native-image-cataloger-2                           228.0 ± 0%
ImagePackageCatalogers/javascript-package-cataloger-2                            1.342k ± 0%
ImagePackageCatalogers/nix-store-cataloger-2                                      898.0 ± 0%
ImagePackageCatalogers/php-composer-installed-cataloger-2                        4.080k ± 0%
ImagePackageCatalogers/portage-cataloger-2                                       2.272k ± 0%
ImagePackageCatalogers/python-package-cataloger-2                                16.45k ± 0%
ImagePackageCatalogers/r-package-cataloger-2                                      929.0 ± 0%
ImagePackageCatalogers/rpm-db-cataloger-2                                        3.992k ± 0%
ImagePackageCatalogers/ruby-gemspec-cataloger-2                                  2.447k ± 0%
ImagePackageCatalogers/sbom-cataloger-2                                           394.0 ± 0%
geomean                                                                          2.062k

@spiffcs changed the title from "931: binary cataloger defaults" to "931: binary cataloger exclusion defaults" Jul 20, 2023
@spiffcs spiffcs self-assigned this Jul 27, 2023
spiffcs (Contributor Author) commented Jul 31, 2023

Quick update on this PR: after some discussion we're going to dial the feature back so it is not exposed to the user yet, and just fix the narrower case of name-to-name overlap where an os package matches a binary (synthetic) package.

spiffcs added 2 commits August 7, 2023 15:25
Signed-off-by: Christopher Phillips <[email protected]>
@spiffcs spiffcs force-pushed the 931-binary-cataloger-defaults branch from ace0cb0 to d45458e Compare August 7, 2023 20:21
@spiffcs spiffcs marked this pull request as ready for review August 7, 2023 22:01
syft/lib.go (resolved)
kzantow (Contributor) left a comment

No blockers, but left some feedback.

syft/pkg/cataloger/package_exclusions.go (outdated, resolved)
syft/pkg/cataloger/package_exclusions.go (outdated, resolved)
syft/pkg/cataloger/package_exclusions.go (outdated, resolved)
Comment on lines 15 to 22
type CategoryType string

const (
	OsCatalogerType     CategoryType = "os"
	BinaryCatalogerType CategoryType = "binary"
)

var CatalogerTypeIndex = map[CategoryType][]string{
Contributor

could these just be simplified into 2 variables? something like:

var parentCatalogerTypes = []string { .... }
var childCatalogerTypes = []string { .... }

spiffcs (Contributor Author), Aug 8, 2023

Nice! Yea that would be a good simplification here.

My only hesitancy to change it back to that is the original config object we had discussed on the issue:
#931 (comment)

I think keeping this as is has two advantages:

  1. It is clear to future users/contributors that the Os/Binary categorization types were an explicit choice as an additional condition. The parent/child designation loses this nuance a little.
  2. It keeps us open to category-based configuration options we may want to consider in the future

WDYT?

Contributor

If the concern is that we want to be explicit about OS and binary cataloger types, these could be named

var osCatalogerTypes = []string { .... }
var binaryCatalogerTypes = []string { .... }

> keeps us open to category based configuration options we may want to consider in the future

I'm all for forward-thinking such as being open to more configuration. The suggestion was more that since we're not doing that at the moment, we don't necessarily know what that would look like (although you had an option originally), so it might be better to just make whatever changes at such time as we do change the feature. Again, this is not a blocker and I'll leave it to your discernment.
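
For comparison, the two layouts discussed in this thread can be sketched side by side. This is only an illustration under assumptions: the map contents, the `contains` helper, and the `isOsCatalogerA`/`isOsCatalogerB` names are hypothetical, and the cataloger names are borrowed from the benchmark output above rather than from the actual diff.

```go
package main

import "fmt"

type CategoryType string

const (
	OsCatalogerType     CategoryType = "os"
	BinaryCatalogerType CategoryType = "binary"
)

// Option A: one index keyed by category (the shape in the diff; contents assumed).
var CatalogerTypeIndex = map[CategoryType][]string{
	OsCatalogerType:     {"apkdb-cataloger", "dpkgdb-cataloger", "rpm-db-cataloger"},
	BinaryCatalogerType: {"binary-cataloger"},
}

// Option B: two plainly named slices (the reviewer's suggestion; contents assumed).
var (
	osCatalogerTypes     = []string{"apkdb-cataloger", "dpkgdb-cataloger", "rpm-db-cataloger"}
	binaryCatalogerTypes = []string{"binary-cataloger"}
)

// contains reports whether s is present in list.
func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// isOsCatalogerA looks the name up through the category index.
func isOsCatalogerA(name string) bool {
	return contains(CatalogerTypeIndex[OsCatalogerType], name)
}

// isOsCatalogerB looks the name up in the dedicated slice.
func isOsCatalogerB(name string) bool {
	return contains(osCatalogerTypes, name)
}

func main() {
	fmt.Println(isOsCatalogerA("apkdb-cataloger"), isOsCatalogerB("apkdb-cataloger"))   // true true
	fmt.Println(isOsCatalogerA("binary-cataloger"), isOsCatalogerB("binary-cataloger")) // false false
	fmt.Println(contains(binaryCatalogerTypes, "binary-cataloger"))                     // true
}
```

Both options answer the same lookups; the difference is whether the category is a first-class value that could later drive configuration, or just a naming convention on two variables.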

syft/pkg/cataloger/package_exclusions.go (outdated, resolved)
test/integration/package_ownership_relationship_test.go (outdated, resolved)
kzantow (Contributor) commented Aug 8, 2023

One more question I forgot: should this PR include a boolean config option to revert this behavior?

spiffcs (Contributor Author) commented Aug 8, 2023

> One more question I forgot: should this PR include a boolean config option to revert this behavior?

Yea good call - this should now be added with 58f6d69

I've made the new exclusion behavior the default, since we've found that in some cases the synthetic binary packages are a mistake. Users can add the following to their config to re-enable the old behavior:

exclude-binary-overlap-by-ownership: false
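
As a rough sketch of how such a toggle could gate the exclusion: the `Config` field name below mirrors `ExcludeBinaryOverlapByOwnership` from this PR, but `Package`, `Overlap`, `applyExclusions`, and the reduced `osTypes` set are hypothetical and simplified for the busybox example.

```go
package main

import "fmt"

// Config mirrors only the one option discussed here.
type Config struct {
	ExcludeBinaryOverlapByOwnership bool
}

// Package and Overlap are hypothetical, simplified stand-ins.
type Package struct {
	Name, Version, Type string
}

// Overlap records that pkgs[Parent] owns a file in which pkgs[Child] was found.
type Overlap struct{ Parent, Child int }

// Assumption: only "apk" is treated as an OS package type in this example.
var osTypes = map[string]bool{"apk": true}

// applyExclusions drops binary packages that overlap a same-named OS package,
// but only when the option is enabled.
func applyExclusions(cfg Config, pkgs []Package, overlaps []Overlap) []Package {
	if !cfg.ExcludeBinaryOverlapByOwnership {
		return pkgs // old behavior: keep everything
	}
	drop := map[int]bool{}
	for _, o := range overlaps {
		parent, child := pkgs[o.Parent], pkgs[o.Child]
		if osTypes[parent.Type] && child.Type == "binary" && parent.Name == child.Name {
			drop[o.Child] = true
		}
	}
	var kept []Package
	for i, p := range pkgs {
		if !drop[i] {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	pkgs := []Package{
		{"busybox", "1.36.1", "binary"},
		{"busybox", "1.36.1-r0", "apk"},
		{"busybox-binsh", "1.36.1-r0", "apk"},
	}
	overlaps := []Overlap{{Parent: 1, Child: 0}}

	fmt.Println(len(applyExclusions(Config{ExcludeBinaryOverlapByOwnership: true}, pkgs, overlaps)))  // 2
	fmt.Println(len(applyExclusions(Config{ExcludeBinaryOverlapByOwnership: false}, pkgs, overlaps))) // 3
}
```

With the option enabled, only the `binary` busybox entry is removed; with it disabled, the list is returned untouched.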

kzantow (Contributor) left a comment

Note the comment about the change to encode_decode_cycle_test; I think this might be a blocker (and an accidental commit?)

syft/pkg/cataloger/rpm/cataloger.go (resolved)
syft/pkg/cataloger/package_exclusions.go (outdated, resolved)
syft/pkg/cataloger/package_exclusions.go (outdated, resolved)
test/integration/encode_decode_cycle_test.go (outdated, resolved)
Signed-off-by: Christopher Phillips <[email protected]>
kzantow (Contributor) left a comment

LGTM 👍 -- and I definitely agree with making the default behavior, as you noted, exclude these entries from the SBOM

@spiffcs spiffcs merged commit 466da7c into main Aug 8, 2023
@spiffcs spiffcs deleted the 931-binary-cataloger-defaults branch August 8, 2023 17:00
@@ -482,6 +482,10 @@ default-image-pull-source: ""
# - "./out/**/*.json"
exclude: []

# allows users to exclude synthetic binary packages from the sbom
# these packages are removed if an overlap with a non-synthetic package is found
exclude-overlap-by-ownership: true

Parallelism int
}

func DefaultConfig() Config {
Contributor

why delete the DefaultConfig method?

Contributor Author

This function was only used in *_test.go files. It was moved here:

func defaultConfig() cataloger.Config {
	return cataloger.Config{
		Search:                          cataloger.DefaultSearchConfig(),
		Parallelism:                     1,
		LinuxKernel:                     kernel.DefaultLinuxCatalogerConfig(),
		Python:                          python.DefaultCatalogerConfig(),
		ExcludeBinaryOverlapByOwnership: true,
	}
}

Apologies for the boy scout change on an unrelated PR. My IDE was flagging this as dead code and I could not figure out why; moving it into the test files resolved that issue.

// 3) the child is a synthetic package generated by the binary cataloger
// 4) the package names are identical
// This exclude was implemented as a way to help resolve: https://github.com/anchore/syft/issues/931
func Exclude(r artifact.Relationship, c *pkg.Collection) bool {
Contributor

this function seems very specific, but has a very generic name. I think the name should probably be tweaked to be a little more specific.

)

var (
osCatalogerTypes = []string{
Contributor

I think the filtering should be based on the package type, not the cataloger names.
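
One way to read this suggestion, sketched under assumptions: classify packages by a type field instead of by the name of the cataloger that found them. The `PackageType` constants, `Package` struct, and helper names below are hypothetical; only the `apk` and `binary` type strings appear in the package listing in this PR, and the other values are assumed from the cataloger list.

```go
package main

import "fmt"

// PackageType and its values are illustrative; "alpm", "deb", "nix" and "rpm" are assumed.
type PackageType string

const (
	ApkPkg    PackageType = "apk"
	AlpmPkg   PackageType = "alpm"
	DebPkg    PackageType = "deb"
	NixPkg    PackageType = "nix"
	RpmPkg    PackageType = "rpm"
	BinaryPkg PackageType = "binary"
)

// Package is a hypothetical stand-in, not syft's real package type.
type Package struct {
	Name    string
	Type    PackageType
	FoundBy string // cataloger name; deliberately not used for classification
}

var osPackageTypes = map[PackageType]bool{
	ApkPkg: true, AlpmPkg: true, DebPkg: true, NixPkg: true, RpmPkg: true,
}

// isOSPackage classifies by package type, so renaming or adding a cataloger
// does not change the result.
func isOSPackage(p Package) bool { return osPackageTypes[p.Type] }

// isBinaryPackage reports whether the package is a synthetic binary package.
func isBinaryPackage(p Package) bool { return p.Type == BinaryPkg }

func main() {
	apk := Package{Name: "busybox", Type: ApkPkg, FoundBy: "some-renamed-cataloger"}
	bin := Package{Name: "busybox", Type: BinaryPkg, FoundBy: "binary-cataloger"}

	fmt.Println(isOSPackage(apk))                       // true
	fmt.Println(isOSPackage(bin), isBinaryPackage(bin)) // false true
}
```

The trade-off is that type-based checks do not depend on cataloger naming, at the cost of assuming every OS cataloger sets a distinct package type.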

@spiffcs spiffcs mentioned this pull request Aug 9, 2023
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
…chore#1948)

Fixes anchore#931

PR anchore#1948 introduces a new implicit exclusion for binary packages that overlap by file ownership and have certain characteristics:

1) the relationship between packages is OwnershipByFileOverlap
2) the parent package is an "os" package - see changelog for included catalogers
3) the child is a synthetic package generated by the binary cataloger - see changelog for included catalogers
4) the package names are identical

---------

Signed-off-by: Christopher Phillips <[email protected]>

Successfully merging this pull request may close these issues.

Package duplicated by different cataloger
3 participants