Add Linux Kernel cataloger #1694

Merged: 22 commits merged on Apr 14, 2023

Contributor

@deitch deitch commented Mar 24, 2023

This adds a cataloger for kernel files. It looks for files matching a particular set of filenames, then uses libmagic (well, a library in Go that reproduces part of it, with more to come) to parse metadata.

Includes options to append additional file globs to parse.
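
As a rough illustration of the filename-based discovery described above, the sketch below matches candidate paths against a default pattern list plus user-supplied extras. The pattern list and the names defaultKernelGlobs and matchesAny are assumptions made up for this example; they are not the cataloger's actual configuration.

```go
package main

import (
	"fmt"
	"path"
)

// defaultKernelGlobs is a hypothetical default set of basename patterns.
var defaultKernelGlobs = []string{"kernel", "kernel-*", "vmlinux", "vmlinux-*", "vmlinuz", "vmlinuz-*"}

// matchesAny reports whether the base name of p matches any of the globs.
func matchesAny(p string, globs []string) bool {
	base := path.Base(p)
	for _, g := range globs {
		if ok, err := path.Match(g, base); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	// Append a user-supplied glob to a copy of the defaults.
	globs := append(append([]string{}, defaultKernelGlobs...), "bzImage*")
	for _, p := range []string{"/boot/vmlinuz-6.2.8-060208-generic", "/boot/bzImage", "/boot/grub.cfg"} {
		fmt.Println(p, matchesAny(p, globs))
	}
}
```

Copying the defaults before appending keeps the user's extra globs from mutating the shared default slice.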

TODO:

  • Way too many CPEs, not sure how to fix those, as I do not know where they are generated
  • kernel modules, which are different file names and require different handling
  • the extended version vs the basic version, when and how to use which
  • listing files as part of the package vs just having the package vs just having files (answer: files opened during cataloging are in pkg.Package.Location, otherwise owned files are listed in the pkg.Package.Metadata)
  • handling duplicates, e.g. we find the kernel, but the kernel also is part of an install package (answer: there is a file-ownership-overlap relationship created already by syft in these cases)

Closes #1378

@deitch deitch force-pushed the kernel-and-modules-cataloger branch from 1520164 to 8127d42 Compare March 24, 2023 07:56
@deitch
Contributor Author

deitch commented Mar 24, 2023

Some sample output from a directory with a kernel file:

{
 "spdxVersion": "SPDX-2.3",
 "dataLicense": "CC0-1.0",
 "SPDXID": "SPDXRef-DOCUMENT",
 "name": "/home/ubuntu/eve/rootfs/boot/",
 "documentNamespace": "https://anchore.com/syft/dir/home/ubuntu/eve/rootfs/boot-b6c3a42e-a58e-4d40-baf6-0629dfea8599",
 "creationInfo": {
  "licenseListVersion": "3.20",
  "creators": [
   "Organization: Anchore, Inc",
   "Tool: syft-[not provided]"
  ],
  "created": "2023-03-24T07:50:58Z"
 },
 "packages": [
  {
   "name": "linux-kernel",
   "SPDXID": "SPDXRef-Package-linux-kernel-linux-kernel-cbf2ab66dde87d2d",
   "versionInfo": "5.10.121-linuxkit (root@buildkitsandbox) #1 SMP Fri Dec 2 10:35:42 UTC 2022",
   "downloadLocation": "NOASSERTION",
   "sourceInfo": "acquired package info from the following paths: ",
   "licenseConcluded": "NONE",
   "licenseDeclared": "NONE",
   "copyrightText": "NOASSERTION",
   "externalRefs": [
    {
     "referenceCategory": "SECURITY",
     "referenceType": "cpe23Type",
     "referenceLocator": "cpe:2.3:a:linux-kernel:linux-kernel:5.10.121-linuxkit_\\(root\\@buildkitsandbox\\)_\\#1_SMP_Fri_Dec_2_10\\:35\\:42_UTC_2022:*:*:*:*:*:*:*"
    },
    {
     "referenceCategory": "SECURITY",
     "referenceType": "cpe23Type",
     "referenceLocator": "cpe:2.3:a:linux-kernel:linux_kernel:5.10.121-linuxkit_\\(root\\@buildkitsandbox\\)_\\#1_SMP_Fri_Dec_2_10\\:35\\:42_UTC_2022:*:*:*:*:*:*:*"
    },
    {
     "referenceCategory": "SECURITY",
     "referenceType": "cpe23Type",
     "referenceLocator": "cpe:2.3:a:linux_kernel:linux-kernel:5.10.121-linuxkit_\\(root\\@buildkitsandbox\\)_\\#1_SMP_Fri_Dec_2_10\\:35\\:42_UTC_2022:*:*:*:*:*:*:*"
    },
    {
     "referenceCategory": "SECURITY",
     "referenceType": "cpe23Type",
     "referenceLocator": "cpe:2.3:a:linux_kernel:linux_kernel:5.10.121-linuxkit_\\(root\\@buildkitsandbox\\)_\\#1_SMP_Fri_Dec_2_10\\:35\\:42_UTC_2022:*:*:*:*:*:*:*"
    },
    {
     "referenceCategory": "SECURITY",
     "referenceType": "cpe23Type",
     "referenceLocator": "cpe:2.3:a:linux:linux-kernel:5.10.121-linuxkit_\\(root\\@buildkitsandbox\\)_\\#1_SMP_Fri_Dec_2_10\\:35\\:42_UTC_2022:*:*:*:*:*:*:*"
    },
    {
     "referenceCategory": "SECURITY",
     "referenceType": "cpe23Type",
     "referenceLocator": "cpe:2.3:a:linux:linux_kernel:5.10.121-linuxkit_\\(root\\@buildkitsandbox\\)_\\#1_SMP_Fri_Dec_2_10\\:35\\:42_UTC_2022:*:*:*:*:*:*:*"
    },
    {
     "referenceCategory": "PACKAGE-MANAGER",
     "referenceType": "purl",
     "referenceLocator": "pkg:linux-kernel/[email protected]"
    }
   ]
  }
 ],
 "relationships": [
  {
   "spdxElementId": "SPDXRef-DOCUMENT",
   "relatedSpdxElement": "SPDXRef-DOCUMENT",
   "relationshipType": "DESCRIBES"
  }
 ]
}

@wagoodman
Contributor

wagoodman commented Mar 24, 2023

Nice!

One comment on the version value: 5.10.121-linuxkit (root@buildkitsandbox) #1 SMP Fri Dec 2 10:35:42 UTC 2022 seems surprising. From a glance it looks like the version is 5.10.121-linuxkit and the remaining fields would be aux fields on the metadata. That would also fix a lot of what's going on with the CPEs being generated (and I think eliminate 2 of them being generated).

Way too many CPEs, not sure how to fix those, as I do not know where they are generated

These are generated downstream of package creation today (this may change in the future, but for the meantime it's here https://github.com/anchore/syft/blob/v0.75.0/syft/pkg/cataloger/catalog.go#L79 ... running after the catalog creates the package).

Also regarding the pURL: as far as I can tell there isn't a "kernel" (or "kernel"-like) pURL to use. This feels like it should be a generic pURL: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#generic ... so pkg:generic/[email protected]. I'm not certain if there are any restrictions for the get params for the generic pURL, you might be able to add on a few more useful bits of identifying information there.
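
To make the generic pURL suggestion concrete, here is a minimal hand-rolled sketch that assembles a pkg:generic/... string with optional qualifiers. The function names genericPURL and escape are hypothetical, and the filename qualifier is used only as an example; real code would more likely rely on a purl library than on string building like this.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// escape percent-encodes a purl component; spaces become %20 rather than "+".
func escape(s string) string {
	return strings.ReplaceAll(url.QueryEscape(s), "+", "%20")
}

// genericPURL builds "pkg:generic/<name>@<version>?<qualifiers>".
// Qualifiers with empty values are skipped and keys are emitted in sorted order.
func genericPURL(name, version string, qualifiers map[string]string) string {
	var b strings.Builder
	b.WriteString("pkg:generic/")
	b.WriteString(escape(name))
	if version != "" {
		b.WriteString("@" + escape(version))
	}
	keys := make([]string, 0, len(qualifiers))
	for k, v := range qualifiers {
		if v != "" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for i, k := range keys {
		if i == 0 {
			b.WriteByte('?')
		} else {
			b.WriteByte('&')
		}
		b.WriteString(k + "=" + escape(qualifiers[k]))
	}
	return b.String()
}

func main() {
	fmt.Println(genericPURL("linux-kernel", "5.10.121-linuxkit", map[string]string{"filename": "vmlinuz"}))
}
```

Sorting the qualifier keys keeps the output stable across runs, which matters when the pURL is compared in tests.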

@wagoodman
Contributor

handling duplicates, e.g. we find the kernel, but the kernel also is part of an install package

As long as the pkg.Location is correct on the package you're generating and the package manager claims ownership of that path, an ownership-by-file-overlap relationship will be created between the two packages automatically. This is the relationship type in question:

OwnershipByFileOverlapRelationship RelationshipType = "ownership-by-file-overlap"

We don't deduplicate these packages by convention since they inherently have unique properties: one would be an RPM/DPKG/etc. that describes basic packaging metadata and the other would be a package that has detailed metadata about the kernel itself from your analysis.
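
The overlap idea can be sketched roughly as follows. This is not syft's implementation: the types pkgInfo and relationship, the function overlapRelationships, and the direction of the relationship (owner to owned) are all assumptions for illustration. Only the relationship type string comes from the comment above.

```go
package main

import "fmt"

// pkgInfo is a simplified, hypothetical stand-in for a cataloged package.
type pkgInfo struct {
	Name       string
	Locations  []string // paths the cataloger read to discover the package
	OwnedFiles []string // paths a package manager claims to own
}

// relationship links an owning package to a package found at an owned path.
type relationship struct {
	From, To, Type string
}

const ownershipByFileOverlap = "ownership-by-file-overlap"

// overlapRelationships emits one relationship per (owner, other) pair where
// the owner claims at least one path that the other package was found at.
func overlapRelationships(pkgs []pkgInfo) []relationship {
	var rels []relationship
	for _, owner := range pkgs {
		claimed := make(map[string]bool, len(owner.OwnedFiles))
		for _, f := range owner.OwnedFiles {
			claimed[f] = true
		}
		for _, other := range pkgs {
			if other.Name == owner.Name {
				continue
			}
			for _, loc := range other.Locations {
				if claimed[loc] {
					rels = append(rels, relationship{From: owner.Name, To: other.Name, Type: ownershipByFileOverlap})
					break
				}
			}
		}
	}
	return rels
}

func main() {
	pkgs := []pkgInfo{
		{Name: "kernel-core", OwnedFiles: []string{"/boot/vmlinuz-6.2.8"}},
		{Name: "linux-kernel", Locations: []string{"/boot/vmlinuz-6.2.8"}},
	}
	fmt.Println(overlapRelationships(pkgs))
}
```

Both packages survive in the result; the relationship only records that one owns the file the other was discovered in.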

@wagoodman
Contributor

listing files as part of the package vs just having the package vs just having files

I'm not certain I 100% follow yet, but if you are looking to have "owned files" as part of the concept of the kernel package then all you would need to do is to have the package metadata implement the FileOwner interface (https://github.com/anchore/syft/blob/main/syft/pkg/file_owner.go) and this should be automagically handled downstream in syft (example: https://github.com/anchore/syft/blob/main/syft/pkg/rpm_metadata.go#L50-L60).
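
A minimal sketch of what implementing that interface could look like, assuming FileOwner is a single-method interface with OwnedFiles() []string (declared locally here so the example is self-contained). The kernelMetadata and kernelModule types are hypothetical stand-ins, not the types from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// FileOwner is declared locally and is assumed to mirror the interface
// referenced above; check syft's pkg/file_owner.go for the real definition.
type FileOwner interface {
	OwnedFiles() []string
}

// kernelModule and kernelMetadata are hypothetical metadata types.
type kernelModule struct {
	Name string
	Path string
}

type kernelMetadata struct {
	Version string
	Modules []kernelModule
}

// OwnedFiles returns the sorted, de-duplicated module paths.
func (m kernelMetadata) OwnedFiles() []string {
	seen := make(map[string]struct{})
	var out []string
	for _, mod := range m.Modules {
		if mod.Path == "" {
			continue
		}
		if _, dup := seen[mod.Path]; dup {
			continue
		}
		seen[mod.Path] = struct{}{}
		out = append(out, mod.Path)
	}
	sort.Strings(out)
	return out
}

// Compile-time check that the metadata type satisfies the interface.
var _ FileOwner = kernelMetadata{}

func main() {
	m := kernelMetadata{Version: "6.2.8-060208-generic", Modules: []kernelModule{
		{Name: "crc32c_intel", Path: "mods/crc32c-intel.ko"},
	}}
	fmt.Println(m.OwnedFiles())
}
```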

@deitch
Contributor Author

deitch commented Mar 27, 2023

One comment on the version value: 5.10.121-linuxkit (root@buildkitsandbox) #1 SMP Fri Dec 2 10:35:42 UTC 2022 seems surprising. From a glance it looks like the version is 5.10.121-linuxkit and the remaining fields would be aux fields on the metadata. That would also fix a lot of what's going on with the CPEs being generated (and I think eliminate 2 of them being generated).

I changed it to just use the "base" version.

as far as I can tell there isn't a "kernel" (or "kernel"-like) pURL to use. This feels like it should be a generic pURL: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#generic ... so pkg:generic/[email protected]. I'm not certain if there are any restrictions for the get params for the generic pURL, you might be able to add on a few more useful bits of identifying information there.

I changed it to generic, but even the language there says, "please use something else":

When possible another or a new purl type should be used instead of using the generic type and eventually
contributed back to this specification.

It has nothing to capture the additional meta information other than the usual known qualifiers, listed here, which basically are repository_url, download_url, vcs_url, filename and checksum.

So where else should we capture it?

In the kernel itself, the whole thing is a single string. To use my example above, 5.10.121-linuxkit (root@buildkitsandbox) #1 SMP Fri Dec 2 10:35:42 UTC 2022 is a single string at a known location in the kernel file.

When you run uname, it treats the first part as "kernel release" (uname -r) and the rest as "kernel version" (uname -v). This is based on utsname.h.
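
The release/version split described above can be sketched as a small parser. This assumes the banner has the shape "<release> (<builder>) <version>" as in the example string; banners with extra parenthesised groups (for instance a compiler version) are not handled. The names kernelVersion and splitKernelVersion are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// kernelVersion holds the pieces of the banner string embedded in a kernel image.
type kernelVersion struct {
	Release string // what `uname -r` reports, e.g. "5.10.121-linuxkit"
	Builder string // the "(user@host)" part, without the parentheses
	Version string // what `uname -v` reports, e.g. "#1 SMP ..."
}

// splitKernelVersion splits "<release> (<builder>) <version>".
// If the parenthesised builder is missing, everything after the first
// space is treated as the version.
func splitKernelVersion(s string) kernelVersion {
	s = strings.TrimSpace(s)
	release, rest, found := strings.Cut(s, " ")
	if !found {
		return kernelVersion{Release: release}
	}
	rest = strings.TrimSpace(rest)
	var builder string
	if strings.HasPrefix(rest, "(") {
		if end := strings.Index(rest, ")"); end > 0 {
			builder = rest[1:end]
			rest = strings.TrimSpace(rest[end+1:])
		}
	}
	return kernelVersion{Release: release, Builder: builder, Version: rest}
}

func main() {
	v := splitKernelVersion("5.10.121-linuxkit (root@buildkitsandbox) #1 SMP Fri Dec 2 10:35:42 UTC 2022")
	fmt.Printf("release=%q builder=%q version=%q\n", v.Release, v.Builder, v.Version)
}
```

With a split like this, the release can serve as the package version while the remaining fields become auxiliary metadata.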

@deitch deitch force-pushed the kernel-and-modules-cataloger branch 5 times, most recently from 5200c78 to 1e96889 Compare March 29, 2023 09:45
@deitch
Contributor Author

deitch commented Mar 29, 2023

I updated this to be able to scan kernel modules as well. It is successfully finding them, parsing them, getting module information out of them.

But then it needs to add them to a package, including per-module metadata. I assumed that would be the same as the kernel package, but if I do anything different (e.g. Locations, which is different per kernel file or module), I end up with hundreds of packages, one per module and kernel file, which wasn't what we wanted.

So how do I:

  1. Add each of these files (kernel module) to the same pre-existing kernel package?
  2. Include the file information for each such kernel and module?
  3. Include the actual Location without accidentally creating a new package?
  4. Include the per-file (i.e. per kernel module) metadata?

@deitch deitch force-pushed the kernel-and-modules-cataloger branch from 1e96889 to bcfff4e Compare March 29, 2023 10:06
@deitch
Contributor Author

deitch commented Mar 29, 2023

Also, in doing the purl for the kernel as generic/linux-kernel, it messed up some of the tests. I had to do some small but interesting reconstruction of TypeFromPURL() to get it to work. Whether or not we want this is up for question.
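
For readers wondering what such a special case might look like, here is a hypothetical sketch, not the change made in this PR. It assumes the mapping must look at the pURL name when the type is generic; the function typeFromPURL and the returned type strings are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// typeFromPURL returns a package-type string for a purl. For most purls this
// is just the purl type; "generic" purls named "linux-kernel" are special-cased.
// It returns "" when the input does not look like a purl.
func typeFromPURL(p string) string {
	if !strings.HasPrefix(p, "pkg:") {
		return ""
	}
	rest := strings.TrimPrefix(p, "pkg:")
	// Drop qualifiers and subpath, then the version.
	if i := strings.IndexAny(rest, "?#"); i >= 0 {
		rest = rest[:i]
	}
	rest, _, _ = strings.Cut(rest, "@")
	parts := strings.Split(strings.Trim(rest, "/"), "/")
	if len(parts) < 2 {
		return ""
	}
	purlType, name := strings.ToLower(parts[0]), parts[len(parts)-1]
	if purlType == "generic" && name == "linux-kernel" {
		return "linux-kernel" // hypothetical package type
	}
	return purlType
}

func main() {
	fmt.Println(typeFromPURL("pkg:generic/linux-kernel@6.2.8-060208-generic"))
	fmt.Println(typeFromPURL("pkg:deb/debian/curl@7.50.3-1?arch=i386"))
}
```

The cost of this approach is that the type lookup is no longer a pure function of the pURL type, which is presumably why it was flagged as open to question.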

@deitch deitch force-pushed the kernel-and-modules-cataloger branch 2 times, most recently from eaae5c2 to bb6f69c Compare March 29, 2023 11:01
Signed-off-by: Avi Deitcher <[email protected]>
@deitch deitch force-pushed the kernel-and-modules-cataloger branch from bb6f69c to c9637b8 Compare March 29, 2023 11:47
@wagoodman
Contributor

The short answer to some of your questions is that the cataloger you're writing needs to use the FileResolver directly to look for and catalog supporting files (such as the kernel modules). That is, the vmlinuz file is the primary evidence location you're cataloging (the kernel itself) and the kernel modules are more supporting evidence. I think this structure meets what you're looking for (as an example):

...
  {
   "id": "ffe5a39e57372695",
   "name": "linux-kernel",
   "version": "6.2.8-060208-generic",
   "type": "generic/linux-kernel",
   "foundBy": "linux-kernel-cataloger",
   "locations": [
    {
     "path": "vmlinuz-6.2.8-060208-generic"
    },
    {
     "path": "mods/crc32c-intel.ko"
    }
   ],
   "licenses": [],
   "language": "",
   "cpes": [
    "cpe:2.3:a:linux-kernel:linux-kernel:6.2.8-060208-generic:*:*:*:*:*:*:*",
    "cpe:2.3:a:linux-kernel:linux_kernel:6.2.8-060208-generic:*:*:*:*:*:*:*",
    "cpe:2.3:a:linux_kernel:linux-kernel:6.2.8-060208-generic:*:*:*:*:*:*:*",
    "cpe:2.3:a:linux_kernel:linux_kernel:6.2.8-060208-generic:*:*:*:*:*:*:*",
    "cpe:2.3:a:linux:linux-kernel:6.2.8-060208-generic:*:*:*:*:*:*:*",
    "cpe:2.3:a:linux:linux_kernel:6.2.8-060208-generic:*:*:*:*:*:*:*"
   ],
   "purl": "pkg:generic/[email protected]",
   "metadataType": "KernelPackageMetadata",
   "metadata": {
    "architecture": "x86",
    "version": "6.2.8-060208-generic",
    "extendedVersion": "6.2.8-060208-generic (kernel@sita) #202303220943 SMP PREEMPT_DYNAMIC Wed Mar 22 13:50:04 UTC 2023",
    "format": "bzImage",
    "videoMode": "Video mode 65535",
    "modules": [
     {
      "kernelVersion": "4.18.0-448.el8.x86_64",
      "versionMagic": "4.18.0-448.el8.x86_64 SMP mod_unload modversions ",
      "sourceVersion": "1E7D107C937AAE2A22F9942",
      "author": "Austin Zhang <[email protected]>, Kent Liu <[email protected]>",
      "license": "GPL",
      "name": "crc32c_intel",
      "description": "CRC32c (Castagnoli) optimization using Intel Hardware.",
      "path": "mods/crc32c-intel.ko"
     }
    ]
   }
  }
...

I generated something like this that uses your kernel cataloger, but instead of adding another cataloger for the modules, it has the kernel cataloger continue to search for modules once a kernel is found. I think this gives you the most flexibility for the kind of capabilities you're looking for.

I just pushed some code, but consider it to be more of a "draft", as there are still some problems to solve. The biggest one is to make certain that the modules found really do correspond to the kernel they are being paired with. (Also, just to be clear, feel free to throw away any code I'm pushing to your branch.)
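
One possible way to check that a module belongs to a given kernel is to compare the first field of the module's version magic with the kernel release. This is a sketch under that assumption, with a made-up function name; it is not necessarily the check the cataloger ended up using.

```go
package main

import (
	"fmt"
	"strings"
)

// moduleMatchesKernel reports whether the first whitespace-separated field of
// a module's version magic equals the kernel release string.
func moduleMatchesKernel(versionMagic, kernelRelease string) bool {
	fields := strings.Fields(versionMagic)
	return len(fields) > 0 && fields[0] == kernelRelease
}

func main() {
	vermagic := "4.18.0-448.el8.x86_64 SMP mod_unload modversions "
	fmt.Println(moduleMatchesKernel(vermagic, "6.2.8-060208-generic"))
	fmt.Println(moduleMatchesKernel(vermagic, "4.18.0-448.el8.x86_64"))
}
```

Applied to the example output above, the crc32c_intel module would be rejected for the 6.2.8 kernel, which is exactly the mismatch the draft still has to handle.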

@wagoodman
Contributor

To answer more explicitly:

  1. Add each of these files (kernel module) to the same pre-existing kernel package?

By having one cataloger parser function deal with looking up more supporting files to catalog (instead of adding another cataloger and depending on package merging, which will be more difficult to deal with in the code base as time moves forward).

  2. Include the file information for each such kernel and module?

By adding the module information as a child of the kernel metadata.

  3. Include the actual Location without accidentally creating a new package?

This has to be done explicitly within the cataloger as new module locations are found.

  4. Include the per-file (i.e. per kernel module) metadata?

Just as mentioned in answering 2, this is much easier if the module metadatas are children of the parent kernel metadata.

Side note: most catalogers aren't like this, but some are. The DPKG cataloger comes to mind... where the package DB file is cataloged first and additional file paths are cataloged as supporting evidence of the discovered package on the fly.
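
The four answers above can be sketched as a single cataloging function that finds a kernel and then keeps looking for modules, attaching each one as a location and as child metadata. Everything here is hypothetical: the fileFinder interface is a simplified stand-in for a file resolver, and taking the version from the file name is a shortcut for the example (the PR reads it from the file contents).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// fileFinder is a simplified, hypothetical stand-in for a file resolver.
type fileFinder interface {
	FilesByGlob(pattern string) []string
}

// memFinder matches the pattern against the base name of each stored path.
type memFinder []string

func (r memFinder) FilesByGlob(pattern string) []string {
	var out []string
	for _, p := range r {
		if ok, _ := path.Match(pattern, path.Base(p)); ok {
			out = append(out, p)
		}
	}
	return out
}

type module struct {
	Name string
	Path string
}

type kernelPackage struct {
	Name      string
	Version   string
	Locations []string // primary evidence first, supporting evidence after
	Modules   []module // per-module metadata as children of the kernel
}

// catalogKernel creates one package per kernel image and attaches every
// module it finds to that same package instead of creating new packages.
func catalogKernel(r fileFinder) []kernelPackage {
	var pkgs []kernelPackage
	for _, k := range r.FilesByGlob("vmlinuz-*") {
		p := kernelPackage{
			Name:      "linux-kernel",
			Version:   strings.TrimPrefix(path.Base(k), "vmlinuz-"),
			Locations: []string{k},
		}
		for _, m := range r.FilesByGlob("*.ko") {
			p.Locations = append(p.Locations, m)
			p.Modules = append(p.Modules, module{Name: strings.TrimSuffix(path.Base(m), ".ko"), Path: m})
		}
		pkgs = append(pkgs, p)
	}
	return pkgs
}

func main() {
	files := memFinder{"vmlinuz-6.2.8-060208-generic", "mods/crc32c-intel.ko", "README"}
	fmt.Printf("%+v\n", catalogKernel(files))
}
```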

@deitch
Contributor Author

deitch commented Mar 30, 2023

Thanks, I will look at your push and see if I can update it.

@deitch
Contributor Author

deitch commented Mar 30, 2023

I see what you did, that makes sense. A new cataloger is only for a new package entry. Once you hit a cataloger, it should find all of its children.

I don't like having all of those hundreds of files in sourceInfo; kernel modules usually are at standard locations, so I am going to rationalize that a bit.

The part I don't get is how I add additional files. You wrote:

By adding the module information as a child of the kernel metadata.

I don't see a standard for it. The definition of Metadata is interface{}, so it could be anything. How do I create information about files such that syft will know to use those as child files?

@deitch deitch force-pushed the kernel-and-modules-cataloger branch from 30183dd to 7f942f1 Compare March 30, 2023 07:54
@deitch
Contributor Author

deitch commented Mar 30, 2023

Cleaned up the sourceInfo sanely. It should handle the trees correctly.

How I actually add the files, though, still is beyond me.

EDIT: OK, now I see it, the FileOwner interface. As long as KernelPackageMetadata implements it, it will find them and create the relationships. It works rather nicely.

The only thing I am missing, then, is the metadata on those files. Is there any way to report on them? We extract them and make them part of the []Modules that is part of the KernelMetadata, but I'm not sure how to get that reported. It wants to report checksums on everything. Maybe comment field?

Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Avi Deitcher <[email protected]>
@deitch deitch force-pushed the kernel-and-modules-cataloger branch from 7f942f1 to c33e142 Compare March 30, 2023 08:01
@wagoodman
Contributor

The only thing I am missing, then, is the metadata on those files. Is there any way to report on them? We extract them and make them part of the []Modules that is part of the KernelMetadata, but I'm not sure how to get that reported.

Currently syft-json and CycloneDX formats support showing these attributes. For syft-json there is nothing left to do, it will be shown automatically. For CycloneDX you will need to map the specific parameters that are worth capturing with the cyclonedx struct tag (example https://github.com/anchore/syft/blob/main/syft/pkg/golang_metadata.go#L5).
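
To illustrate the struct-tag mechanism, the sketch below declares a hypothetical metadata type with cyclonedx tags and reads them with reflection. How syft itself consumes the tag is not shown in this thread, so cyclonedxProperties is only a demonstration of the general technique, and the field selection is an assumption.

```go
package main

import (
	"fmt"
	"reflect"
)

// moduleMetadata is a hypothetical type; only tagged fields are exported as properties.
type moduleMetadata struct {
	Name    string `json:"name" cyclonedx:"name"`
	License string `json:"license,omitempty" cyclonedx:"license"`
	Path    string `json:"path,omitempty"` // no cyclonedx tag, so it is skipped
}

// cyclonedxProperties collects non-empty string fields that carry a
// `cyclonedx` struct tag, keyed by the tag value.
func cyclonedxProperties(v interface{}) map[string]string {
	out := map[string]string{}
	val := reflect.ValueOf(v)
	if val.Kind() != reflect.Struct {
		return out
	}
	t := val.Type()
	for i := 0; i < t.NumField(); i++ {
		tag, ok := t.Field(i).Tag.Lookup("cyclonedx")
		if !ok || tag == "" {
			continue
		}
		if s, isString := val.Field(i).Interface().(string); isString && s != "" {
			out[tag] = s
		}
	}
	return out
}

func main() {
	m := moduleMetadata{Name: "crc32c_intel", License: "GPL", Path: "mods/crc32c-intel.ko"}
	fmt.Println(cyclonedxProperties(m))
}
```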

It wants to report checksums on everything. Maybe comment field?

I don't think I entirely understand your comment about checksums. The file digest cataloger can capture checksums for files and can be enabled with the SYFT_FILE_METADATA_CATALOGER_ENABLED=true env var.

Capturing structured data in the comments field is something that we try to avoid when possible. We have very few exceptions to this. Capturing arbitrary metadata into SPDX is a current limitation of the format.

@wagoodman
Contributor

Also, regarding the license check failure:

Unallowable license () from "github.com/deitch/magic/pkg/magic"
Unallowable license () from "github.com/deitch/magic/pkg/magic/internal"
Unallowable license () from "github.com/deitch/magic/pkg/magic/parser"
failed validation

Mind adding a license in your repo?

@deitch
Contributor Author

deitch commented Apr 4, 2023

Capturing structured data in the comments field is something that we try to avoid when possible. We have very few exceptions to this. Capturing arbitrary metadata into SPDX is a current limitation of the format.

Definitely agreed. I am happy to steer clear of there.

The file digest cataloger can capture checksums for files and can be enabled with the SYFT_FILE_METADATA_CATALOGER_ENABLED=true env var

Oh is that it? It just reports these empty checksums for the files. I don't care, as long as that is correct.

Currently syft-json and CycloneDX formats support showing these attributes. For syft-json there is nothing left to do, it will be shown automatically

Last I ran, it did not. I will run it again.

For CycloneDX you will need to map the specific parameters that are worth capturing with the cyclonedx struct tag

Will do.

Mind adding a license in your repo?

Sure. I thought I did, but I guess I just rushed it through.

@wagoodman
Contributor

Json schema diff for reviewers

# ❯ diff schema/json/schema-7.1.2.json schema/json/schema-7.1.3.json
749a750,843
>     "LinuxKernelMetadata": {
>       "properties": {
>         "name": {
>           "type": "string"
>         },
>         "architecture": {
>           "type": "string"
>         },
>         "version": {
>           "type": "string"
>         },
>         "extendedVersion": {
>           "type": "string"
>         },
>         "buildTime": {
>           "type": "string"
>         },
>         "author": {
>           "type": "string"
>         },
>         "format": {
>           "type": "string"
>         },
>         "rwRootFS": {
>           "type": "boolean"
>         },
>         "swapDevice": {
>           "type": "integer"
>         },
>         "rootDevice": {
>           "type": "integer"
>         },
>         "videoMode": {
>           "type": "string"
>         }
>       },
>       "type": "object",
>       "required": [
>         "name",
>         "architecture",
>         "version"
>       ]
>     },
>     "LinuxKernelModuleMetadata": {
>       "properties": {
>         "name": {
>           "type": "string"
>         },
>         "version": {
>           "type": "string"
>         },
>         "sourceVersion": {
>           "type": "string"
>         },
>         "path": {
>           "type": "string"
>         },
>         "description": {
>           "type": "string"
>         },
>         "author": {
>           "type": "string"
>         },
>         "license": {
>           "type": "string"
>         },
>         "kernelVersion": {
>           "type": "string"
>         },
>         "versionMagic": {
>           "type": "string"
>         },
>         "parameters": {
>           "patternProperties": {
>             ".*": {
>               "$ref": "#/$defs/LinuxKernelModuleParameter"
>             }
>           },
>           "type": "object"
>         }
>       },
>       "type": "object"
>     },
>     "LinuxKernelModuleParameter": {
>       "properties": {
>         "type": {
>           "type": "string"
>         },
>         "description": {
>           "type": "string"
>         }
>       },
>       "type": "object"
>     },
1027a1122,1127
>             },
>             {
>               "$ref": "#/$defs/LinuxKernelMetadata"
>             },
>             {
>               "$ref": "#/$defs/LinuxKernelModuleMetadata"
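
For reviewers who prefer to read the schema as Go, the LinuxKernelMetadata definition in the diff above corresponds roughly to the struct below. The struct is inferred from the schema, not copied from syft: the Go field names and the use of omitempty on the non-required fields are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// linuxKernelMetadata mirrors the properties listed in the schema diff.
// name, architecture and version are required, so they have no omitempty.
type linuxKernelMetadata struct {
	Name            string `json:"name"`
	Architecture    string `json:"architecture"`
	Version         string `json:"version"`
	ExtendedVersion string `json:"extendedVersion,omitempty"`
	BuildTime       string `json:"buildTime,omitempty"`
	Author          string `json:"author,omitempty"`
	Format          string `json:"format,omitempty"`
	RWRootFS        bool   `json:"rwRootFS,omitempty"`
	SwapDevice      int    `json:"swapDevice,omitempty"`
	RootDevice      int    `json:"rootDevice,omitempty"`
	VideoMode       string `json:"videoMode,omitempty"`
}

// mustJSON marshals the metadata and panics on error (illustration only).
func mustJSON(m linuxKernelMetadata) string {
	b, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(linuxKernelMetadata{
		Name:         "linux-kernel",
		Architecture: "x86",
		Version:      "6.2.8-060208-generic",
		Format:       "bzImage",
	}))
}
```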

@wagoodman wagoodman merged commit cc731c7 into anchore:main Apr 14, 2023
@deitch deitch deleted the kernel-and-modules-cataloger branch April 15, 2023 18:17
@deitch
Contributor Author

deitch commented Apr 15, 2023

🥳

@deitch
Contributor Author

deitch commented Apr 15, 2023

Thanks for helping walk me through this @wagoodman @kzantow !

spiffcs added a commit that referenced this pull request Apr 17, 2023
* main: (35 commits)
  Fix kernel cataloger test fixtures (#1742)
  feat: Support scanning license files in golang packages over the network (#1630)
  Add package-to-file location evidence relationships (#1698)
  Add Linux Kernel cataloger (#1694)
  Add annotations for evidence on package locations (#1723)
  add format make target (#1733)
  Update tests to not fail on Mac M1's. (#1730)
  chore(deps): update bootstrap tools to latest versions (#1728)
  Add support for nar files. (#1727)
  add highlevel details about catalogers (#1726)
  chore(deps): bump golang.org/x/net from 0.8.0 to 0.9.0 (#1722)
  chore(deps): update stereoscope to e95d60a265e384df29b7a139f5c5402d6ad72e06 (#1721)
  feat: gradle lockfile support (#1719)
  chore(deps): bump github.com/docker/docker (#1715)
  chore(deps): bump golang.org/x/mod from 0.9.0 to 0.10.0 (#1713)
  chore(deps): bump golang.org/x/term from 0.6.0 to 0.7.0 (#1714)
  chore(deps): bump github.com/spf13/cobra from 1.6.1 to 1.7.0 (#1716)
  chore(deps): bump peter-evans/create-pull-request from 4 to 5 (#1712)
  chore: update tools-golang to v0.5.0 (#1717)
  Add Nix cataloger (#1696)
  ...

Signed-off-by: Christopher Phillips <[email protected]>
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
* add kernel handler

Signed-off-by: Avi Deitcher <[email protected]>

* [wip] combine kernel and kernel module cataloging

Signed-off-by: Alex Goodman <[email protected]>

* [wip] combine kernel and kernel module cataloging

Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Avi Deitcher <[email protected]>

* rename Kernel package to LinuxKernel package

Signed-off-by: Alex Goodman <[email protected]>

* split kernel and module packages within cataloger

Signed-off-by: Alex Goodman <[email protected]>

* wire up application configuration with kernel cataloger options

Signed-off-by: Alex Goodman <[email protected]>

* dont use references for packages on relationships

Signed-off-by: Alex Goodman <[email protected]>

* fix linting and tests

Signed-off-by: Alex Goodman <[email protected]>

* kernel cataloger should be resistent to partial failure

Signed-off-by: Alex Goodman <[email protected]>

* log upon kernel module metadata missing

Signed-off-by: Alex Goodman <[email protected]>

* add tests for linux kernel cataloger

Signed-off-by: Alex Goodman <[email protected]>

* update integration tests

Signed-off-by: Alex Goodman <[email protected]>

* update cli package test counts

Signed-off-by: Alex Goodman <[email protected]>

* add evidence annotations for kernel packages

Signed-off-by: Alex Goodman <[email protected]>

* reduce noise in cli test output

Signed-off-by: Alex Goodman <[email protected]>

* missed cli test to reduce noise for

Signed-off-by: Alex Goodman <[email protected]>

* fix package counts

Signed-off-by: Alex Goodman <[email protected]>

* update docs with linux kernel cataloging refs

Signed-off-by: Alex Goodman <[email protected]>

* bump json schema with new metadata fields

Signed-off-by: Alex Goodman <[email protected]>

---------

Signed-off-by: Avi Deitcher <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: <>
Co-authored-by: Alex Goodman <[email protected]>
Labels
enhancement New feature or request
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

kernel scan and inclusion
2 participants