Add support for package dependency relationships #572
Comments
@wagoodman I started to play with the package dependencies for CycloneDX: https://github.com/hectorj2f/syft/tree/hectorj2f/add_dependencies_to_cyclonedx. I am only generating the dependencies for the components, as the CycloneDX format recommends. Let me know if you would prefer that I open a PR for that. |
Some detail here regarding which ecosystems this will be feasible for in a static-analysis sense (not reaching out to external data sources, such as maven central). SPDX 2.2 relationships are used to describe what will be added to the
Question: we might not be able to accurately determine build-vs-runtime dependency depending on the lack of context (e.g.

apk
  Summary: direct runtime dependencies

dpkg
  Summary: direct runtime dependencies
  Relationships:

golang
  go.mod
    Summary: flat-subset of transitive build dependencies.
    Relationships:
  go binary buildinfo section
    Summary: flat-transitive build dependencies.
    Relationships:

java
  pom.xml
    Summary: flat-direct build dependencies.
    Relationships:
  manifest
    Does not contain dependency information

javascript
  yarn.lock
    Summary: flat pseudo-transitive runtime dependency pins
    Relationships:
  package.lock
    Summary: transitive dependency pins with full dependency-to-dependency graph
    Relationships:
  package.json
    Summary: flat-direct runtime and dev dependency version ranges.
    Relationships:

php
  composer.lock
    Summary: direct dependency version pins
    Relationships:
  installed.json
    Summary: No relationships possible
    Relationships:

python
  poetry
    Summary: flat-transitive dependency dev and runtime relationships
  pipfile
    Summary: flat-transitive dependency dev and runtime relationships
  egg / dist metadata
    Does not describe any relationships |
Can you elaborate more on the why for this feature?

From my reading of it, what you are trying to do is to determine why a package foo of version bar made it into the thing you are scanning with syft. If so, I fear dumping the dependency tree might not answer that question. Parsing the package manager operations log can approximate an answer, but the only deterministic way I’m aware of to do this is to perform actual process introspection during the image build to know exactly what ended up calling e.g. dpkg -i over a file on disk.

Conversely, with a purl that is differentiated enough you can augment the syft output with dependencies and much more metadata that is publicly known and available. If syft says nano version 1.2 is in this Ubuntu container of release foo, anyone can readily obtain the dependencies of that package from public sources.

Don’t get me wrong, I’m a fan of taking as much primary source data from the package manager in the scanned instance as possible. And I think flat SBOMs can be limited in many scenarios (log4j being just the latest widely covered scenario). But I think how the feature surfaces, what it tries to solve and how it changes the syft experience for people that are expecting a flat output might be worth additional consideration.

Thank you for working on syft and for helping syft users and the industry realize better outcomes through a thoughtful approach to the existing package manager metadata. |
Adding one thing to my comment above. It’s possible that the “why” for this feature is not “what other binary depended on this binary/made this binary materialize” but more of a transitive “what other software was needed to make this binary that then went into my image”. That’s where Build-Depends and Built-Using (in the case of dpkg) would be more useful, but the in-artifact package manager metadata might not contain that information. Ideally, that information would be carried in each package’s own SBOM, but in practice the trend seems to be that metadata will live in publicly queryable services. Meaning that maybe this augmentation of syft output could be a post-analyze stage? |
@bureado thanks for your thoughts on this. We chatted a lot about this at a recent community meeting and internally as well, and I wanted to expose some of these conversations here in the issue.

Why do this feature? That's a fair question, and one that we've been exploring before trying to take it on. Squarely put, a list of packages without how they relate won't be able to answer questions about what could have introduced a package into the artifact. For example, knowing that you have log4j installed is very useful, though if your intent is to remove it you need to know how it got introduced. Is it a direct dependency of your application? Did another package bring it in? Maybe both? It happens that for java packages the syft

The same can be said for vulnerability analysis. Say I see that I'm vulnerable to CVE-X-Y for this package. When combining this with VEX information in the future, which can indicate the applicability of a CVE from the publisher's perspective, knowing through which path in the dependency tree the vulnerability match is for starts to matter. This is only achievable by knowing the relationships between packages.

External data has richer relationship information. This is generally (nearly universally) true. Many ecosystems don't express full connectivity information between packages; however, their public repositories (e.g. PyPI, Maven central, rubygems.org, etc.) have this information, and with some external querying you can get a better understanding of package-to-package relationships. Sometime in the near future we want to add features that allow syft to leverage external data in an opt-in capacity. However, we do have enough raw information from the underlying artifact to convey package-to-package connectivity in most ecosystems (and we're trying to be forward with the limitations for each ecosystem in #572 (comment)).

Does the existence of better connectivity data externally indicate that we should not express package-to-package relationships? Or that we should hold off until we do have this ability to query external sources? My take is that we can introduce this feature but allow for configurability of it (be able to change behavior or the source of this connectivity information, or turn it off altogether).

I 100% agree with this. We still want to provide a flat list of packages, so no change there. This would add additional elements in the

Sorry for the radio silence on this @bureado, but happy to continue chatting about this. |
from refinement:
|
We'd love for this to be supported! How far out is this on the roadmap? |
+1. would love to see this feature on Syft. Is this feature on the roadmap? |
This is something that is easily available now for public use (https://deps.dev). Are there any plans for incorporating the same? |
Ditto - I find that this feature would be incredibly helpful, particularly when using tools like DependencyTrack to visualize the dependency graph. Trivy has support for maintaining dependency relationships |
@wagoodman so I was looking at the parsing of java archives, in the context of an effort to think about VEX document hierarchies and CycloneDX over a particular dataset of containers.

As far as I can tell, Syft currently doesn't provide any package-to-package "Relationship" information with java archive parsing. The archive parser recursively takes a known java archive object and checks what's inside based on the manifest files. Anecdotally, the archive parser seems to be what's most commonly invoked when handed a production container running java. But there certainly IS a relationship if you are only reporting on the presence of one library because it was shipped inside the archive for another.

While opinions vary, generally from an SBOM perspective when we talk about a "dependency" we mean "if there's a problem with this, there may be a problem with the thing depending on it", or, for use cases about bringing it in, as discussed elsewhere. And in THAT sense, the hierarchical information derived from the archive parsing seems like valid dependencies, even if you don't go into the next level of sorting out the pom files. That doesn't mean that the extra compile-scope issues in the pom couldn't be relevant. But knowing, when processing an SBOM, that the issue reported in jc-core is because that's a library inside the netty-common uberjar is actually pretty valuable.

Changing syft to output the hierarchy when extracting from java archives isn't that hard. I could maybe PR it (I built a POC of it after I found issue #1972 because I needed an example of maven for my purposes). Then you get into the different TYPES of relationships: should this be dependencyOf or Contains? One thing I do think about is that from a CycloneDX perspective, I would be inclined to say that any package-to-package relationship counts as a "Dependency" for its purpose. Although anecdotally, in terms of current syft output this seems to mostly just arise in OS packages containing library package types such as python etc. Anything that makes SBOMs less flat is good for a variety of use cases.

As a note, processing NPM seems a bit harder within the current code framework. Right now for NPM the standard behavior of the cataloger is to parse a package json to retrieve a single package. So, as I understand the code architecture, to get the list of all npm packages so that relationships correctly point from one bomref to another, you'd need to do it at the end of the run, and then process the dependencies? |
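The "do it at the end of the run" idea for NPM can be sketched as a two-pass resolution: first index every cataloged package, then turn declared dependency names into edges between package IDs. This is only an illustration under simplifying assumptions. The `Package` and `Relationship` types and the `resolve` function are hypothetical, not syft's cataloger API, and matching by name alone ignores the case where several versions of the same package are installed.

```go
package main

import "fmt"

// Package is a hypothetical, simplified catalog entry.
type Package struct {
	ID   string   // unique reference, e.g. what would become a bom-ref
	Name string   // package name as declared in package.json
	Deps []string // names of declared dependencies
}

// Relationship is a hypothetical edge: From is a dependency of To.
type Relationship struct {
	From, To, Type string
}

// resolve runs once all packages are known. Pass 1 indexes packages by
// name; pass 2 emits an edge for every declared dependency that was also
// found in the scan. Dependencies with no matching package are skipped.
func resolve(pkgs []Package) []Relationship {
	byName := make(map[string]string, len(pkgs))
	for _, p := range pkgs {
		byName[p.Name] = p.ID
	}

	var rels []Relationship
	for _, p := range pkgs {
		for _, dep := range p.Deps {
			depID, ok := byName[dep]
			if !ok {
				continue // declared but not present in the scanned artifact
			}
			rels = append(rels, Relationship{From: depID, To: p.ID, Type: "dependency-of"})
		}
	}
	return rels
}

func main() {
	pkgs := []Package{
		{ID: "pkg-1", Name: "app", Deps: []string{"left-pad", "not-installed"}},
		{ID: "pkg-2", Name: "left-pad"},
	}
	for _, r := range resolve(pkgs) {
		fmt.Printf("%s %s %s\n", r.From, r.Type, r.To)
	}
}
```

A real implementation would also need to honor version ranges and nested node_modules resolution, which this sketch deliberately leaves out.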
I want to revisit this statement for a bit:
I think there could be a compromise here to get the best of both worlds. The main problem with using all 4 relationship types is that it makes it a little harder for consumers to use (they need to know about all types and union the graph together). The problem with using only DEPENDENCY_OF is that it's lossy, which isn't ideal when you're trying to discern nuance.

The compromise I propose is this: in syft JSON use DEPENDENCY_OF, but annotate the
syft/syft/artifact/relationship.go Lines 37 to 42 in da31eed

Even if the struct was something simple like:

    type DependencyKind struct {
        Runtime     bool
        Development bool
        BuildTime   bool
    }

it would be a step forward, since it would allow for multiple options to be true without muddling the graph with more edges than necessary. I feel that this would make a good trade-off in terms of making graph traversal easier to grok without losing information. |
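A minimal sketch of how such an annotation might be consumed, assuming a hypothetical `Relationship` type that carries the proposed `DependencyKind` next to a single edge type. The `Kind` field and `runtimeOnly` function are illustrative only and are not part of syft.

```go
package main

import "fmt"

// DependencyKind mirrors the struct proposed in the comment above.
type DependencyKind struct {
	Runtime     bool
	Development bool
	BuildTime   bool
}

// Relationship is a hypothetical edge type: From is a dependency of To.
// Every edge has the same Type; the nuance lives in Kind.
type Relationship struct {
	From, To string
	Type     string
	Kind     DependencyKind // hypothetical annotation field
}

// runtimeOnly keeps the edges that are needed at runtime. Consumers that
// do not care about the nuance can ignore Kind and traverse every edge.
func runtimeOnly(rels []Relationship) []Relationship {
	var out []Relationship
	for _, r := range rels {
		if r.Kind.Runtime {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rels := []Relationship{
		{From: "lib-a", To: "app", Type: "dependency-of", Kind: DependencyKind{Runtime: true}},
		{From: "test-tool", To: "app", Type: "dependency-of", Kind: DependencyKind{Development: true}},
		// One edge can be both runtime and build-time without a second edge.
		{From: "lib-b", To: "app", Type: "dependency-of", Kind: DependencyKind{Runtime: true, BuildTime: true}},
	}
	for _, r := range runtimeOnly(rels) {
		fmt.Printf("%s -> %s\n", r.From, r.To)
	}
}
```

The graph keeps one edge per package pair, so traversal code stays the same whether or not the consumer filters on the annotation.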
Linking the latest and greatest SPDX 3.0 relationship types as a dev note for those picking this up on a per ecosystem basis: |
Team consensus from our weekly gardening meeting is to not tackle #572 (comment), meaning we will only have DEPENDENCY_OF. Note: this means that whether something is a dev, build, or runtime dependency, it will still be captured as DEPENDENCY_OF. In the future we might still try to tackle adding edge qualifications or more edges of various types, but not on the first pass. |
What would you like to be added:
Support tracking the full dependency graph for packages in the form of relationships, for the ecosystems that support extracting this information.
Why is this needed:
An SBOM is useful for at least listing what makes up a software artifact. However, it is more useful to know how a dependency is related to the artifact (is it a direct dependency? or a transitive dependency? is this dependency used by several other packages, or just one?).
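The questions in the paragraph above (direct or transitive? used by several packages or just one?) become simple lookups once relationships exist. Here is a minimal sketch, assuming a hypothetical `Relationship` edge type and a known root package name. None of these names come from syft.

```go
package main

import (
	"fmt"
	"sort"
)

// Relationship is a hypothetical edge: From is a dependency of To.
type Relationship struct {
	From, To string
}

// Info summarizes how one package relates to the rest of the artifact.
type Info struct {
	Direct     bool // the root artifact depends on it directly
	Dependents int  // number of distinct packages that depend on it
}

// classify computes Info for every package that appears as a dependency.
// Duplicate edges are counted once.
func classify(root string, rels []Relationship) map[string]Info {
	seen := map[Relationship]bool{}
	info := map[string]Info{}
	for _, r := range rels {
		if seen[r] {
			continue
		}
		seen[r] = true
		i := info[r.From]
		i.Dependents++
		if r.To == root {
			i.Direct = true
		}
		info[r.From] = i
	}
	return info
}

func main() {
	rels := []Relationship{
		{From: "a", To: "app"},
		{From: "b", To: "app"},
		{From: "c", To: "a"},
		{From: "c", To: "b"},
	}
	info := classify("app", rels)

	names := make([]string, 0, len(info))
	for n := range info {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, n := range names {
		fmt.Printf("%s direct=%v dependents=%d\n", n, info[n].Direct, info[n].Dependents)
	}
}
```

Here `c` is transitive only and is shared by two packages, which a flat list cannot express.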
Below is a list of each ecosystem that we could implement this for (really it's a list of all of the parsers for all catalogers). It doesn't mean that we should implement this entire list, there are some ecosystems that just don't raise up enough information to make adding relationships useful. This will have to be taken on a case-by-case basis.
Nix (see note below)
These are catalogers for which raising up relationships has been deemed not possible / practical at this time:
Notes:
This assumes that #556 is implemented, allowing for package catalogers to return relationships as first class evidence.