Work through data format example for GSD-2020-7471 #190
Replies: 10 comments
-
I'd rather not overload fields to accomplish this. For example, I'm not a fan of the affected field in OSV; I think it's overly complex. The last time I discussed this with anyone I was leaning towards one ID per package/ecosystem, but that was before we had the concept of a namespace. I now lean towards using a lot of namespaces. For example, if we had a PURL-based namespace we would get a lot of functionality for free.
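To illustrate what a PURL-based namespace could give us for free, here is a rough sketch of splitting a package URL into its parts. The `parse_purl` helper is hypothetical and deliberately simplified: it ignores qualifiers, subpaths, and percent-encoding, so it is not a full implementation of the purl spec, and the version in the usage below is only sample input.

```python
def parse_purl(purl):
    """Split a package URL into type, namespace, name, and version.

    Simplified sketch: qualifiers (?...) and subpath (#...) are dropped,
    and percent-encoding is not decoded.
    """
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a package URL: {purl!r}")
    rest, _, _subpath = rest.partition("#")
    rest, _, _qualifiers = rest.partition("?")
    version = None
    if "@" in rest:
        # The version follows the last "@".
        rest, version = rest.rsplit("@", 1)
    parts = rest.strip("/").split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"missing type or name: {purl!r}")
    return {
        "type": parts[0].lower(),
        # Everything between the type and the name is the namespace.
        "namespace": "/".join(parts[1:-1]) or None,
        "name": parts[-1],
        "version": version,
    }


pypi = parse_purl("pkg:pypi/django@3.0.3")
debian = parse_purl("pkg:deb/debian/python-django")
```

With this, `pypi["name"]` is `django` and `debian["name"]` is `python-django`, so the ecosystem and naming differences are carried by the identifier itself rather than by extra fields.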
-
Thinking out loud: namespaces give us the best of both worlds. You can have vendor/project/organization-specific namespaces, e.g. "debian.org" with whatever their data and formats are, and standards-based namespaces, e.g. "packageurl" or "purl" or "CPE" or "OVAL" or whatever format. That way people at least have a hint about what the data is and how to go about processing it. Ideally we would also have tools to parse our files.
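A minimal sketch of how a namespace name could act as that hint: the consumer looks up a handler by namespace and skips the ones it does not understand. The record layout, the payload shapes, and the handler names are all assumptions made for illustration, not an agreed GSD format.

```python
# Hypothetical record: one key per namespace, each with its own payload shape.
record = {
    "id": "GSD-2020-7471",
    "namespaces": {
        "purl": {"affected": ["pkg:pypi/django", "pkg:deb/debian/python-django"]},
        "debian.org": {"package": "python-django"},
        "example.invalid": {"raw": "format this consumer does not know"},
    },
}


def handle_purl(data):
    # Standards-based namespace: a list of package URLs.
    return sorted(data["affected"])


def handle_debian(data):
    # Vendor namespace: whatever shape the vendor provides (assumed here).
    return [data["package"]]


HANDLERS = {"purl": handle_purl, "debian.org": handle_debian}


def process(record):
    """Run the matching handler per namespace and report the ones skipped."""
    results, unknown = {}, []
    for name, data in record["namespaces"].items():
        handler = HANDLERS.get(name)
        if handler is None:
            unknown.append(name)
        else:
            results[name] = handler(data)
    return results, unknown


results, unknown = process(record)
```

Unknown namespaces end up in `unknown` instead of causing an error, so new sources can be added without breaking existing consumers.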
-
Yeah, I think I've come to a similar conclusion. Similar to what we have with the cve.org and NVD namespaces, I was thinking we would also have a namespace per organization, holding more of a raw format of what they provide (I was thinking of starting with the GitHub security advisories and GitLab community advisories). Having that data is helpful, especially when the sources don't agree with each other, as it allows people to assign a degree of confidence based on the sources they trust. When sources disagree, we can flag that for manual review and reach out to get them corrected at the source when possible. GitLab at least is very happy to get that feedback. GitHub can be a bit more difficult since the advisories are often at a per-project level.
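Flagging disagreements for manual review could look roughly like the sketch below. The namespace names, the `fixed` field, and the version strings are placeholders I made up for illustration; they are not real advisory data.

```python
def find_disagreements(sources, field):
    """Group sources by the value they report for `field`.

    Returns an empty dict when every source that has the field agrees,
    otherwise a mapping of each distinct value to the sources reporting it.
    """
    by_value = {}
    for source, data in sources.items():
        if field in data:
            key = tuple(sorted(data[field]))
            by_value.setdefault(key, []).append(source)
    return by_value if len(by_value) > 1 else {}


# Placeholder data: two per-organization namespaces that disagree.
sources = {
    "github.com": {"fixed": ["1.0.1", "2.0.1"]},
    "gitlab.com": {"fixed": ["1.0.1", "2.0.2"]},
}

conflicts = find_disagreements(sources, "fixed")
needs_review = bool(conflicts)
```

Keeping the raw per-source values side by side means a reviewer can see who reported what before contacting the upstream source.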
-
But I also think there is value in having some sort of standard, easily consumable format for most of those. One thing I was thinking of is letting the OSV project take care of some of that. I was already thinking of getting the various Linux distro alerts set up to feed into their existing pipeline. One really nice thing that OSV tries to do is provide an array of explicit affected versions, so that consumers don't have to understand the weirdness of each ecosystem's version range logic.
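To show why the explicit version array is convenient, here is a small sketch of a consumer that only does a membership check. The record follows the general shape of OSV's `affected` entries (`package` plus `versions`), but the version strings are placeholders rather than real Django data, and `is_affected` is a hypothetical helper.

```python
# OSV-shaped record with placeholder versions (not real advisory data).
osv_record = {
    "id": "EXAMPLE-0000-0000",
    "affected": [
        {
            "package": {"ecosystem": "PyPI", "name": "django"},
            "versions": ["1.0.0", "1.0.1", "2.0.0"],
        }
    ],
}


def is_affected(record, ecosystem, name, version):
    """Return True if the exact version is listed for the given package."""
    for entry in record.get("affected", []):
        package = entry.get("package", {})
        if package.get("ecosystem") == ecosystem and package.get("name") == name:
            # Plain membership test: no range parsing or version ordering needed.
            if version in entry.get("versions", []):
                return True
    return False


hit = is_affected(osv_record, "PyPI", "django", "1.0.1")
miss = is_affected(osv_record, "PyPI", "django", "2.0.1")
```

The trade-off is that the list has to be enumerated by whoever produces the record, which is the part OSV's pipeline would handle.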
-
Anyway, I'll try to work on some example JSON of what I was thinking, hopefully sometime today.
-
Sorry, guess I didn't get to it yet. I'm thinking about too many things at once. Are there any objections to feeding the data from GitHub Advisories into a namespace?
-
And if we think that is okay, would gsd-tools be a good place to integrate those? If so, I can create issues to track that work there and work on it as well (I kinda already started anyway).
-
I think that's a perfectly reasonable place. Maybe start on a feature branch or fork that we can merge once it's all working.
-
Cool, I'll start working on that and then come back to figuring out more of the details on this one. I do like the idea of having a bunch of the affected package URLs somewhere and preferably not having to parse ecosystem-specific version ranges.
-
Related discussion in the OSV repo: ossf/osv-schema#123
-
I know @kurtseifried has documented some great stuff about potential data formats here, but I think it would be quite helpful to work through an actual example record, and I suggest having a try with GSD-2020-7471.
I like this one because it presents several common challenges that I would like us as a community to work on addressing. For instance, how do we want to handle package naming and versioning differences across various package managers? Here, the vulnerability is in Django, and for PyPI specifically we have PYSEC-2020-35 in the OSV format. But what about the Debian package, where the name is python-django and fixes are backported to earlier versions than on PyPI?
If we end up using something like the OSV format for the primary GSD namespace, is this one OSV record with multiple affected entries with various ecosystem, package name, and version entries, or is it something like an entire OSV record for each ecosystem as a separate namespace (so each OS or packaging ecosystem could potentially have its own custom description, etc)?
Or do we want a separate GSD id for each one and some parent record that unifies them?
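The three layouts above can be sketched side by side. Everything here is an assumption for illustration: the field names loosely follow OSV's `affected`/`package` shape, the child id scheme is invented, and version data is left out entirely.

```python
import copy

# Option 1: one record with multiple affected entries.
combined = {
    "id": "GSD-2020-7471",
    "affected": [
        {"package": {"ecosystem": "PyPI", "name": "django"}},
        {"package": {"ecosystem": "Debian", "name": "python-django"}},
    ],
}


def split_by_ecosystem(record):
    """Option 2: one full record per ecosystem, each in its own namespace."""
    per_ecosystem = {}
    for entry in record["affected"]:
        ecosystem = entry["package"]["ecosystem"]
        target = per_ecosystem.setdefault(
            ecosystem, {"id": record["id"], "affected": []}
        )
        target["affected"].append(copy.deepcopy(entry))
    return per_ecosystem


def parent_with_children(record):
    """Option 3: a separate id per entry plus a parent that unifies them."""
    children = {}
    for index, entry in enumerate(record["affected"], start=1):
        child_id = f"{record['id']}-child-{index}"  # hypothetical id scheme
        children[child_id] = {"id": child_id, "affected": [copy.deepcopy(entry)]}
    parent = {"id": record["id"], "children": sorted(children)}
    return parent, children


namespaced = split_by_ecosystem(combined)
parent, children = parent_with_children(combined)
```

Option 2 leaves room for a per-ecosystem description on each record, while option 3 moves the relationship into the parent record and leaves each child self-contained.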
In this case I'd expect to see something for at least the following:
I'll try to throw in some example JSON of a few possible approaches when I have some more time, but please start sharing any ideas you all have!