Store version ranges #140

pombredanne · 2020-01-14T11:36:05Z

To support vulnerabilities that impact or fix a package version range, we would need to store that data.
In addition, we would also store the relationship between vuln and package for every known package at the time we create or update a vuln.
This is a follow-up of #119.

pombredanne · 2020-09-10T08:03:31Z

From dupe 248:

#65 and #84 don't provide concrete versions, hence we need to store version ranges. For that, we must first improve the data models.

sbs2001 · 2020-10-07T06:51:07Z

Context

Security advisories, by convention, provide patched/vulnerable packages using a version range. For example, Ruby provides this: https://github.com/rubysec/ruby-advisory-db/blob/6efbdb053cbe41e55f435ddee25a1562fb73f3f2/gems/actionpack/CVE-2012-1099.yml#L23

When an advisory provides such version ranges, ideally we want to convert the version range into a discrete set of versions, i.e. "resolve the range".

We do this by:

1. Obtaining all the versions of the given package released to date by calling some API. In this case the endpoint would be https://rubygems.org/api/v1/versions/actionpack.json
2. Classifying the versions into two groups: a. those that satisfy the range, and b. those that do not satisfy any range. Then, using some logic, the group a packages are either vulnerable or patched, while the group b packages have the opposite vulnerability status.
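The two steps above can be sketched in Python. This is a minimal sketch, not VulnerableCode's code: the API call of step 1 is replaced by a hard-coded, made-up list of versions, versions are assumed to be purely numeric dotted strings, and `resolve_range` and `in_patched_range` are hypothetical names.

```python
def parse(version):
    """Turn a numeric dotted version such as '3.0.12' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def resolve_range(all_versions, in_range, range_is_patched=True):
    """Split every known version into (vulnerable, patched) using a range predicate.

    Group a satisfies the range, group b does not; which group is vulnerable
    depends on whether the advisory's range describes patched or vulnerable versions.
    """
    group_a = [v for v in all_versions if in_range(v)]
    group_b = [v for v in all_versions if not in_range(v)]
    return (group_b, group_a) if range_is_patched else (group_a, group_b)


def in_patched_range(version):
    # Made-up patched range for illustration: ">= 3.1.4" or ">= 3.0.12, < 3.1".
    v = parse(version)
    return v >= (3, 1, 4) or (3, 0, 12) <= v < (3, 1)


# Step 1 would fetch these from the package registry; they are hard-coded here.
known_versions = ["3.0.11", "3.0.12", "3.0.13", "3.1.3", "3.1.4"]
vulnerable, patched = resolve_range(known_versions, in_patched_range)
print(vulnerable)  # ['3.0.11', '3.1.3']
print(patched)     # ['3.0.12', '3.0.13', '3.1.4']
```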

The actual problem:

In some cases there is no API to carry out step 1, i.e. version ranges can't be resolved into discrete versions. We want some way to store such data in a manner similar to how we store discrete versions. E.g. https://security.gentoo.org/glsa/202009-18

copernico · 2020-10-07T09:31:39Z

Maybe it is not necessary to require that version ranges be resolved a-priori (that is, resolved to generate a finite set of existing artifacts); they could be used to determine a-posteriori whether a given artifact is in the range. The advantage of the latter method is that you would not need the API to determine what exists.
In practice though, one needs to enforce some limits on how to interpret intervals that are open-ended (for example, version 2.2 and up).

One way could be to allow only the "last segment" of the version to vary; for example:

2.2 and up --> 2.2, 2.3, 2.4,....... (but not 3.0)
2.2.1 and up --> 2.2.1, 2.2.2, 2.2.3 (but not 2.3.x)
3.0 and up --> any 3.x (but not 4.x, 5.x etc)
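A minimal Python sketch of this "only the last segment varies" rule, assuming purely numeric dotted versions (no suffixes such as rc1); `in_open_range` is a hypothetical helper name. These semantics match the pessimistic operator (~>) mentioned in the next comment.

```python
def parse(version):
    """Turn a numeric dotted version such as '2.2.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def in_open_range(version, lower):
    """True if `version` is in "`lower` and up" where only the last segment may vary.

    "2.2 and up" keeps the "2." prefix fixed, so 2.3 and 2.10 match but 3.0 does not.
    """
    v, lo = parse(version), parse(lower)
    same_prefix = v[: len(lo) - 1] == lo[:-1]
    return same_prefix and v >= lo


print(in_open_range("2.10", "2.2"))     # True
print(in_open_range("3.0", "2.2"))      # False
print(in_open_range("2.3.0", "2.2.1"))  # False
```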

Would this make sense? It's a compromise: this still requires enumerating a few intervals to achieve the correct semantics, but at least it is simple, hopefully easy to grasp, and flexible enough to cover the large majority of cases.

Your thoughts?

sbs2001 · 2020-10-07T10:54:33Z

@copernico

Ranges with a single bound are problematic (they ignore backports). A more formal way to allow only the last segment to vary is to use the pessimistic operator (https://thoughtbot.com/blog/rubys-pessimistic-operator). There's existing tooling for that.

Unrelated to this :
I actually proposed something similar, which was to avoid resolving ranges entirely and store them directly in the DB (you can find the convo with @haikoschol and @pombredanne in the chat if you scroll wayyyy up). We rejected the proposal primarily because assuming a versioning format is a bad idea. My current opinion is to use ranges as a last resort/fallback.

The primary motivation to store ranges ATM is to avoid losing data from places like https://security.gentoo.org/glsa, and to have something to fall back on when a specific version is not present in our data.

copernico · 2020-10-07T11:32:44Z

@sbs2001 I think the pessimistic operator (which I was not aware of, thanks for the pointer) describes exactly the semantics I had in mind. Re: backports: I am not sure I understand what you mean. In any case, I would stay away from using wall-clock time (as in released-before or released-after) when determining if a version is "after" or "before" another; I guess what matters are version identifiers: for sure 3.1 is after 3.0, but is 2.1 after 3.0? Maybe, or maybe not; we need a separate interval expression for 2.x.
Basically, if a fix of, say, version 3.1.1 is then backported to, say, 2.1.14, what we would have to do is to also specify 2.1.14 and subsequent versions as fixed, in a separate (extended-)purl.

pombredanne · 2020-10-13T10:41:24Z

@sbs2001 you wrote:

I actually proposed something similar which was to avoid resolving ranges entirely and store them directly in DB (you can find the convo with @haikoschol and @pombredanne in the chat if you scroll wayyyy up). We rejected the proposal primarily because assuming versioning format is a bad idea. My current opinion is to use ranges as a last resort/fallback.

Do you mind digging this up and pasting the chat log in a comment here?

pombredanne · 2020-10-13T10:43:46Z

For reference, there is a related Package URL PR by @david-a-wheeler at package-url/purl-spec#93 that I need to reply to, and tickets at package-url/purl-spec#66 and package-url/purl-spec#84.
And also ossf/wg-vulnerability-disclosures#28 (comment)

pombredanne · 2020-10-13T11:32:16Z

Here are some thoughts and background rehashing for reference:

Context

There is no (mostly) universal syntax for version ranges and no (mostly) universal way to compare two versions. Each package type may define its own syntax and semantics. For instance:

Rubygems https://guides.rubygems.org/patterns/#semantic-versioning
node-semver as used for npms https://github.com/npm/node-semver#ranges
Python https://www.python.org/dev/peps/pep-0440/
Debian and Ubuntu https://www.debian.org/doc/debian-policy/ch-relationships.html
RPM distros https://rpm.org/user_doc/dependencies.html#versioning and https://fedoraproject.org/wiki/Archive:Tools/RPM/VersionComparison
Perl https://perlmaven.com/how-to-compare-version-numbers-in-perl-and-for-cpan-modules
of course NVD CPEs https://nvd.nist.gov/General/News/CPE-Range-Notification
Apache maven http://maven.apache.org/enforcer/enforcer-rules/versionRanges.html
NuGet https://docs.microsoft.com/en-us/nuget/concepts/package-versioning
Apache Maven and NuGet more or less follow mathematical interval notation https://en.wikipedia.org/wiki/Interval_(mathematics)
Gentoo https://wiki.gentoo.org/wiki/Version_specifier
Alpine linux https://gitlab.alpinelinux.org/alpine/apk-tools/-/blob/master/src/version.c (which might be using Gentoo conventions)
Go https://golang.org/ref/mod#versions which uses semver with some twists

Problem

Version ranges are useful because they can help to map a future, not-yet-known package version to a known vulnerability impacting it if that package version is within that range.

Some solution elements

There are a few things to consider:

1. In all cases, storing version ranges when available in a vulnerability is a useful data point.
2. Since there is no universal syntax and algorithm for version comparison, we could either:
2.1 define a mostly universal syntax for ranges (as part of Package URLs) and a package-specific way to normalize to that syntax, or
2.2 use a package-specific syntax and not normalize it.
3. Regardless of the syntax selected, the semantics of version comparison would need to be at least package type-specific to be correct, and in some cases they may be specific to a package instance, e.g. there would be a collection of algorithms to choose from, with type defaults and possibly an override at the package level (though a package-level version scheme feels rather contrived).

As for the problem at hand here:

We want to store the version range.
We want to store concrete, real relationships between a package version and a vulnerability. This matters because a relationship implied only by a version range (i.e. a potential relationship) may need to be overridden. It also allows us to navigate the graph of relationships between packages and vulnerabilities efficiently. This means that each time we add a new vulnerability, we need to resolve a possible range to a list of concrete known package versions.
Yet at the same time, we do not know about future versions yet (or may not be up to date on all the known versions of a package right now), so we would also need a way to resolve any version not found in the DB against the stored version ranges.

So, to recap, IMHO we should ideally:

store version range
store concrete relationships
resolve ranges on create or on access or on query either using a list of known versions or without that list

david-a-wheeler · 2020-10-13T14:46:55Z

@pombredanne - thanks for the excellent list of examples! That at least gives us specific examples to compare.

pombredanne · 2020-10-13T14:48:00Z

@david-a-wheeler frankly I wish everyone would use semver ;)

sbs2001 · 2020-10-13T16:01:12Z

@pombredanne

Do you mind digging this up and pasting the chat log in a comment here?

Here are some relevant messages on that matter. You can search Gitter using these and read the whole thing there :). This is borderline cryptic without the context.

Shivam Sandbhor @sbs2001 Jan 31 08:33

This will need us to map the vulnerability to packages with 3 relationships (affected version range, unaffected version range, fixed version range). We will just query the package, and check if the version we have matches any of the ranges in the relationships.

@haikoschol sorry for the vague diagram. Basically we will be having 3 tables.

1st table with 2 columns which would be 'id' and 'package name'. This table will have just the package name not the version , so for eg ffmpeg will get to occupy only 1 row in this table, even though they have like 100 different versions . Let's call the id of package as 'pid'

2nd table will have 2 columns: 1st for id and 2nd for the vulnerability name for eg 'CVE-XXXXXXXXX'. Let's call the id of vulnerability as 'vid'

The 3rd table is the relations table; it will have 6 columns, explained below:
1. A foreign key: the 'pid' of the corresponding package
2. A foreign key: the 'vid' of the corresponding vulnerability
3. Unaffected version range for the package with id 'pid' and vulnerability 'vid'
4. Affected version range for 'pid' and 'vid'
5. Patched version range for 'pid' and 'vid'
6. Id of the row
Side Note: We could break this table into smaller parts

Philippe Ombredanne @pombredanne Jan 31 00:51
. We will just query the package, and check if the version we have matches any of the range in the relationships.

that could work too, but storing ranges only means that there are no concrete relationships to actual package records?

Instead I would suggest effectively to store all the versions that are known to be impacted
AND eventually fetch all the new released versions as they are released... and while doing that do the work once to determine if they are impacted or not based on the version ranges?

Haiko Schol @haikoschol Feb 01 02:48
@sbs2001 regarding your proposed data model using version ranges: why not get rid of the relations table as well and just have a foreign key on Package together with the three version ranges in Vulnerability?
one difference between the current model and yours (with or without the relations table) is that part of the data filtering work needs to be done outside of the database. the python code gets all vulnerabilities that affect any version of a given package and has to check for each one of them whether the version the user is interested in falls in the "affected" version range

Haiko Schol @haikoschol Feb 01 02:54
for one package the list of all vulnerabilities that affect any version is probably pretty short. but we want to receive a list of packages (package URLs actually) that constitutes all dependencies for a given project.
another issue is, and i think this is what Philippe was referring to, that assuming every package we deal with uses a sane versioning scheme on which the concept of "ranges" can be applied is quite optimistic
afaik some package managers put no restrictions on the format of versions. so it's possible that we get "versions" like "bob", "jane", "alice", etc.

Haiko Schol @haikoschol Feb 01 02:59
or a project changes their versioning scheme at some point. firefox is an example of that
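For illustration, the three-table design from the chat above can be sketched with Python's built-in sqlite3 module. Table and column names, the placeholder vulnerability id, and the "lower,upper" range format are all hypothetical. The sketch also shows Haiko's point: SQL can only narrow the rows by package name, and the version-in-range check has to run in Python.

```python
import sqlite3


def parse(version):
    return tuple(int(part) for part in version.split("."))


def in_range(version, spec):
    """Hypothetical range format "lower,upper" meaning lower <= version < upper."""
    lower, upper = spec.split(",")
    return parse(lower) <= parse(version) < parse(upper)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE vulnerability (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    -- the relations table: one row per (package, vulnerability) pair
    CREATE TABLE relation (
        id INTEGER PRIMARY KEY,
        pid INTEGER NOT NULL REFERENCES package (id),
        vid INTEGER NOT NULL REFERENCES vulnerability (id),
        unaffected_range TEXT,
        affected_range TEXT,
        patched_range TEXT
    );
    INSERT INTO package (id, name) VALUES (1, 'ffmpeg');
    INSERT INTO vulnerability (id, name) VALUES (1, 'CVE-XXXX-0001');
    INSERT INTO relation (pid, vid, affected_range, patched_range)
        VALUES (1, 1, '4.0,4.2', '4.2,5.0');
    """
)


def vulnerabilities_for(name, version):
    rows = conn.execute(
        """
        SELECT v.name, r.affected_range
        FROM relation AS r
        JOIN package AS p ON p.id = r.pid
        JOIN vulnerability AS v ON v.id = r.vid
        WHERE p.name = ?
        """,
        (name,),
    ).fetchall()
    # The database narrows by package only; the range check happens in Python.
    return [vuln for vuln, affected in rows if affected and in_range(version, affected)]


print(vulnerabilities_for("ffmpeg", "4.1"))  # ['CVE-XXXX-0001']
print(vulnerabilities_for("ffmpeg", "4.2"))  # []
```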

joshbressers · 2020-10-13T16:11:26Z

My thinking on this has morphed a bit since my initial comments in package-url/purl-spec#84

My use case revolves around semver (and only semver) and it's still painful. The example: introduced in version 7.0.0 and 6.5.2 and fixed in version 7.1.1 and 6.8.12 is VERY hard to capture in a way that is easy to understand or parse.
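One way to capture that example without losing the backport information is one half-open interval per release line. This is a minimal sketch assuming plain MAJOR.MINOR.PATCH versions without pre-release tags; `AFFECTED` and `is_affected` are hypothetical names.

```python
def parse(version):
    """Turn 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# "introduced in 7.0.0 and 6.5.2 and fixed in 7.1.1 and 6.8.12":
# each pair is (introduced, fixed), i.e. introduced <= version < fixed.
AFFECTED = [
    (parse("6.5.2"), parse("6.8.12")),
    (parse("7.0.0"), parse("7.1.1")),
]


def is_affected(version):
    v = parse(version)
    return any(introduced <= v < fixed for introduced, fixed in AFFECTED)


print([v for v in ["6.5.1", "6.5.2", "6.8.12", "7.0.0", "7.1.1"] if is_affected(v)])
# ['6.5.2', '7.0.0']
```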

I now plan to create a service focused specifically on PURL IDs. The PURL API will be a way to get a listing of all product names, release versions, date of release, and other metadata I find useful along the way.

The vulnerability data will list only vulnerable PURL IDs. If the version isn't listed, it's not affected. The vulnerable IDs field will be quite large in some cases, but that's OK because this data is for machines not humans.

I envision the workflow looking something like this:

give me a list of all products
give me a list of all versions released for product foo
Extract the PURL IDs I care about
Give me a list of all vulnerabilities affecting the following PURL IDs

Or

Give me a list of all PURL IDs affected by this vulnerability
Get a chronological list of releases
Walk the list to find the version closest to mine not affected by the vulnerability in question

As I work on this problem I have no doubt my thoughts will change again.

sbs2001 · 2020-10-13T16:52:17Z

@joshbressers

You are not really solving the problem.

Get a chronological list of releases

I was on the same page as you until I had a chat with @pombredanne about using "chronology" as a basis for version comparison:

Shivam Sandbhor
@sbs2001
Sep 29 11:59
@pombredanne FWIW if you need something universal to compare versions , I think we are looking at the wrong thing to compare for.

Instead of comparing the version numbers, it makes more sense to me to compare using the release date of whatever we are comparing. IMHO this should be easy to implement too, since we have most of the code to fetch package metadata in multiple places in *code projects


Philippe Ombredanne
@pombredanne
Sep 29 14:28
but what about non-linear histories? say foo 1.0 and foo 2.0 have both the same vulnerability. It is patched first in foo 1.1 and then later in foo 2.2 (which comes after foo 2.1) ?

Shivam Sandbhor
@sbs2001
Sep 29 14:33
I don't get the problem here. In this case 1.1 and 2.2 will have release date greater than the vulnerable packages.

Philippe Ombredanne
@pombredanne
Sep 29 14:34
yes but 1.1 is not a fix for 2.0

Shivam Sandbhor
@sbs2001
Sep 29 14:38
right. This fails for providing the closest fix.

And also

Walk the list to find the version closest to mine not affected by the vulnerability in question

would need some comparator function. This approach would only work with a sane versioning scheme, like semver.
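The non-linear history from that chat can be reproduced in a few lines of Python. The release dates are made up, and the "same major version means same release line" rule is an assumption (semver-like), which is exactly the kind of comparator the comment says is needed.

```python
from datetime import date


def parse(version):
    return tuple(int(part) for part in version.split("."))


# Made-up release dates: 1.1 fixes 1.0, and 2.2 (after 2.1) fixes 2.0.
RELEASED = {
    "1.0": date(2020, 1, 1),
    "2.0": date(2020, 2, 1),
    "1.1": date(2020, 3, 1),
    "2.1": date(2020, 4, 1),
    "2.2": date(2020, 5, 1),
}
FIXED = {"1.1", "2.2"}


def closest_fix_by_date(version):
    """Earliest fixed release published after `version` (ignores release lines)."""
    later = [v for v in FIXED if RELEASED[v] > RELEASED[version]]
    return min(later, key=RELEASED.get, default=None)


def closest_fix_by_version(version):
    """Lowest fixed version on the same major line that is greater than `version`."""
    major = parse(version)[0]
    candidates = [v for v in FIXED if parse(v)[0] == major and parse(v) > parse(version)]
    return min(candidates, key=parse, default=None)


print(closest_fix_by_date("2.0"))     # 1.1 (wrong: 1.1 is not a fix for 2.0)
print(closest_fix_by_version("2.0"))  # 2.2
```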

sbs2001 · 2020-10-13T17:10:25Z

@joshbressers I am curious, how would you obtain only vulnerable PURL IDs ? AFAIK almost all security advisories compress those discrete set of packages into version ranges and let the consumer interpret/resolve the ranges.

joshbressers · 2020-10-13T17:11:50Z

Hah, I figured chronological would be a less confusing way to describe this all, I was clearly wrong :)

Let's ignore that word. What I really want is a list of releases in order. I wrongly assumed dates could give me that.

Here is an example

My version list in the order it was released looks like this

version = [
  '1.0.0',
  '1.0.1',
  '1.1.0',
  '2.0.0',
  '1.1.1',
  '2.1.0',
  '1.1.2',
  '2.1.1',
  '1.1.3'
]

I know there is a vulnerability in

vulnerable_versions = [
  '1.0.1',
  '1.1.0',
  '2.0.0',
  '1.1.1',
  '1.1.2'
]

So we end up with something that looks like this

version = [
  '1.0.0',
  '1.0.1', # vulnerable
  '1.1.0', # vulnerable
  '2.0.0', # vulnerable
  '1.1.1', # vulnerable
  '2.1.0',
  '1.1.2', # vulnerable
  '2.1.1',
  '1.1.3'
]

Now I can figure out the closest fix pretty easily.
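A minimal sketch of that walk, using the lists from the comment above. Finding the "closest" fix still needs some notion of which releases belong together; here that is assumed to be "same major version", and `same_line` and `closest_fix` are hypothetical names.

```python
# Releases in the order they were published, and the vulnerable subset (from the comment).
releases = ["1.0.0", "1.0.1", "1.1.0", "2.0.0", "1.1.1", "2.1.0", "1.1.2", "2.1.1", "1.1.3"]
vulnerable = {"1.0.1", "1.1.0", "2.0.0", "1.1.1", "1.1.2"}


def same_line(a, b):
    # Assumption: versions sharing a major number belong to one release line.
    return a.split(".")[0] == b.split(".")[0]


def closest_fix(mine):
    """First later release on the same line that is not vulnerable, or None."""
    start = releases.index(mine) + 1
    for candidate in releases[start:]:
        if same_line(candidate, mine) and candidate not in vulnerable:
            return candidate
    return None


print(closest_fix("1.1.0"))  # 1.1.3
print(closest_fix("2.0.0"))  # 2.1.0
```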

joshbressers · 2020-10-13T17:20:10Z

@joshbressers I am curious, how would you obtain only vulnerable PURL IDs ? AFAIK almost all security advisories compress those discrete set of packages into version ranges and let the consumer interpret/resolve the ranges.

I am building this service for the products I work on. I need machine readable data, and I get to control everything that's happening. It's a very different problem than the general community.

pombredanne · 2020-10-13T17:23:21Z

@joshbressers

I am building this service for the products I work on.

neat! if you think there are some bits and data that could be useful, feel free to reach out!

joshbressers · 2020-10-13T17:25:19Z

@joshbressers

I am building this service for the products I work on.

neat! if you think there is some bits and data that could be useful feel free to reach out!

Thanks @pombredanne!

Everything I do will end up public on github, I'll certainly be looking for honest feedback :)

pombredanne · 2020-10-13T17:40:22Z

I'll certainly be looking for honest feedback :)

same here! 👍

pombredanne · 2020-10-13T18:33:13Z

And a few extra references to version specs in the wild:

Gentoo https://wiki.gentoo.org/wiki/Version_specifier
Alpine linux https://gitlab.alpinelinux.org/alpine/apk-tools/-/blob/master/src/version.c (which might be using Gentoo conventions)
Go https://golang.org/ref/mod#versions which uses semver with some twists

copernico · 2020-10-13T19:51:23Z

Very interesting discussion; I have not gone through all the different version specification schemes and I am somewhat familiar with only a small subset of them, but I suspect (should I say, hope) they are all variants of a general scheme x.y.z.j.k.h where versions can be arranged in a tree with a depth d that is a small integer (typically 3 or 4), as depicted in this figure (for a subtree corresponding to a given X.Y major.minor release series)

(figure from: https://link.springer.com/article/10.1007/s10664-020-09830-x)

Can you show an example of versioning scheme that does not fit this (possibly naïve) generalization?

david-a-wheeler · 2020-10-13T20:18:28Z

There are exceptions. Sentimental versioning lists some examples.

In TeX and METAFONT (two tools widely used in mathematics), new versions add a new digit approaching an irrational number. The version numbers of TeX approach π (the current version is 3.14159265) and the version numbers of METAFONT approach e.

Perhaps more importantly, projects occasionally CHANGE their version number schemes. This was made famous by "Bill Gates counts to 10". The Windows version numbers are (overly simplified): 1, 2, 3, 3.1, 3.11, 95, 98, NT, 2000, XP, Vista, 7, 8, 10.

The solution used by the packaging formats rpm (for Red Hat, Fedora, CentOS, etc.) and deb (Debian, Ubuntu, etc.) is to add "epoch numbers", integers that notionally precede the "normal" version number. See the Fedora docs on this and the Debian docs on this. Typically an epoch, if included, is written as the epoch number, a colon, then the "normal" version number. One quirk: in rpm, an empty epoch number is lower than anything with a given epoch number, while in Debian an "empty" epoch is considered 0.

I think we need to at least support epoch numbers, because otherwise there's no way to handle people who change version number schemes, and that is the standard way to do it.
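A rough sketch of how an epoch prefix changes ordering, assuming the "epoch:version" form described above and, for brevity, a purely numeric upstream version (the real Debian and rpm comparison rules are far more involved). An empty epoch defaults to 0 as in Debian; `empty_epoch` is a hypothetical knob, and a value such as -1 would model the rpm behaviour as described in the comment above.

```python
def split_epoch(version, empty_epoch=0):
    """Split 'epoch:version' into (epoch, version); a missing epoch uses `empty_epoch`."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        return empty_epoch, version
    return int(epoch), rest


def sort_key(version, empty_epoch=0):
    epoch, rest = split_epoch(version, empty_epoch)
    return (epoch, tuple(int(part) for part in rest.split(".")))


# A project that went from 10.0 back to 1.0 bumps the epoch so that 1.0 still sorts last.
print(sorted(["1:1.0", "2.0", "10.0"], key=sort_key))  # ['2.0', '10.0', '1:1.0']
```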

pombredanne · 2020-10-14T13:54:18Z

@copernico you wrote:

Can you show an example of versioning scheme that does not fit this (possibly naïve) generalization?

I think that your generalization works as it stands. Even with the use of epochs in Debian and RPM packages, as pointed out by @david-a-wheeler, the epoch would still fit in a tree view of the versions world as the first optional segment.

IMHO the variations are in how you would build that tree, which requires comparing versions, and the things that change are whether:

each version segment allows strings vs. numbers
each version segment is treated as a number or as a string
if leading zeroes in a string or numeric segment are significant or not
and then there are suffixes (rc1, alpha, pre, SNAPSHOT) that a given package type may treat differently.
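Two of these choices alone are enough to flip an ordering. A tiny illustration with two hypothetical key functions, one treating each dot-separated segment as a number and one treating it as a string:

```python
def numeric_key(version):
    # Segments compared as integers: "10" > "9", and "01" == "1".
    return tuple(int(part) for part in version.split("."))


def string_key(version):
    # Segments compared as strings: "10" < "9", and "01" != "1".
    return tuple(version.split("."))


versions = ["1.9", "1.10"]
print(sorted(versions, key=numeric_key))  # ['1.9', '1.10']
print(sorted(versions, key=string_key))   # ['1.10', '1.9']
print(numeric_key("1.01") == numeric_key("1.1"))  # True
print(string_key("1.01") == string_key("1.1"))    # False
```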

I cannot fathom a (mostly) universal way to organize the tree by comparing versions reliably (reliably being the difference between stating that a version is not vulnerable vs. vulnerable, e.g. raising a false negative) with a single algorithm that is not package-type specific.

The closest that comes to mind would be @AMDmi3's awesome https://github.com/repology/libversion, which has a great doc highlighting the complexity of trying to get things right at scale (https://github.com/repology/libversion/blob/master/doc/ALGORITHM.md) and which supports almost everything, including distro versions.

And also @orsinium https://github.com/dephell/dephell_specifier with support for Python PEP-440, Semver, Ruby, npm and Maven

sbs2001 · 2020-10-19T12:14:19Z

@pombredanne and I recently had a discussion about how to handle version ranges of packages in the context of vulnerablecode, and we decided to run a little experiment.

We would store concrete relationships between packages (including their versions) and vulnerabilities the same way we already do.

Now, coming to the new things:

We would have another table like:

class VulnerablePackageRanges:
   vulnerability: foreign key to Vulnerability
   package: a Package URL string without the version
   version_range: a string containing the version ranges for which the given package is vulnerable to the vulnerability

For example, the value of package could be pkg:npm/foo.

Now if a user asks for the vulnerability status of some version of the npm package foo, there would be 2 cases:
1. We already have data about the package and its specific version. In this case we return what we know.
2. We don't have any data about the requested package among the concrete packages. In this case we would check whether there exists a range expression for that package (same type and name, version omitted). If yes, we resolve the range and determine the vulnerability status of the package. Otherwise we return empty-handed.
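The two cases can be sketched in Python with in-memory stand-ins for the tables. This is a hypothetical sketch: `VulnerablePackageRange`, `lookup`, the dict of concrete relationships, the placeholder vulnerability ids, and the deliberately naive range checker (comma-separated comparators that must all hold, on numeric dotted versions) are all assumptions, not VulnerableCode's code.

```python
import operator
from dataclasses import dataclass

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def parse(version):
    return tuple(int(part) for part in version.split("."))


def in_range(version, spec):
    """Naive check: every comma-separated comparator (e.g. ">=1.0.0,<2.0.0") must hold."""
    v = parse(version)
    for part in spec.split(","):
        op = part[:2] if part[:2] in OPS else part[:1]
        if not OPS[op](v, parse(part[len(op):])):
            return False
    return True


@dataclass(frozen=True)
class VulnerablePackageRange:
    vulnerability: str
    package: str  # Package URL without a version, e.g. "pkg:npm/foo"
    version_range: str


def lookup(package, version, concrete, ranges):
    # Case 1: a concrete relationship is known for this exact version.
    if (package, version) in concrete:
        return concrete[(package, version)]
    # Case 2: fall back to any stored range for the same package (may be empty).
    return [r.vulnerability for r in ranges
            if r.package == package and in_range(version, r.version_range)]


concrete = {("pkg:npm/foo", "1.0.0"): ["VULN-1"]}
ranges = [VulnerablePackageRange("VULN-2", "pkg:npm/foo", ">=1.0.0,<2.0.0")]
print(lookup("pkg:npm/foo", "1.5.0", concrete, ranges))  # ['VULN-2']
```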

For resolving ranges we would use point 2.1 of #140 (comment). The universal syntax would be more or less a stripped-down version of PEP 440 (this is just an experiment).

Periodically we would also fetch all versions of packages contained in VulnerablePackageRanges and resolve them using the already present ranges.

@pombredanne correct me if I misunderstood you anywhere :)

pombredanne · 2020-10-19T13:50:28Z

@sbs2001 this makes ++ sense. To recap and reformulate my understanding this would mean:

1. we experiment with using a mostly universal syntax for version ranges based on the well-specified https://www.python.org/dev/peps/pep-0440/#version-specifiers. This is used to store a version range as a single string.
2. the actual comparison procedure of two versions (and the check whether a version falls within a range) would be:
2.1 specific to a package type (e.g. npm, pypi, etc.) ...
2.2 ... with a default if a package type does not have one (likely based on the dephell or repology approaches)
2.3 ... and the ability for a single package type/ns/name to override this
... though these would be refinements after the experiments
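Points 2.1 to 2.3 amount to a lookup with fallbacks. A minimal sketch, where the registries, the key functions and `comparator_for` are hypothetical, and `generic_key` is only a crude stand-in for a libversion-like default:

```python
import re


def numeric_key(version):
    return tuple(int(part) for part in version.split("."))


def generic_key(version):
    # Crude default: split on punctuation; digit runs compare as numbers, other text as strings.
    # Each part is tagged (0 for numbers, 1 for strings) so mixed tuples stay comparable.
    parts = re.split(r"[^0-9a-z]+", version.lower())
    return tuple((0, int(p)) if p.isdigit() else (1, p) for p in parts if p)


TYPE_DEFAULTS = {"npm": numeric_key, "pypi": numeric_key}  # 2.1: per package type
PACKAGE_OVERRIDES = {}  # 2.3: (type, namespace, name) -> key function


def comparator_for(ptype, namespace, name):
    override = PACKAGE_OVERRIDES.get((ptype, namespace, name))
    if override is not None:
        return override
    return TYPE_DEFAULTS.get(ptype, generic_key)  # 2.2: default when the type has none


PACKAGE_OVERRIDES[("pypi", "", "oddly-versioned")] = generic_key
print(comparator_for("npm", "", "foo") is numeric_key)               # True
print(comparator_for("deb", "debian", "foo") is generic_key)         # True
print(comparator_for("pypi", "", "oddly-versioned") is generic_key)  # True
```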

And your approach boils down to going from the most to the least specific:

first search for a concrete and explicit relationship between a package version and a vulnerability
else, do a version range check between the package version and package/vulnerability/version range if any
else ... we later could also navigate the package graph for inferences, say we know that pkg:deb/[email protected] and pkg:rpm/[email protected] have the same source code and extend the search to other related package type/names

pombredanne · 2021-05-24T14:40:46Z

Repasting here the design for version ranges from #119 (comment) and updating it at the same time:

Version ranges specifier

A version ranges specifier is a string with this syntax:
<scheme>:<range>,<range>

For example:
semver:1.2.3,>=2.0.0
The <scheme> (such as semver, debian, etc.) determines how to interpret a version range, how two versions compare as lesser or greater, and whether a version is within a range.
The <scheme> is followed by one or more <range> separated by a comma.
Each <range> is declared this way:
- "=": Version equality operator. Implied if not present and means that a version must be equal to this value as in "=1.2.3"
- "!=": Version exclusion operator. Means version should be excluded "!=1.2.3"
- "<=", ">=": Inclusive range operator such as "<=1.2.3" which means all versions less than or equal to "1.2.3"
- "<", ">": Exclusive range operator such as "<1.2.3" which means all versions less than "1.2.3"

For example, >=1.2.3,<2.0.0 means all versions greater than or equal to 1.2.3 but less than 2.0.0

Within a range, the syntax of a version such as 1.2.3 is defined by the scheme.
Spaces are not significant and are removed in the canonical form: "!=1.2.3" and "! = 1.2.3" are equivalent.
Version ranges specifiers are case-insensitive and lowercased in their canonical form.
The ordering of multiple <range>s in a specifier is not significant. The canonical ordering is TBD.
A range cannot contain operator characters (><=!,*). If required (which should be rare in practice), they need to be quoted using the URL quoting rules.
Equality = and exclusion != are based on an exact test of two lower-cased version strings and are not scheme-specific.
The <scheme> determines:
- how two versions are compared as greater or lesser.
- how its version range specifiers syntax can be reduced to the simplified range specifiers syntax defined here.
The special "star range" of <scheme>:* means that any version would match this range. A star range can only be used alone and no other range can be added. It should be used sparingly as unbounded ranges are rare and typically problematic.
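The rules above are compact enough to sketch. This is a minimal, hypothetical Python parser and checker, not the univers implementation. The comment does not say how several `<range>`s combine, so the sketch assumes that `=` ranges are alternatives, `!=` ranges exclude, and all comparison ranges must hold together (which makes both `semver:1.2.3,>=2.0.0` and `>=1.2.3,<2.0.0` read naturally). `simple_key` stands in for a real scheme-specific comparator and only handles numeric dotted versions.

```python
import operator
import re
from urllib.parse import unquote

# Longest operators first so that ">=" is not read as ">".
OPERATORS = (">=", "<=", "!=", ">", "<", "=")
COMPARATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def canonicalize(spec):
    """Spaces are not significant and specifiers are case-insensitive."""
    return re.sub(r"\s+", "", spec).lower()


def parse_spec(spec):
    """Return (scheme, ranges) where ranges is "*" or a list of (operator, version)."""
    scheme, sep, body = canonicalize(spec).partition(":")
    if not sep or not scheme or not body:
        raise ValueError(f"not a version ranges specifier: {spec!r}")
    items = body.split(",")
    if "*" in items:
        if len(items) > 1:
            raise ValueError("a star range must be used alone")
        return scheme, "*"
    ranges = []
    for item in items:
        op = next((o for o in OPERATORS if item.startswith(o)), "=")  # "=" is implied
        version = item[len(op):] if item.startswith(op) else item
        ranges.append((op, unquote(version)))  # versions may be URL-quoted
    return scheme, ranges


def simple_key(version):
    """Stand-in for a scheme-specific comparator: numeric dotted versions only."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()  # treat 2.0.0 and 2 as equal when comparing
    return tuple(parts)


def contains(spec, version, key=simple_key):
    _scheme, ranges = parse_spec(spec)
    if ranges == "*":
        return True
    v = version.lower()
    equals = {ver for op, ver in ranges if op == "="}
    excludes = {ver for op, ver in ranges if op == "!="}
    bounds = [(op, ver) for op, ver in ranges if op in COMPARATORS]
    if v in excludes:  # = and != are exact string tests, not scheme-specific
        return False
    if v in equals:
        return True
    if bounds:  # assumption: all comparison ranges must hold together
        return all(COMPARATORS[op](key(v), key(ver)) for op, ver in bounds)
    return not equals  # only != ranges: everything else matches


print(contains("semver:>=1.2.3,<2.0.0", "1.5.0"))  # True
print(contains("semver:1.2.3,>=2.0.0", "1.3.0"))   # False
print(contains("SemVer: ! = 1.2.3", "1.2.3"))      # False
```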

Notes and caveats:

Comparing versions from two different schemes is unspecified (and typically does not make sense, even though there may be some obvious similarities between the semver version of an npm package and the debian version of its Debian packaging).
Schemes are related to Package URL types in the sense that each Package URL type is related to one version scheme, but multiple types can reuse the same scheme (such as semver).

Some of the known schemes and their codes are:

generic: a generic version comparison algorithm (which is TBD, likely a split on punctuation and dealing with digit vs. strings comparisons, like in libversion)
debian: Debian and Ubuntu https://www.debian.org/doc/debian-policy/ch-relationships.html
rpm: RPM distros https://rpm.org/user_doc/dependencies.html#versioning and https://fedoraproject.org/wiki/Archive:Tools/RPM/VersionComparison
ruby: Rubygems https://guides.rubygems.org/patterns/#semantic-versioning
semver: node-semver as used for npms https://github.com/npm/node-semver#ranges
which is also used by Rust (https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html), Composer and several others
python: Python https://www.python.org/dev/peps/pep-0440/
perl: Perl https://perlmaven.com/how-to-compare-version-numbers-in-perl-and-for-cpan-modules
go: Go modules https://golang.org/ref/mod#versions which uses semver with a twist
cpe: NVD CPE Ranges https://nvd.nist.gov/General/News/CPE-Range-Notification
maven: Apache Maven http://maven.apache.org/enforcer/enforcer-rules/versionRanges.html
nuget: NuGet https://docs.microsoft.com/en-us/nuget/concepts/package-versioning#version-ranges
Note that Apache Maven and NuGet more or less follow a mathematical interval syntax as in https://en.wikipedia.org/wiki/Interval_(mathematics)
gentoo: Gentoo https://wiki.gentoo.org/wiki/Version_specifier
alpine: Alpine linux https://gitlab.alpinelinux.org/alpine/apk-tools/-/blob/master/src/version.c (which might be using Gentoo conventions)
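As an example of reducing a native syntax to the simplified one, a single Maven/NuGet-style interval maps directly onto comparison ranges. A minimal sketch with a hypothetical `interval_to_specifier` helper; it deliberately does not handle Maven's comma-separated unions of intervals, nor bare versions such as `1.0`, whose meaning differs between Maven (a soft requirement) and NuGet (a minimum version).

```python
def interval_to_specifier(scheme, interval):
    """Convert one bracketed interval such as "[1.0,2.0)" to "<scheme>:>=1.0,<2.0".

    "[" and "]" are inclusive bounds, "(" and ")" exclusive; an empty side is unbounded.
    """
    interval = interval.replace(" ", "")
    if len(interval) < 3 or interval[0] not in "[(" or interval[-1] not in "])":
        raise ValueError(f"not a bracketed interval: {interval!r}")
    body = interval[1:-1]
    if "," not in body:
        # "[1.0]" pins exactly one version; "(1.0]" and friends are invalid.
        if interval[0] != "[" or interval[-1] != "]":
            raise ValueError(f"invalid exact-version interval: {interval!r}")
        return f"{scheme}:{body}"
    if body.count(",") != 1:
        raise ValueError("only a single interval is handled in this sketch")
    lower, upper = body.split(",")
    ranges = []
    if lower:
        ranges.append((">=" if interval[0] == "[" else ">") + lower)
    if upper:
        ranges.append(("<=" if interval[-1] == "]" else "<") + upper)
    if not ranges:
        return f"{scheme}:*"
    return f"{scheme}:" + ",".join(ranges)


print(interval_to_specifier("maven", "[1.0,2.0)"))  # maven:>=1.0,<2.0
print(interval_to_specifier("nuget", "(,1.0]"))     # nuget:<=1.0
```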

Implementation

https://github.com/nexB/univers by @sbs2001 implements this spec


Usage in VulnerableCode

Here is the design we discussed to put version ranges to use here.

One problem is that the package version ranges a vulnerability applies to may become misleading after they have been published, unless they are updated. For instance, openstack/ossa@777e7b7#r51222097 was last updated in 2014 and does not really apply to "All versions", but only to the package versions known at the time this advisory was published.

A related problem is unbounded version ranges, or the lack of version ranges altogether, where an advisory states when a vulnerability is fixed but not when it appeared, such as https://github.com/mozilla/foundation-security-advisories/blob/master/announce/2016/mfsa2016-14.md

The difficulty is that we do not want to miss reporting any version that is vulnerable (a dangerous false negative) yet we do not want to pollute the reporting with package versions that are not certain to be vulnerable (false positive).

As a solution, the proposed design tries to handle these two cases:

by storing concrete vulnerability-package relationships when we are confident this relationship exists
by storing a version ranges specifier in a vulnerability-package relationship, being able to query whether a package version satisfies it, and computing a confidence value in these cases (i.e. signaling a possible false positive while avoiding a false negative).

For instance, with openstack/ossa@777e7b7#r51222097, last updated ~7 years ago, the confidence that it applies to a package version released in 2021 should be fairly low.

Also since confidence and version ranges specifier are stored they can also be refined and curated by hand in the future.

Therefore, in addition to concrete relationships between package versions and a vulnerability, we also want to store a version range, with these specifics:

A version ranges specifier is a string as defined above. It is stored in PackageRelatedVulnerability.
If all the versions in the range are exactly pinned/concrete versions, then we would not store a range. Instead we store only the concrete relationships.
When a vulnerability is created or updated, we consider its date of creation or last update and we:

update its stored version range (string) as needed (TBD: deal with overrides in the future)
update the concrete relationships with package versions (including possibly updating, creating and deleting relationships.)
This should be based on a best effort of the set of known package versions that existed as released only up to the date of creation or last update of the vulnerability.
For some corner cases, we need a special version range with the value * which means that all versions of a package are impacted. This should be used rarely, as in most cases this can instead be an open range with no upper or lower bound. When we have such a range or unbounded ranges, we should limit the creation of concrete vulnerability-package relationships to some fixed number of versions (TBD, possibly one year back and up to 5 versions back) to avoid creating unverified relationships.
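A minimal sketch of that limit, reading "one year back and up to 5 versions back" as both limits applying together, and counting back from the advisory's creation or last-update date. The constants, the `concrete_candidates` name, and the sample data are hypothetical since the thresholds are still TBD.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=365)  # hypothetical: "possibly one year back"
MAX_VERSIONS = 5               # hypothetical: "up to 5 versions back"


def concrete_candidates(releases, advisory_date, max_age=MAX_AGE, max_versions=MAX_VERSIONS):
    """Pick which versions matching an unbounded range get a concrete relationship.

    `releases` is a list of (version, release_date) already known to match the range.
    Only releases published up to the advisory date and within `max_age` of it are
    kept, and of those only the `max_versions` most recent.
    """
    window = [(v, d) for v, d in releases if advisory_date - max_age <= d <= advisory_date]
    window.sort(key=lambda item: item[1])
    return [v for v, _ in window[-max_versions:]]


releases = [
    ("1.0", date(2019, 1, 1)),  # older than one year: skipped
    ("1.1", date(2020, 7, 1)),  # within a year, but not among the 5 most recent
    ("1.2", date(2020, 8, 1)),
    ("1.3", date(2020, 9, 1)),
    ("1.4", date(2020, 10, 1)),
    ("1.5", date(2020, 11, 1)),
    ("1.6", date(2021, 1, 1)),
    ("2.0", date(2021, 8, 1)),  # released after the advisory: skipped
]
print(concrete_candidates(releases, advisory_date=date(2021, 6, 1)))
# ['1.2', '1.3', '1.4', '1.5', '1.6']
```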

When a new package version becomes known independently of a vulnerability update or creation, we do not update or create new relationships.
There is a new notion of "confidence" that we should store at the vulnerability-package relationship level. This should be maximal by default and could be overridden manually.
When querying for the vulnerabilities of a package version, we return two sets of relationships:

the relationships stored in the DB with the stored confidence, typically high confidence.
a query of potential relationships based on checking if the requested package version is within a vulnerability-package (PackageRelatedVulnerability) stored version range.

The confidence values returned with this query should be based on a few factors, such as:

- a "decay"/discount based on how long ago the vulnerability range was last updated
- and/or the time elapsed between the vulnerability disclosure/update and the release date of the package
- and/or whether the version range is "closed", i.e. whether it has a lower bound, an upper bound, or no bound.

When storing ranges, unbounded ranges are a possible source of problems, as they may resolve incorrectly to versions that are NOT affected by a vulnerability. To cope with this, we should be able to query and find all the PackageRelatedVulnerability and Vulnerability records that are open, i.e. that are missing a lower bound or an upper bound, or that have no bound at all. The results can be used as an input for reaching out to upstream data sources or package projects, to create a wall of shame, or as an input to curation and review.
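
A sketch of such a "wall of shame" query over an in-memory mapping. The mapping shape, the `missing_bounds` and `wall_of_shame` names, and the VULN-* identifiers are hypothetical. A * range is represented here as a single interval with no bound on either side.

```python
from __future__ import annotations

from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for stored PackageRelatedVulnerability ranges:
# (vulnerability_id, package) -> list of (lower, upper) intervals; None means unbounded.
Ranges = Dict[Tuple[str, str], List[Tuple[Optional[str], Optional[str]]]]


def missing_bounds(intervals) -> set[str]:
    """Return which sides ('lower', 'upper') are unbounded in any interval."""
    missing = set()
    for lower, upper in intervals:
        if lower is None:
            missing.add("lower")
        if upper is None:
            missing.add("upper")
    return missing


def wall_of_shame(ranges: Ranges) -> dict[tuple[str, str], list[str]]:
    """List every vulnerability/package pair whose range is open on some side."""
    report = {}
    for key, intervals in ranges.items():
        missing = missing_bounds(intervals)
        if missing:
            report[key] = sorted(missing)
    return report
```
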

We also need to revert the changes in #436 and ensure that we effectively store all the concrete relationships as defined here.

pombredanne · 2021-05-24T14:41:10Z

@sbs2001 @Hritik14 I hope I captured today's chat correctly ^

pombredanne · 2021-05-24T14:56:02Z

Here is an example with real data:

Today, CVE-2021-foo is published and it affects the django package and these version ranges:

django <1
django 1.2> to <2
django 2.3> to <3
django 3.1> to <4

Based on this:

- I can conclude that versions 1.3, 2.4, and 3.2, which exist today, are vulnerable, and I would create a hard relationship for them because they are the versions that exist at the time of the advisory publication.
- I also store a version ranges spec for this vulnerability/package.
- Other versions (2.2, 1.1, 3.0) are not marked as anything (and not even stored) since they are outside of the ranges and not impacted at all.
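
The classification above can be checked with a few lines of code. This sketch reads "1.2>" as an inclusive lower bound (>=1.2) and "<2" as an exclusive upper bound. That reading is an assumption, but it is the one under which 2.3 also falls within the ranges the next day. It also assumes the six listed versions are the only ones known today.

```python
from __future__ import annotations


def parse_version(v: str) -> tuple[int, ...]:
    # Simplification: dotted numeric versions only.
    return tuple(int(part) for part in v.split("."))


def in_interval(version, lower, upper):
    v = parse_version(version)
    return (lower is None or v >= parse_version(lower)) and (
        upper is None or v < parse_version(upper)
    )


# The four django ranges from the example as (inclusive lower, exclusive upper).
DJANGO_RANGES = [(None, "1"), ("1.2", "2"), ("2.3", "3"), ("3.1", "4")]


def is_affected(version, ranges=DJANGO_RANGES):
    return any(in_interval(version, lo, hi) for lo, hi in ranges)


existing_today = ["1.1", "1.3", "2.2", "2.4", "3.0", "3.2"]
vulnerable = [v for v in existing_today if is_affected(v)]
not_affected = [v for v in existing_today if not is_affected(v)]
print(vulnerable)    # ['1.3', '2.4', '3.2']
print(not_affected)  # ['1.1', '2.2', '3.0']
```

With the same helper, the releases 1.4, 2.3, 3.3 and 3.4 also match, which is why a range query catches them even though no concrete relationship exists for them.
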

Tomorrow:

- There are new package releases of 1.4, 2.3, 3.3 and 3.4: they are potentially vulnerable as they are within the stored ranges spec. Yet I will NOT store a new concrete relationship (yet). Instead, a query will catch them because they are part of the range, but this is a potential issue, not a verified one.
- Based on the number of days since the vulnerability was last updated, we can apply some confidence "decay" based on the time passed. Say, for instance, that I define 5 years as the time for a vulnerability to decay entirely. Then after one year, the confidence that this vulnerability applies to a version that matches its version ranges spec, but that was not yet released when the vulnerability was last updated, would be 4/5, i.e. 80/100 as opposed to 100/100. The specifics of this are secondary and can be designed later.
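
The 5-year example works out with a simple linear decay. Linear decay is only one possible reading, since the specifics are left open. `decayed_confidence` is a hypothetical name, and 5 years is approximated as 5 × 365 days. Integer arithmetic keeps the result exact.

```python
from datetime import date


def decayed_confidence(
    last_updated: date,
    today: date,
    horizon_days: int = 5 * 365,  # time for a vulnerability to decay entirely
    max_confidence: int = 100,
) -> int:
    """Linear decay: full confidence at `last_updated`, zero after `horizon_days`."""
    elapsed = max(0, (today - last_updated).days)
    remaining = max(0, horizon_days - elapsed)
    return max_confidence * remaining // horizon_days


print(decayed_confidence(date(2021, 5, 24), date(2022, 5, 24)))  # 80
```

After one year, 365 of 1825 days have elapsed, so 100 × 1460 // 1825 = 80.
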

The day after tomorrow:

There is an update on the advisory: fixes are now available in 1.5, 2.5 and 3.5. When we get the data, we:
- update the stored version ranges spec, AND
- can mark versions 1.4, 3.2, 3.3, 3.4 as vulnerable with a concrete relationship, i.e. report this now as a verified issue.

So:

- Package releases made after a vulnerability publication/update that satisfy the original vulnerable ranges do NOT trigger a relationship update. They are, though, queried and reported.
- Fixes published later that update the vulnerable ranges do trigger a concrete relationship update.

pombredanne · 2021-08-23T09:33:43Z

FYI, this is an interesting related ticket: CVEProject/cve-schema#87

pombredanne · 2021-08-24T07:56:32Z

In particular this comment I posted is of relevance here:

computable open-source version information CVEProject/cve-schema#87 (comment)

TG1999 · 2023-01-17T16:18:25Z

Thanks for raising this. I am closing this now, and I will let Philippe merge the purl vers PR package-url/purl-spec#139, now that we have something that mostly works for version ranges.

pombredanne mentioned this issue Jan 14, 2020: Update for package version ranges #141 (Closed)

pombredanne added the enhancement and Priority: high labels Jan 14, 2020

pombredanne mentioned this issue Jan 14, 2020: Collect all known package versions (Package URLs) #142 (Open)

sbs2001 mentioned this issue Jun 20, 2020: Change data models, to fix existing issues #206 (Closed)

pombredanne mentioned this issue Sep 10, 2020: Store version ranges #248 (Closed)

pombredanne mentioned this issue Oct 13, 2020: Wildcards in purl? package-url/purl-spec#84 (Open)

pombredanne mentioned this issue Oct 19, 2020: Version range package-url/purl-spec#66 (Closed)

pombredanne mentioned this issue May 24, 2021: Importers without a vulnerable package list #449 (Closed)

pombredanne mentioned this issue May 24, 2021: "Wall of shame" queries for problematic vulnerability advisories data #463 (Open)

pombredanne referenced this issue in openstack/ossa May 25, 2021

Update verbatim version info for OSSA-2013*

777e7b7

Add release information to review tags.

sbs2001 mentioned this issue Jun 8, 2021: Time travel to the date of advisory publish time when importing #467 (Merged)

Hritik14 mentioned this issue Jun 25, 2021: Question: problem with npm importer or something else? #488 (Open)

pombredanne mentioned this issue Aug 3, 2021: CycloneDX Ruby Support coinbase/salus#410 (Merged)

pombredanne mentioned this issue Nov 1, 2021: Add initial draft spec for version ranges aboutcode-org/univers#11 (Merged)

pombredanne mentioned this issue Nov 30, 2021: Add mostly universal version range spec draft package-url/purl-spec#139 (Merged)

pombredanne added the Core models label Jan 24, 2022

TG1999 added this to the v32.0.0 milestone Jan 13, 2023

TG1999 closed this as completed Jan 17, 2023
