Change data models, to fix existing issues #206

sbs2001 · 2020-06-20T08:09:17Z

Vulnerability model diff

Old Vulnerability model

class Vulnerability(models.Model):
   cve_id = models.CharField(max_length=50, help_text='CVE ID', unique=True, null=True)
   summary = models.TextField(help_text='Summary of the vulnerability', blank=True)
   cvss = models.FloatField(max_length=100, help_text='CVSS Score', null=True)

New Vulnerability model

In the new Vulnerability model, cvss which is a severity indicator, is removed, the reason being, every data source rates the same vulnerability differently, hence we need some additional context when talking about severity. Hence all severity related fields are moved to VulnerabilityReference.

I also felt moving reference_ids here makes more sense. We would want a way to avoid entries duplicate entries in reference_id which I guess is possible, check https://stackoverflow.com/questions/49252135/how-to-save-arrayfield-as-set-in-django

class Vulnerability(models.Model):
   vuln_id = models.CharField(max_length=50, help_text='CVE ID', unique=True, null=True)
   reference_ids = pgfields.ArrayField(
       models.CharField(max_length=50,help_text='Reference ID, eg:DSA-4465-1'),default=list)

ImpactedPackage

Note: We would delete ResolvedPackage and rename ImpactedPackage to something else. Since their exact funtionality can be mimicked by a more efficient boolean field. This will also reduce the number of db queries

Old ImpactedPackage

class ImpactedPackage(models.Model):
   vulnerability = models.ForeignKey(Vulnerability, on_delete=models.CASCADE)
   package = models.ForeignKey(Package, on_delete=models.CASCADE)

   class Meta:
       unique_together = ('vulnerability', 'package')

New ImpactedPackage

Note: This won't be called ImpactedPackage, please suggest a good name.

The version_range will contain mathematical comparator symbols like '<,>,=' , (This will also support complex ranges like >1.0.0, <9.8.10 basically anything that dephell_range_specifier.RangeSpecifier supports) . This is very useful in cases where the data sources don't provide concrete versions. In the cases where concrete versions are provided we can always have version_range= '=' .

I haven't figured out how to solve #141, need help . Maybe adding Importer FK might help.

class ImpactedPackage(models.Model):
   vulnerability = models.ForeignKey(Vulnerability, on_delete=models.CASCADE)
   package = models.ForeignKey(Package, on_delete=models.CASCADE)
   version_range = models.CharField(max_length=30,default='=')
   is_vulnerable = models.BooleanField()

   class Meta:
       unique_together = ('vulnerability', 'package', 'version_range')

VulnerabilityReference

Old VulnerabilityReference

class VulnerabilityReference(models.Model):
   vulnerability = models.ForeignKey(
       Vulnerability, on_delete=models.CASCADE)
   source = models.CharField(
       max_length=50, help_text='Source(s) name eg:NVD', blank=True)
   reference_id = models.CharField(
       max_length=50, help_text='Reference ID, eg:DSA-4465-1', blank=True)
   url = models.URLField(
       max_length=1024, help_text='URL of Vulnerability data', blank=True)

   class Meta:
       unique_together = ('vulnerability', 'source', 'reference_id', 'url')

New VulnerabilityReference

Lots of changes here. We have here a FK of Importer, which makes logical sense since different data sources will say different things about the same vulnerability(hence the new unique_together)
Other changes include addtion of more severity indicators and using pgfields.ArrayField to store urls because we don't want to create a VulnerabilityReference for each url with other fields having same data.

Updated

class VulnerabilityReference(models.Model):
   vulnerability = models.ForeignKey(
       Vulnerability, on_delete=models.CASCADE,required=True)
   source = models.ForeignKey(
       Importer, on_delete=models.CASCADE, required=True)
   urls = pgfields.ArrayField(models.URLField(
       max_length=1024, help_text='URL of Vulnerability data'),default=list)
   summary = models.TextField(help_text='Summary of the vulnerability', blank=True)
   class Meta:
       unique_together = ('vulnerability', 'source')

class VulnerabilityScore(models.Model):
 vulnerability_reference = models.ForeignKey(VulnerabilityReference, on_delete=models.CASCADE, required=True)
 type = models.CharField(max_length=50, help_text='Vulnerability score type', blank=True))
 score = models.FloatField()
 severity_text = models.CharField()

Gets us closer to fix #18, #157, #140, #209 and allows us to collect data from any advisories which don't provide concrete versions(but provide version ranges)

The text was updated successfully, but these errors were encountered:

sbs2001 · 2020-06-20T11:00:14Z

I just watched few talks about database indexing and django, looking back at these models, they are not perfect considering the indexing(order in unique_together matters ;) which I didnt think of earlier ) .

pombredanne · 2020-06-20T15:53:05Z

For VulnerabilityReference, IMHO each URL is a different URL... I do not see the value to use an arrayfield there. That's OK to have the same data with different URLs.

And what about that :

class VulnerabilityScore(models.Model):
   vulnerability_reference = models.ForeignKey(VulnerabilityReference, on_delete=models.CASCADE, required=True)
   type = models.CharField(max_length=50, help_text='Vulnerability score type', blank=True))
   score = models.CharField(max_length=50, help_text='Score value', blank=True))

and type could be any of the various cvss, a severity, or "other"

sbs2001 · 2020-06-20T16:19:42Z

Last time I checked https://github.com/intel/cve-bin-tool , they also used such a model with minor differences. I like the type field which future proofs the db for all other standards.

The only issues I have with this is score = models.CharField(max_length=50, help_text='Score value', blank=True)) , which kind of prevents the use case where the user wants to view vulnerabilities with cvssX > Y . Adding to your model IMHO I like this better

class VulnerabilityScore(models.Model):
  vulnerability_reference = models.ForeignKey(VulnerabilityReference, on_delete=models.CASCADE, required=True)
  type = models.CharField(max_length=50, help_text='Vulnerability score type', blank=True))
  score = models.FloatField()
  severity_text = models.CharField()

We could also get more sophisticated by populating the severity_text by computing the score, but let's get the big things right :)

Last time we discussed about this issue, you also mentioned that we could get sophisticated by storing timestamps of when the scores were given, which is now possible thanks to @haikoschol 's Importer model. If we want timestamps using a separate VulnerabilityScore ,will add up one more sql JOIN IMHO, but that's alright.

sbs2001 · 2020-06-20T16:26:17Z

@pombredanne re

For VulnerabilityReference, IMHO each URL is a different URL... I do not see the value to use an arrayfield there. That's OK to have the same data with different URLs.

The rationale behind using arrayfield is I want to have unique_together = ('vulnerability', 'source') . One reference to look at to know what XYZ data_source thinks about it ;) . Also isn't preventing duplicate data considered a win , I wouldn't say arrayfield being useless here

haikoschol · 2020-06-22T13:29:54Z

Regarding the proposed Vulnerability model:

I would prefer not using blank=True and null=True for text columns, unless there really is a semantic difference between empty string and NULL.
In theory it is possible that two data sources use the same reference ID for two different vulnerabilities that both don't have a CVE ID. I don't see how we could detect that and avoid updating the vulnerability that was imported first, instead of adding a new one. I have no idea whether that is likely to happen though.

Regarding the proposed ImpactedPackage model:

IIRC, one issue that came up when implementing the new import mechanism was related to cascading deletes in ImpactedPackage and ResolvedPackage; When an advisory gets updated and a version that was previously considered fixed is vulnerable. This should be possible with the proposed changes by simply updating the is_vulnerable flag. (BTW, I don't have a good name for the new ImpactedPackage model.)

Regarding #141:

I'm not sure I understand the issue. What would be a concrete scenario that triggers it? Is it something like this:

new vulnerability V is imported, references package P as vulnerable in versions >1.0,<2.0
before the next import run for this data source, the version range for V is updated to >1.1,<2.0

If the data source supports detecting updates to existing advisories, it should work fine. If the data source does not support it, there is nothing we can do automatically, so have to rely on curations.

sbs2001 · 2020-06-24T11:17:36Z

I would prefer not using blank=True and null=True for text columns, unless there really is a semantic difference between empty string and NULL.

Sure I didn't give it much thought , I want to have the big things right first.

I'm not sure I understand the issue.

The scenario described is perfect. Rather it actually happened a while ago when you discovered that wrong version range in safety-db

sbs2001 · 2020-06-25T07:13:23Z

@haikoschol

I would prefer not using blank=True and null=True for text columns, unless there really is a semantic difference between empty string and NULL.

I remember now, a while back I had posted something like this in chat

This doesn't error out , but it should

In [2]: p1 = Package(name='foo',version='1.0.0',type='deb')                                                                                                                         

In [3]: p2 = Package(name='foo',version='1.0.0',type='deb')                                                                                                                         

In [4]: p1.save()                                                                                                                                                                   

In [5]: p2.save()

This errors out fine,

In [9]: p2 = Package(name='foo',version='1.0.0',type='deb',qualifiers={'os':'windows'},namespace='',subpath='')                                                                     

In [10]: p3 = Package(name='foo',version='1.0.0',type='deb',qualifiers={'os':'windows'},namespace='',subpath='')                                                                    

In [11]: p3.save()                                                                                                                                                                  

In [13]: p2.save()

So in the latter case, we were storing empty string in db, while the first case stored NULL in db. The first case creates duplicates. IMHO empty string seems better in this Package context.

haikoschol · 2020-06-25T08:34:00Z

This doesn't error out , but it should

In [2]: p1 = Package(name='foo',version='1.0.0',type='deb')                                                                                                                         

In [3]: p2 = Package(name='foo',version='1.0.0',type='deb')                                                                                                                         

In [4]: p1.save()                                                                                                                                                                   

In [5]: p2.save()

This errors out fine,

In [9]: p2 = Package(name='foo',version='1.0.0',type='deb',qualifiers={'os':'windows'},namespace='',subpath='')                                                                     

In [10]: p3 = Package(name='foo',version='1.0.0',type='deb',qualifiers={'os':'windows'},namespace='',subpath='')                                                                    

In [11]: p3.save()                                                                                                                                                                  

In [13]: p2.save()

So in the latter case, we were storing empty string in db, while the first case stored NULL in db. The first case creates duplicates. IMHO empty string seems better in this Package context.

Ah, that's the reason for this behaviour. Makes sense and yet another reason to avoid blank=True, null=True. :)

pombredanne · 2020-06-25T09:47:22Z

So in the latter case, we were storing empty string in db, while the first case stored NULL in db. The first case creates duplicates. IMHO empty string seems better in this Package context.

we have to be consistent in all cases... @tdruez what's your take on the blank/null thing on Django fields?

pombredanne · 2020-06-25T09:53:36Z

Here are my notes what our live discussion today:

It would be best to have all models changes in a PR (it is OK that nothing will work and test fails, etc.) this is just to better view the changes and comment on them more efficiently
the proposed changes for ImpactedPackage seem fine (minor adjustments may be needed and feedback in the PR will help)
for the Vulnerability/VulnerabiltyReference changes, that's rather major and there are possibly multiple ways to hand these. For now, the PR is going to the best way.
It may be better to use a JSONField than an ArrayField as it is more portable to other DBs: an array is really just a JSON list?
a small diagram of the proposed new models and their relationship would help a lot, but that's secondary
storing the commit as references is fine IMHO, we should agree on a convention on how this fits. I think using the SPDX VCS URL as the URL in a reference may be enough?

tdruez · 2020-06-25T13:16:07Z

@pombredanne https://docs.djangoproject.com/en/3.0/ref/models/fields/#null

Avoid using null on string-based fields such as CharField and TextField. If a string-based field has null=True, that means it has two possible values for “no data”: NULL, and the empty string. In most cases, it’s redundant to have two possible values for “no data;” the Django convention is to use the empty string, not NULL.

sbs2001 · 2021-02-23T12:48:38Z

Fixed finally via #259

sbs2001 mentioned this issue Jun 20, 2020

Change database schema to accommodate different types of severity indicators #157

Closed

sbs2001 added Core models enhancement Priority: high labels Jun 20, 2020

sbs2001 mentioned this issue Jul 29, 2020

Remove the null=True on CharField fields of the PackageURLMixin #36 package-url/packageurl-python#40

Merged

sbs2001 closed this as completed Feb 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change data models, to fix existing issues #206

Change data models, to fix existing issues #206

sbs2001 commented Jun 20, 2020 •

edited

Loading

sbs2001 commented Jun 20, 2020

pombredanne commented Jun 20, 2020 •

edited

Loading

sbs2001 commented Jun 20, 2020

sbs2001 commented Jun 20, 2020 •

edited

Loading

haikoschol commented Jun 22, 2020

sbs2001 commented Jun 24, 2020

sbs2001 commented Jun 25, 2020

haikoschol commented Jun 25, 2020

pombredanne commented Jun 25, 2020

pombredanne commented Jun 25, 2020

tdruez commented Jun 25, 2020

sbs2001 commented Feb 23, 2021

Change data models, to fix existing issues #206

Change data models, to fix existing issues #206

Comments

sbs2001 commented Jun 20, 2020 • edited Loading

Vulnerability model diff

Old Vulnerability model

New Vulnerability model

ImpactedPackage

Old ImpactedPackage

New ImpactedPackage

VulnerabilityReference

Old VulnerabilityReference

New VulnerabilityReference

sbs2001 commented Jun 20, 2020

pombredanne commented Jun 20, 2020 • edited Loading

sbs2001 commented Jun 20, 2020

sbs2001 commented Jun 20, 2020 • edited Loading

haikoschol commented Jun 22, 2020

sbs2001 commented Jun 24, 2020

sbs2001 commented Jun 25, 2020

haikoschol commented Jun 25, 2020

pombredanne commented Jun 25, 2020

pombredanne commented Jun 25, 2020

tdruez commented Jun 25, 2020

sbs2001 commented Feb 23, 2021

sbs2001 commented Jun 20, 2020 •

edited

Loading

pombredanne commented Jun 20, 2020 •

edited

Loading

sbs2001 commented Jun 20, 2020 •

edited

Loading