Backend: Fix duplicate reference urls(#343) #408
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -55,4 +55,5 @@ zipp==0.6.0 | |
requests==2.23.0 | ||
toml==0.10.2 | ||
PyYAML==5.4 | ||
freezegun==1.1.0 | ||
freezegun==1.1.0 | ||
urlpy==0.5 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,6 +23,7 @@ | |
import importlib | ||
from datetime import datetime | ||
from time import sleep | ||
import urlpy | ||
|
||
from django.db import models | ||
from django.db import IntegrityError | ||
|
@@ -110,6 +111,38 @@ class VulnerabilityReference(models.Model): | |
def scores(self): | ||
return VulnerabilitySeverity.objects.filter(reference=self.id) | ||
|
||
def save(self, *args, **kwargs): | ||
if self.id or not self.url: | ||
super(VulnerabilityReference, self).save(*args, **kwargs) | ||
else: | ||
url_parsed = urlpy.parse(self.url) | ||
self.url = str(url_parsed.canonical()) | ||
url_scheme = url_parsed.scheme | ||
scheme_independent_url = self.url[len(url_scheme) :] | ||
if url_scheme == "http": | ||
Review comment: This block should be broken down into a separate function. |
||
similar_instance = VulnerabilityReference.objects.filter( | ||
vulnerability=self.vulnerability, | ||
source=self.source, | ||
reference_id=self.reference_id, | ||
url="https" + scheme_independent_url, | ||
).first() | ||
if not similar_instance: | ||
super(VulnerabilityReference, self).save(*args, **kwargs) | ||
elif url_scheme == "https": | ||
Review comment: Same as above, this block should be broken down into a separate function. |
||
similar_instance = VulnerabilityReference.objects.filter( | ||
vulnerability=self.vulnerability, | ||
source=self.source, | ||
reference_id=self.reference_id, | ||
url="http" + scheme_independent_url, | ||
).first() | ||
if similar_instance: | ||
similar_instance.url = self.url | ||
similar_instance.save() | ||
|
||
else: | ||
super(VulnerabilityReference, self).save(*args, **kwargs) | ||
Review comment: Use a |
||
else: | ||
super(VulnerabilityReference, self).save(*args, **kwargs) | ||
Review comment: Same as above. For the computer it makes no difference, but when reading I don't have to verify whether the return value is correct.
Review comment: @sbs2001 So, a nice idea. Does this mean that the command/job
I think that in this case, each run would cost O(N) queries and would not be automated. With the save method override, it would cost 1 extra query per update and would be automated. So I think that, query-wise, overriding the save method is more time-efficient and more automated. What would you suggest in this case, and if we go with the job idea, would the proposed method work?
Review comment: We could get clever by running the job after each import, so this is a non-issue, I guess.
Btw I want to remove the source thing when I get time :) It's not populated anywhere and is a bad way to keep track of source(s). With that out of the way, what would be the use case of (2)? In almost all cases (1) would be used.
IMHO that can be done Dups are rare anyway, so it won't be O(N).
IMHO that'd be 1 extra query per insert/ (
Review comment: Ohh okay, I get the idea. We import all the data and then run this job. This would be more efficient than doing the operation at each save. So I guess these are the deliverables now,
|
||
|
||
class Meta: | ||
unique_together = ("vulnerability", "source", "reference_id", "url") | ||
|
||
|
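The review remarks about moving the duplicated http/https branches into a separate function, and the thread's idea of a deduplication job run after import, could be approached roughly as follows. This is only a sketch under assumptions: it uses the standard library's urllib.parse instead of urlpy, it works on plain strings, not on the Django model, and the names scheme_counterpart and dedupe_urls are hypothetical, not part of this PR.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping between the two schemes treated as duplicates.
_COUNTERPART = {"http": "https", "https": "http"}


def scheme_counterpart(url):
    """Return the URL with http and https swapped, or None for other schemes."""
    parts = urlsplit(url)
    other = _COUNTERPART.get(parts.scheme)
    if other is None:
        return None
    return urlunsplit(parts._replace(scheme=other))


def dedupe_urls(urls):
    """Batch pass: drop an http URL when its https twin is also present."""
    present = set(urls)
    kept = []
    for url in urls:
        if urlsplit(url).scheme == "http" and scheme_counterpart(url) in present:
            continue  # the https variant wins
        kept.append(url)
    return kept
```

With a helper like this, both branches of save() would reduce to a single lookup on the counterpart URL, and a post-import job could reuse the same function. How the lookup and the job would be wired into the models is left open here.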
Review comment: Does this handle the case where self.url=None?
Review comment: Or self.url=""
Review comment: Updated the code. It returned "/" for the same, but keeping "" for "" makes more sense, so I updated the changes.
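The guard discussed above (leave None and "" untouched, not canonicalize them into "/") might look like the sketch below. It is an illustration only: normalize_reference_url is a hypothetical name, and the standard library's urllib.parse stands in for urlpy, so the canonical form it produces is not necessarily what urlpy would return.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_reference_url(url):
    """Hypothetical helper: return None and "" unchanged, normalize anything else."""
    if not url:
        # Covers both self.url=None and self.url="".
        return url
    # urlsplit lowercases the scheme; the rest of the URL is kept as given.
    return urlunsplit(urlsplit(url))
```

For example, normalize_reference_url("") stays "" and normalize_reference_url(None) stays None, so the empty cases never reach the parser.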