Backend: Fix duplicate reference urls(#343) #408
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -55,4 +55,5 @@ zipp==0.6.0 | |
requests==2.23.0 | ||
toml==0.10.2 | ||
PyYAML==5.4 | ||
freezegun==1.1.0 | ||
freezegun==1.1.0 | ||
urlpy==0.5 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,6 +23,7 @@ | |
import importlib | ||
from datetime import datetime | ||
from time import sleep | ||
import urlpy | ||
|
||
from django.db import models | ||
from django.db import IntegrityError | ||
|
@@ -110,6 +111,38 @@ class VulnerabilityReference(models.Model): | |
def scores(self): | ||
return VulnerabilitySeverity.objects.filter(reference=self.id) | ||
|
||
def save(self, *args, **kwargs): | ||
if self.id: | ||
super(VulnerabilityReference, self).save(*args, **kwargs) | ||
else: | ||
url_parsed = urlpy.parse(self.url) | ||
self.url = str(url_parsed.canonical()) | ||
Comment on lines +118 to +119
Does this handle the case where
Or
Updated the code. It returned "/" for the same case, but keeping "" for "" makes more sense, so I updated the changes. |
||
url_scheme = url_parsed.scheme | ||
scheme_independent_url = self.url[len(url_scheme) :] | ||
if url_scheme == "http": | ||
This block should be broken down into a separate function. |
||
similar_instance = VulnerabilityReference.objects.filter( | ||
vulnerability=self.vulnerability, | ||
source=self.source, | ||
reference_id=self.reference_id, | ||
url="https" + scheme_independent_url, | ||
).first() | ||
if not similar_instance: | ||
super(VulnerabilityReference, self).save(*args, **kwargs) | ||
elif url_scheme == "https": | ||
Same as above: this block should be broken down into a separate function. |
||
similar_instance = VulnerabilityReference.objects.filter( | ||
vulnerability=self.vulnerability, | ||
source=self.source, | ||
reference_id=self.reference_id, | ||
url="http" + scheme_independent_url, | ||
).first() | ||
if similar_instance: | ||
similar_instance.url = self.url | ||
similar_instance.save() | ||
|
||
else: | ||
super(VulnerabilityReference, self).save(*args, **kwargs) | ||
Use a |
||
else: | ||
super(VulnerabilityReference, self).save(*args, **kwargs) | ||
Same as above: for the computer it makes no difference, but when reading I don't have to verify whether the return value is correct.
@sbs2001 So, a nice idea. Does this mean that the command/job
I think that in this case each run would cost O(N) queries and would not be automated, whereas with the updated save method it would cost 1 extra query per update and would be automated. So, query-wise, I think overriding the save method is more time-efficient and more automated. What would be your suggestion here? And if we go with the job idea, would the proposed method work?
We could get clever by running the job after each import, so this is a non-issue, I guess.
Btw I want to remove the source thing when I get time :) It's not populated anywhere and is a bad way to keep track of source(s). With that out of the way, what would be the use case of (2)? In almost all cases (1) would be used.
IMHO that can be done Dups are rare anyways so it won't be O(N).
IMHO that'd be 1 extra query per insert/ (
Ohh okay, I get the idea. We import all the data and then run this job. This would be more efficient than doing the operation at each save. So I guess these are the deliverables now,
|
||
|
||
class Meta: | ||
unique_together = ("vulnerability", "source", "reference_id", "url") | ||
|
||
|
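On the two "break this block into a separate function" comments above: both branches only differ in which scheme they look up, so one small helper could serve both. A minimal sketch, assuming the URL has already been canonicalized; `counterpart_url` is a hypothetical name, and it uses the standard library's `urllib.parse` rather than `urlpy`, so it is not the code in this PR.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical helper, not part of the PR: maps each web scheme to the
# other one so both branches of save() can share a single lookup.
_COUNTERPART = {"http": "https", "https": "http"}


def counterpart_url(url):
    """Return `url` under the opposite scheme, or None if it is not http(s)."""
    parts = urlsplit(url)
    other = _COUNTERPART.get(parts.scheme)
    if other is None:
        # Any other scheme (or no scheme at all) has no twin to look for.
        return None
    return urlunsplit(parts._replace(scheme=other))
```

With something like this, `save()` would compute the counterpart once, run a single filter on it, and then decide whether to skip the insert (new row is http, https twin exists) or upgrade the existing row (new row is https, http twin exists).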
Does this mean any update would go straight through?
@sbs2001 Yes. Whenever we add a new object we might need to update a previously saved object, and without this check it might go into an infinite loop.
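For the post-import job discussed in the thread above, the core check might look roughly like the sketch below. It is only an illustration under stated assumptions: references are represented as plain `(vulnerability, source, reference_id, url)` tuples mirroring `unique_together`, URLs are assumed to be canonical already, and `find_http_duplicates` is a hypothetical name. The Django query that would load the rows and the delete step are left out.

```python
from urllib.parse import urlsplit


def find_http_duplicates(references):
    """Return the http rows that have an https twin with the same key.

    `references` is an iterable of (vulnerability, source, reference_id, url)
    tuples. This layout is an assumption made for the sketch.
    """
    rows = list(references)
    seen = set(rows)  # membership test is O(1) per row
    duplicates = []
    for vulnerability, source, reference_id, url in rows:
        if urlsplit(url).scheme != "http":
            continue
        # Swap only the scheme prefix; the rest of the URL is unchanged.
        https_url = "https" + url[len("http"):]
        if (vulnerability, source, reference_id, https_url) in seen:
            duplicates.append((vulnerability, source, reference_id, url))
    return duplicates
```

This does one pass over the rows after the import finishes instead of one extra lookup per save, which is the trade-off the thread settles on.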