Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate import and improve operations #525

Merged
merged 40 commits into from
Jan 25, 2022

Conversation

Hritik14
Copy link
Collaborator

@Hritik14 Hritik14 commented Aug 10, 2021

!! Breaking Changes !!

Entire database model is restructured.

Note: All changes in the importers are only reflected in the nginx importer while this PR is in draft stage.

Knows defects (in current PR):

  • UI break (see above)
  • might crash in multiple imports / improves
  • No improver than default improver is implemented yet
  • normalized function of AdvisoryData has no body
  • nginx importer still has remains of set_api etc
  • Inference -> AdvisoryData encapsulation
  • Duplicated data in database
  • Drop migrations
  • ???

Knows defects (to be solved in different PR):

Signed-off-by: Hritik Vijay [email protected]

@Hritik14 Hritik14 force-pushed the importer-refactor branch 2 times, most recently from 331f70c to 925d0ef Compare August 13, 2021 00:33
@Hritik14
Copy link
Collaborator Author

Hritik14 commented Aug 13, 2021

Feel free to review now. It's not perfect yet, so please consider these points

  • Only the framework is there, the improvers/nginx.py improver doesn't actually improve anything. It needs to be written properly to fit this structure
  • Many many names for patched/fixed/resolved packages are still there, please replace them in your mind, I'll refactor all of these in one commit so as to not pollute current commit with irrelevant diffs
  • There are a couple of fixmes, I'll try to figure them out, feel free to comment on those
  • I have made no change in the importer logic yet (like refactoring importer_yielder, other potential improvements making it consistent with improvers), it is for the same reason that I don't want to pollute the current commit. The current commit only focuses on improvers. I'll add a new commit after refactoring importers explicitly.
  • tl;dr : there's only improver framework (not improver itself), least possible irrelevant code is touched (although it needs to be refactored)

@pombredanne
Copy link
Member

@Hritik14 re:

but then the importers would have to
mention that whatever they import have 100% confidence which might be
susceptible to typo errors making some importers not mention their
confidence thus zeroing on confidence.

If you have a default of 100% this becomes a non-issue? I kinda like to avoid near-duplicated data structures.
In a way it makes sense to have a "100%" confidence in factual data we import... and there could be cases where we do not have 100% confidence in some imported factual data too?

@pombredanne
Copy link
Member

Note that using a common data structure does not mean that we have only importers.

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
... see some review comments inline.

@@ -0,0 +1,37 @@
import dataclasses
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistence, please add a license header. We still need to update the license everywhere per #277 , but best is to have whatever we have now consistently across all code files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. How about I add all the license in the last commit ? Might make things clearer. Let me know

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

vulnerabilities/data_source.py Outdated Show resolved Hide resolved
vulnerabilities/import_runner.py Show resolved Hide resolved
vulnerabilities/import_runner.py Outdated Show resolved Hide resolved
vulnerabilities/import_runner.py Outdated Show resolved Hide resolved
vulnerabilities/improvers/nginx.py Outdated Show resolved Hide resolved
vulnerabilities/improvers/nginx.py Outdated Show resolved Hide resolved
vulnerabilities/management/commands/improve.py Outdated Show resolved Hide resolved
vulnerabilities/management/commands/improve.py Outdated Show resolved Hide resolved
vulnerabilities/models.py Outdated Show resolved Hide resolved
@Hritik14
Copy link
Collaborator Author

Hritik14 commented Aug 14, 2021

@pombredanne

If you have a default of 100% this becomes a non-issue?

A typo could lead to inferences to have 100% confidence in that case...

I kinda like to avoid near-duplicated data structures

Importers and improvers look similar for now mostly because improvers don't really contain anything yet. There are various subclasess for importers like GitDataSource or OvalDataSource with corresponding helper functions to parse advisory data from an actual data source, whereas in case of improvers, they only get the entries from the existing data thus no advisory parsing would be required. On the other hand, improvers could benefit with functions enabling them to access data from the database in a better manner (than done here, I'll be removing direct database interactions in improvers). Other than that an Inference could be seen as what we actually store in our system.

In a way it makes sense to have a "100%" confidence in factual data we import... and there could be cases where we do not have 100% confidence in some imported factual data too

If it's a factual data then then by definition it's a fact thus 100% confidence, right?

Note that using a common data structure does not mean that we have only importers.

Inserting into the database (be it importer or improver) is essentially the same in our structure. Wouldn't we finally need a common data structure anyway ?

vulnerabilities/improvers/nginx.py Outdated Show resolved Hide resolved
vulnerabilities/improvers/nginx.py Outdated Show resolved Hide resolved
vulnerabilities/improvers/nginx.py Outdated Show resolved Hide resolved
vulnerabilities/models.py Outdated Show resolved Hide resolved
vulnerabilities/models.py Outdated Show resolved Hide resolved
vulnerabilities/models.py Outdated Show resolved Hide resolved
@pombredanne pombredanne changed the title Seperate import and improve operations Separate import and improve operations Aug 17, 2021
@Hritik14
Copy link
Collaborator Author

Hritik14 commented Aug 28, 2021

Current commit brings breaking changes. Feel free to review but do ignore the TODO's and pointers mentioned in the latest commit message.

This is a step towards https://github.com/nexB/vulnerablecode/wiki/WeeklyMeetings#meeting-on-tuesday-2021-08-17-at-1400-utc

@Hritik14
Copy link
Collaborator Author

Hritik14 commented Aug 29, 2021

Trying out adding a real improver other than the default one now. It is not ready yet. Suggestions are welcome for the WIP commit but please avoid it in any strict review currently.
Latest reviewable commit: 38f5872

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using to_dict rather than directly to_json ... a to_json becomes then a thin wrapper on the serialization with to_dict
See also my other comments inline.

@@ -0,0 +1,37 @@
import dataclasses
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

vulnerabilities/data_inference.py Show resolved Hide resolved
vulnerabilities/data_inference.py Outdated Show resolved Hide resolved
vulnerabilities/data_source.py Show resolved Hide resolved
vulnerabilities/data_inference.py Outdated Show resolved Hide resolved
vulnerabilities/views.py Show resolved Hide resolved
vulnerabilities/data_source.py Outdated Show resolved Hide resolved
vulnerabilities/data_source.py Outdated Show resolved Hide resolved
vulnerabilities/importers/nginx.py Outdated Show resolved Hide resolved
vulnerabilities/importers/nginx.py Outdated Show resolved Hide resolved
Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Sep 12, 2021
The refactors are based on aboutcode-org#525 (review)

- class Inference: order should be the logic order from most important to least important fields
- class Improver: docs
- data_source.py: Use to_dict() than json()
- models: Advisory: docs
- improvers/nginx.py: Removed, relevant improvers will now reside inside
  importer module. In this case, importers/nginx.py
- black formatting

Signed-off-by: Hritik Vijay <[email protected]>
@Hritik14
Copy link
Collaborator Author

Using the OSV design has some caveats. Given data like:

Not vulnerable: 1.21.0+, 1.20.1+
Vulnerable: 0.6.18-1.20.0

It is impossible to directly get one fixed version for a range of affected versions.

@pombredanne
Copy link
Member

@Hritik14 re:

Using the OSV design has some caveats. Given data like:

Not vulnerable: 1.21.0+, 1.20.1+
Vulnerable: 0.6.18-1.20.0

It is impossible to directly get one fixed version for a range of affected versions.

Is this a real case rather than some made up example?
I cannot fathom how you could get the two versions fixing the same bug.
In the example you gave, 1.21.0+ is impossible and 1.20.1+ can only be the correct one as this is the latest one.

Now things would be different if there were more than one range of course... but then that would be something entirely different?

@pombredanne
Copy link
Member

Here is the gist of our discussion:

  1. nginx advisories are weird at best and likely not meant to be strictly parsable.
  2. the corresponding CVEs CPE ranges seem to be also problematic in some cases
  3. we should reach out to nginx to clarify
  4. in the meantime and in general, where there are ambiguities like this we
    4.1. make a safe judgment call in this case use the latest versions of multiple "fixed" version that apply to the same range. We may be overly cautious and there could be some configs that are not vulnerable that we may report as vulnerable in some cases
    4.2. log this and flag these for manual review

@Hritik14
Copy link
Collaborator Author

Turns out we can parse nginx versions smoothly. Here's more on it #553

@pombredanne
Copy link
Member

I do not like the batch processing of multiple Advisory at once in a single transaction... this may be an optimization we will need in the future, but not now.

Here is a suggested updated way:

  1. each Improver has a method to process a single Advisory model instance such as Improver.get_inferences(self, advisory): -> Inference or something like that.

  2. the framework then iterates on an Improver-provided query set such as Improver.get_interesting_advisories(self): -> QuerySet that has the Advisories it is interested in.

  3. in the framework, there is an atomic transaction that updates both the Advisory (e.g. date and later a log of improvements with select for update) and whatever is updated or create from the Inference

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Oct 12, 2021
The refactors are based on aboutcode-org#525 (review)

- class Inference: order should be the logic order from most important to least important fields
- class Improver: docs
- data_source.py: Use to_dict() than json()
- models: Advisory: docs
- improvers/nginx.py: Removed, relevant improvers will now reside inside
  importer module. In this case, importers/nginx.py
- black formatting

Signed-off-by: Hritik Vijay <[email protected]>
@Hritik14 Hritik14 force-pushed the importer-refactor branch 2 times, most recently from d2a2167 to 56866f7 Compare October 12, 2021 21:11
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready to merge! ... see a few nits for your review.

vulnerabilities/data_source.py Outdated Show resolved Hide resolved
vulnerabilities/data_source.py Outdated Show resolved Hide resolved
affected_version_range = VersionRange.from_string(affected_pkg["affected_version_range"])
fixed_version = affected_pkg["fixed_version"]
if fixed_version:
# TODO: revisit after https://github.com/nexB/univers/issues/10
Copy link
Member

@pombredanne pombredanne Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed now.

H: No..... aboutcode-org/univers#10 (comment)

vulnerabilities/importers/nginx.py Outdated Show resolved Hide resolved
vulnerabilities/importers/nginx.py Show resolved Hide resolved
vulnerabilities/views.py Show resolved Hide resolved
vulnerabilities/models.py Outdated Show resolved Hide resolved
vulnerabilities/models.py Outdated Show resolved Hide resolved
vulnerabilities/models.py Outdated Show resolved Hide resolved
vulnerabilities/models.py Outdated Show resolved Hide resolved
Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Jan 25, 2022
Fixes are in accordance with the final review for this PR done at :
aboutcode-org#525 (review)

Signed-off-by: Hritik Vijay <[email protected]>
Co-authored-by: Philippe Ombredanne <[email protected]>
Hritik14 and others added 3 commits January 26, 2022 03:13
from_dict is a factory, should be a classmethod

Signed-off-by: Hritik Vijay <[email protected]>
The entire view is messed up. Will probably need a rewrite

Co-authored-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Hritik Vijay <[email protected]>
Fixes are in accordance with the final review for this PR done at :
aboutcode-org#525 (review)

Signed-off-by: Hritik Vijay <[email protected]>
Co-authored-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne pombredanne merged commit 67fc00e into aboutcode-org:main Jan 25, 2022
@Hritik14 Hritik14 deleted the importer-refactor branch January 25, 2022 22:11
@Hritik14 Hritik14 linked an issue Jan 26, 2022 that may be closed by this pull request
@Hritik14 Hritik14 mentioned this pull request Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants