-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collect Mozilla #393
Collect Mozilla #393
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Hritik14 I've done a light review for now, will do another review once changes are made.
@sbs2001 any update ? |
@Hritik14 are you sure, you pushed the changes ? |
@sbs2001 ah, not yet. I actually replied on the requested changes. I'm not sure if you get notifications for those replies. Do let me know and I'll leave a comment next time as well. |
@sbs2001 I've refactored as suggested and also added a helper functions to split markdown and the front matter. I'd add it to the istio importer after this PR is merged. |
3e2c0f8
to
a95a0e9
Compare
rebased |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Hritik14 I've left some comments inline for your consideration. This looks good btw .
vulnerabilities/importers/mozilla.py
Outdated
# FIXME: Needs improvement | ||
# Should we add 'bugs' section in references too? | ||
# Should we add 'impact'/severity of CVE in references too? | ||
# If yes, then fix alpine_linux importer as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong at alpine linux ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add the severity of the CVE itself ?
Severity is only available in references, so maybe we should consider adding the CVE Mitre as a reference (as done in this PR) more uniformly.
In Alpine, the CVE is not added as a reference but only as id. It won't really be required there as no severity is mentioned in the advisory for alpine, I wrote it just to maintain uniformaty.
This is what I was thinking
https://github.com/nexB/vulnerablecode/blob/a95a0e9c21ba9a2689dffe46c6c55fe7b8dacf78/vulnerabilities/importers/mozilla.py#L150
vulnerabilities/importers/mozilla.py
Outdated
p = "" | ||
if h3tag: | ||
for tag in h3tag.next_siblings: | ||
if tag.name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, could you point me to an example where this is not true ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a precautionary measure. I'm only collecting if a p
tag follows an h3
which seems more or less like a standard. There is no strict rule about how to format the markdown file, so I figured we should get the correct data if at all.
vulnerabilities/importers/mozilla.py
Outdated
if tag.name: | ||
if tag.name != "p": | ||
break | ||
p += tag.get_text() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would that collect the strong
tag too ? Like the one at https://github.com/mozilla/foundation-security-advisories/blob/master/announce/2016/mfsa2016-14.md ? Not a big deal but we should avoid collecting those tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
For the linked document it returns
Security researcher Holger Fuhrmannek reported that a malicious\nGraphite....
Only consecutive p
tags are collected and get_text() makes sure we're getting only textual representation of the content present in p
tags.
Refactors based on requested review aboutcode-org#393 (review) Signed-off-by: Hritik Vijay <[email protected]>
a95a0e9
to
1f2be03
Compare
@sbs2001 any update ? |
Refactors based on requested review aboutcode-org#393 (review) Signed-off-by: Hritik Vijay <[email protected]>
d8fab24
to
0983662
Compare
Refactors based on requested review aboutcode-org#393 (review) Signed-off-by: Hritik Vijay <[email protected]>
0983662
to
b0d39b2
Compare
This is based on aboutcode-org#78. Most of the work is done. Here's the checklist - [x] Parsing for mozilla data source - [x] Working mozilla importer - [ ] Migrate to GitDataSource - [ ] Solve TODOs - [ ] Write test cases Currently, it requires two extra dependencies: PyGithub and markdown I would eliminate the PyGithub dependency next but I think markdown has to stay. Final dependences would be comitted later. Signed-off-by: Hritik Vijay <[email protected]>
[x] Migrate to GitDataSource [x] Update dependencies [x] More verbose comments Signed-off-by: Hritik Vijay <[email protected]>
Signed-off-by: Hritik Vijay <[email protected]>
Signed-off-by: Hritik Vijay <[email protected]>
Signed-off-by: Hritik Vijay <[email protected]>
Signed-off-by: Hritik Vijay <[email protected]>
Use batch_advisories for now. It has it's own problems and there's aboutcode-org#338 for that. The generator thing won't do much, since we are importing like 10-20 MBs of data. The codebase already has overuse of methods starting with _ , I'd say avoid them. They don't help with readability nor are they trivial in this case _parse_md and _parse_yml: The name is misleading. The function is parsing + enriching the data. converted to get_advisories_from_yml and get_advisories_from_md Remove `"branch": None` in importer_yielder Group imports Signed-off-by: Hritik Vijay <[email protected]>
Signed-off-by: Hritik Vijay <[email protected]>
spaces are important, otherwise it would fail to produce a valid yaml front matter in case of https://raw.githubusercontent.com/mozilla/foundation-security-advisories/master/announce/2012/mfsa2012-85.md Anyway, line shouldn't be altered in a splitter. Signed-off-by: Hritik Vijay <[email protected]>
Mozilla website uses rsplit to extract the name and version so it should be better in any case. https://github.com/mozilla/bedrock/blob/765a60450235d810cf941676e4a29f012a9eaaba/bedrock/security/models.py#L29 Based on discussion here mozilla/foundation-security-advisories#76 Signed-off-by: Hritik Vijay <[email protected]>
Refactors based on requested review aboutcode-org#393 (review) Signed-off-by: Hritik Vijay <[email protected]>
Signed-off-by: Hritik Vijay <[email protected]>
b0d39b2
to
c60ab41
Compare
I would really write a test case for this someday too Signed-off-by: Hritik Vijay <[email protected]>
Signed-off-by: Hritik Vijay <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
see a few comments inline for you consideration.
Also, can you avoid having a test zip in vulnerabilities/tests/test_data/mozilla.zip ? text is fine instead.
This will simplify the tests and help review too.
mdlines = [] | ||
splitter = mdlines | ||
|
||
for index, line in enumerate(lines.split("\n")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this instead?
import saneyaml
# normalize line endings just in case:
text = text.replace("\r\n", "\n")
front_matter, _, body = text.rpartition("\n---\n")
front_matter = saneyaml.load(front_matter)
For instance:
>>> import saneyaml
>>> text = """---
... title: ISTIO-SECURITY-2019-001
... subtitle: 安全公告
... description: 错误的权限控制。
... cve: [CVE-2019-12243]
... publishdate: 2019-05-28
... keywords: [CVE]
... skip_seealso: true
... aliases:
... - /zh/blog/2019/cve-2019-12243
... - /zh/news/2019/cve-2019-12243
... ---
...
... {{< security_bulletin
... cves="CVE-2019-12243"
... cvss="8.9"
... vector="CVSS:3.0/AV:A/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:N/E:H/RL:O/RC:C"
... releases="1.1 to 1.1.6" >}}
...
... ## 内容{#context}
... """
>>> text = text.replace("\r\n", "\n")
>>> front_matter, _, body = text.rpartition("\n---\n")
>>> front_matter = saneyaml.load(front_matter)
>>> print(front_matter)
{'title': 'ISTIO-SECURITY-2019-001', 'subtitle': '安全公告', 'description': '错误的权限控制。', 'cve': ['CVE-2019-12243'], 'publishda
te': '2019-05-28', 'keywords': ['CVE'], 'skip_seealso': True, 'aliases': ['/zh/blog/2019/cve-2019-12243', '/zh/news/2019/cve-2019-122
43']}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have not been using saneyaml
anywhere in the project. If that is preferred, I can add it to requirements and we can start using that but I don't see any specific reason to do so here.
Regarding rpartition
. We cannot have that either as a markdown is allowed to have a "\n---"\n
.
Consider this
>>> text=""""---
... announced: February 4, 2014
... fixed_in:
... - Firefox 27
... - Firefox ESR 24.3
... - Thunderbird 24.3
... - Seamonkey 2.24
... impact: High
... reporter: Cody Crews
... title: Clone protected content with XBL scopes
... ---
...
... <h3>Description</h3>
...
... Text
... ---
... other text
... """
>>> text = text.replace("\r\n", "\n")
>>> front_matter, _, body = text.rpartition("\n---\n")
>>>
>>> front_matter
'"---\nannounced: February 4, 2014\nfixed_in:\n- Firefox 27\n- Firefox ESR 24.3\n- Thunderbird 24.3\n- Seamonkey 2.24\nimpact: High\nreporter: Cody Crews\ntitle: Clone protected content with XBL scopes\n---\n\n<h3>Description</h3>\n\nText'
>>>
>>> body
'other text\n'
>>>
The official validator by mozilla parses it something like this
https://github.com/mozilla/foundation-security-advisories/blob/d43d09d204ab5da014e83b7d1743df289cefee92/check_advisories.py#L183-L208
Further, as it is a helper, we cannot have it mozilla specific. All it should do is to split the front matter and markdown.
I could have something like this
import yaml
text="""---
announced: February 4, 2014
fixed_in:
- Firefox 27
- Firefox ESR 24.3
- Thunderbird 24.3
- Seamonkey 2.24
impact: High
reporter: Cody Crews
title: Clone protected content with XBL scopes
---
<h3>Description</h3>
---
other text
"""
# normalize line endings just in case:
text = text.replace("\r\n", "\n")
linezero,_, text = text.partition("---\n")
if not linezero: # nothing before first ---
front_matter,_, body = text.partition("---")
front_matter = yaml.safe_load(front_matter)
else:
front_matter = ""
body = linezero + "---\n" + text
print(front_matter)
print(body)
which prints
{'announced': 'February 4, 2014', 'fixed_in': ['Firefox 27', 'Firefox ESR 24.3', 'Thunderbird 24.3', 'Seamonkey 2.24'], 'impact': 'High', 'reporter': 'Cody Crews', 'title': 'Clone protected content with XBL scopes'}
<h3>Description</h3>
---
other text
but doesn't look any better to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but doesn't look any better to me.
It feels a bit more readable may be?
The official validator by mozilla parses it something like this
https://github.com/mozilla/foundation-security-advisories/blob/d43d09d204ab5da014e83b7d1743df289cefee92/check_advisories.py#L183-L208
We could very much reuse it as-is as well. This is under an MPL license so we would have to do it in an orderly fashion with proper license tracking and code separation though.
If we do not copy it, we cannot reuse code from it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is required by multiple importers, let's move this to it's own PR. Opened #443
Advisory( | ||
summary=description, | ||
vulnerability_id="", | ||
impacted_package_urls=[], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can you get no impacted packages and fixed packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upstream doesn't provide with a list of impacted package. We might consider all packages until this package as impacted but I'm not sure about this. There are many other importers that do this too. Eg: https://github.com/nexB/vulnerablecode/blob/f254b0d4ac54b70c648055a7e8eda16c05dce0f9/vulnerabilities/importers/alpine_linux.py#L194
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. We need a ticket though, as this may need to be interpreted as "all versions before this version are vulnerable" and it warrant some research and discussions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch, is this going to be redundant now, just like the suse_backport importer. We now have a improved notion of fixed/resolved/patched package now. Due to #436
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to merge this sooner than later. ... See also #449 (comment)
@pombredanne Thank you for the through review. Please check unresolved conversations and I'll push the updated code. |
@Hritik14 done :) Thank you for having looked at all these in details! |
Provide a helper for uniform cve search in importers. Based on aboutcode-org#393 (comment) Signed-off-by: Hritik Vijay <[email protected]>
Provide a helper for uniform cve search in importers. Based on aboutcode-org#393 (comment) Signed-off-by: Hritik Vijay <[email protected]>
@pombredanne re zip files: see #442 (comment) . @Hritik14 IMHO mocking the whole GItDataSource's logic of collecting changes is redundant, since there are thorough tests of that already. There is some mock db things which IMHO is also not required . Instead you could just test the Something like https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/tests/test_retiredotnet.py . There are tests which do check what ends up in db, but I think those should be limited and ideally should write small amounts of data. |
@Hritik14 do you mind to rebase? |
6d372dc
to
d192798
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge this first and fix it later.
Please enter an issue to track that
We are merging first and will fix issues afterward.
Signed-off-by: Hritik Vijay <[email protected]>
Signed-off-by: Hritik Vijay <[email protected]>
d192798
to
e6d652c
Compare
Fixes #78.
Most ofthe work is done. Here's the checklistCurrently, it requires
twoextra dependencies:PyGithuband markdownI would eliminate the PyGithub dependency next butI think markdown hasto stay.
Final dependencies would be comitted later.I would also really like advices on
This is how it looks for now
Signed-off-by: Hritik Vijay [email protected]