Collect Mozilla #393

Hritik14 · 2021-03-20T00:28:55Z

Fixes #78. ~~Most of~~ the work is done. Here's the checklist

Currently, it requires ~~two~~ extra dependencies: ~~PyGithub~~ and markdown
~~I would eliminate the PyGithub dependency next but~~ I think markdown has
to stay.
~~Final dependencies would be committed later.~~

I would also really like advice on:

- The FIXMEs present in the current code.
- Should we mark all the packages <= the resolved package as vulnerable, since Mozilla doesn't ship that information?

This is how it looks for now

Signed-off-by: Hritik Vijay [email protected]

sbs2001

@Hritik14 I've done a light review for now, will do another review once changes are made.

Hritik14 · 2021-03-24T09:48:01Z

@sbs2001 any update ?

sbs2001 · 2021-03-25T04:12:45Z

@Hritik14 are you sure, you pushed the changes ?

Hritik14 · 2021-03-25T09:06:00Z

@sbs2001 ah, not yet. I actually replied on the requested changes. I'm not sure if you get notifications for those replies. Do let me know and I'll leave a comment next time as well.

Hritik14 · 2021-03-25T22:58:49Z

@sbs2001 I've refactored as suggested and also added a helper function to split the markdown and the front matter. I'd add it to the istio importer after this PR is merged.

Hritik14 · 2021-03-29T05:57:55Z

rebased

sbs2001

Thanks @Hritik14 I've left some comments inline for your consideration. This looks good btw .

sbs2001 · 2021-03-30T10:13:13Z

vulnerabilities/importers/mozilla.py

+        # FIXME: Needs improvement
+        # Should we add 'bugs' section in references too?
+        # Should we add 'impact'/severity of CVE in references too?
+        # If yes, then fix alpine_linux importer as well


What's wrong at alpine linux ?

Should we add the severity of the CVE itself?
Severity is only available in references, so maybe we should consider adding the CVE MITRE entry as a reference (as done in this PR) more uniformly.
In Alpine, the CVE is not added as a reference but only as an id. It isn't really required there, since the Alpine advisory doesn't mention any severity; I wrote it just to maintain uniformity.
This is what I was thinking:
https://github.com/nexB/vulnerablecode/blob/a95a0e9c21ba9a2689dffe46c6c55fe7b8dacf78/vulnerabilities/importers/mozilla.py#L150

sbs2001 · 2021-03-30T10:37:03Z

vulnerabilities/importers/mozilla.py

+        p = ""
+        if h3tag:
+            for tag in h3tag.next_siblings:
+                if tag.name:


Just curious, could you point me to an example where this is not true ?

This is a precautionary measure. I only collect text when a p tag follows an h3, which seems to be more or less the standard layout. There is no strict rule about how to format the markdown file, so I figured that if we collect anything at all, it should be the correct data.

sbs2001 · 2021-03-30T10:38:35Z

vulnerabilities/importers/mozilla.py

+                if tag.name:
+                    if tag.name != "p":
+                        break
+                    p += tag.get_text()


Would that collect the strong tag too ? Like the one at https://github.com/mozilla/foundation-security-advisories/blob/master/announce/2016/mfsa2016-14.md ? Not a big deal but we should avoid collecting those tags.

No.
For the linked document it returns

Security researcher Holger Fuhrmannek reported that a malicious\nGraphite....

Only consecutive p tags are collected and get_text() makes sure we're getting only textual representation of the content present in p tags.
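
The collection rule described in this thread (start at the first h3, gather text from consecutive p tags, stop at the first other tag) can be sketched with only the standard library. This is an illustration, not the importer's code: the PR walks BeautifulSoup's `next_siblings`, whereas the hypothetical `DescriptionCollector` below uses `html.parser`, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class DescriptionCollector(HTMLParser):
    """Collect text from the consecutive <p> tags that follow the first <h3>."""

    def __init__(self):
        super().__init__()
        # States: "before_h3" -> "in_h3" -> "after_h3" -> "done"
        self.state = "before_h3"
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.state == "before_h3":
            if tag == "h3":
                self.state = "in_h3"
        elif self.state == "after_h3" and not self.in_p:
            if tag == "p":
                self.in_p = True
            else:
                # Any other tag ends the run of consecutive paragraphs.
                self.state = "done"

    def handle_endtag(self, tag):
        if self.state == "in_h3" and tag == "h3":
            self.state = "after_h3"
        elif self.state == "after_h3" and tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Inline tags such as <strong> are skipped, but their text is kept.
        if self.state == "after_h3" and self.in_p:
            self.chunks.append(data)


def collect_description(html):
    collector = DescriptionCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks)


# Hypothetical sample, not taken from a real advisory.
sample = (
    "<h3>Description</h3>\n"
    "<p>A researcher reported a <strong>malicious</strong> font issue.</p>\n"
    "<p>It could lead to a crash.</p>\n"
    "<h3>References</h3>\n"
    "<p>Not part of the description.</p>\n"
)
print(collect_description(sample))
```

As in the diff above, paragraphs are concatenated without a separator, and the text of inline tags such as strong is kept while the tags themselves are dropped.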

Refactors based on requested review aboutcode-org#393 (review) Signed-off-by: Hritik Vijay <[email protected]>

Hritik14 · 2021-04-01T10:54:36Z

@sbs2001 any update ?

This is based on aboutcode-org#78. Most of the work is done. Here's the checklist:

- [x] Parsing for mozilla data source
- [x] Working mozilla importer
- [ ] Migrate to GitDataSource
- [ ] Solve TODOs
- [ ] Write test cases

Currently, it requires two extra dependencies: PyGithub and markdown.
I would eliminate the PyGithub dependency next, but I think markdown has to stay.
Final dependencies would be committed later.

Signed-off-by: Hritik Vijay <[email protected]>

- [x] Migrate to GitDataSource
- [x] Update dependencies
- [x] More verbose comments

Signed-off-by: Hritik Vijay <[email protected]>

Signed-off-by: Hritik Vijay <[email protected]>

- Use batch_advisories for now. It has its own problems and there's aboutcode-org#338 for that. The generator thing won't do much, since we are importing like 10-20 MBs of data.
- The codebase already overuses methods starting with _ , I'd say avoid them. They don't help with readability, nor are they trivial in this case.
- _parse_md and _parse_yml: the name is misleading. The function is parsing + enriching the data. Converted to get_advisories_from_yml and get_advisories_from_md.
- Remove `"branch": None` in importer_yielder
- Group imports

Signed-off-by: Hritik Vijay <[email protected]>

Spaces are important; otherwise the splitter would fail to produce valid YAML front matter for https://raw.githubusercontent.com/mozilla/foundation-security-advisories/master/announce/2012/mfsa2012-85.md
In any case, a splitter shouldn't alter lines.

Signed-off-by: Hritik Vijay <[email protected]>

Mozilla's website uses rsplit to extract the name and version, so it should be better in any case.
https://github.com/mozilla/bedrock/blob/765a60450235d810cf941676e4a29f012a9eaaba/bedrock/security/models.py#L29

Based on the discussion in mozilla/foundation-security-advisories#76

Signed-off-by: Hritik Vijay <[email protected]>
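
As a small illustration of the `rsplit` approach mentioned in this commit message: splitting once from the right keeps multi-word product names intact. The function name `split_product` and the fallback for entries without a version are assumptions made for the example, not code from this PR or from bedrock.

```python
def split_product(entry):
    """Split a fixed_in entry such as "Firefox ESR 24.3" into (name, version)."""
    parts = entry.rsplit(None, 1)  # split once on whitespace, from the right
    if len(parts) == 2:
        return parts[0], parts[1]
    # Assumption: an entry without whitespace carries no version.
    return entry.strip(), None


for entry in ["Firefox 27", "Firefox ESR 24.3", "Thunderbird 24.3"]:
    print(split_product(entry))
```

A left-to-right `split` would cut "Firefox ESR 24.3" after "Firefox", which is why splitting from the right is the safer choice for these entries.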

I would really write a test case for this someday too Signed-off-by: Hritik Vijay <[email protected]>

pombredanne

Thanks!
See a few comments inline for your consideration.

Also, can you avoid having a test zip in vulnerabilities/tests/test_data/mozilla.zip? Plain text files are fine instead.
This will simplify the tests and help with review too.

pombredanne · 2021-04-03T13:14:40Z

vulnerabilities/helpers.py

+    mdlines = []
+    splitter = mdlines
+
+    for index, line in enumerate(lines.split("\n")):


What about this instead?

    import saneyaml

    # normalize line endings just in case:
    text = text.replace("\r\n", "\n")
    front_matter, _, body = text.rpartition("\n---\n")
    front_matter = saneyaml.load(front_matter)

For instance:

    >>> import saneyaml
    >>> text = """---
    ... title: ISTIO-SECURITY-2019-001
    ... subtitle: 安全公告
    ... description: 错误的权限控制。
    ... cve: [CVE-2019-12243]
    ... publishdate: 2019-05-28
    ... keywords: [CVE]
    ... skip_seealso: true
    ... aliases:
    ... - /zh/blog/2019/cve-2019-12243
    ... - /zh/news/2019/cve-2019-12243
    ... ---
    ...
    ... {{< security_bulletin
    ... cves="CVE-2019-12243"
    ... cvss="8.9"
    ... vector="CVSS:3.0/AV:A/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:N/E:H/RL:O/RC:C"
    ... releases="1.1 to 1.1.6" >}}
    ...
    ... ## 内容{#context}
    ... """
    >>> text = text.replace("\r\n", "\n")
    >>> front_matter, _, body = text.rpartition("\n---\n")
    >>> front_matter = saneyaml.load(front_matter)
    >>> print(front_matter)
    {'title': 'ISTIO-SECURITY-2019-001', 'subtitle': '安全公告', 'description': '错误的权限控制。', 'cve': ['CVE-2019-12243'], 'publishdate': '2019-05-28', 'keywords': ['CVE'], 'skip_seealso': True, 'aliases': ['/zh/blog/2019/cve-2019-12243', '/zh/news/2019/cve-2019-12243']}

We have not been using saneyaml anywhere in the project. If it is preferred, I can add it to the requirements and we can start using it, but I don't see any specific reason to do so here.
Regarding rpartition: we cannot use that either, since a markdown body is allowed to contain "\n---\n" itself.
Consider this:

    >>> text=""""---
    ... announced: February 4, 2014
    ... fixed_in:
    ... - Firefox 27
    ... - Firefox ESR 24.3
    ... - Thunderbird 24.3
    ... - Seamonkey 2.24
    ... impact: High
    ... reporter: Cody Crews
    ... title: Clone protected content with XBL scopes
    ... ---
    ...
    ... <h3>Description</h3>
    ...
    ... Text
    ... ---
    ... other text
    ... """
    >>> text = text.replace("\r\n", "\n")
    >>> front_matter, _, body = text.rpartition("\n---\n")
    >>>
    >>> front_matter
    '"---\nannounced: February 4, 2014\nfixed_in:\n- Firefox 27\n- Firefox ESR 24.3\n- Thunderbird 24.3\n- Seamonkey 2.24\nimpact: High\nreporter: Cody Crews\ntitle: Clone protected content with XBL scopes\n---\n\n<h3>Description</h3>\n\nText'
    >>>
    >>> body
    'other text\n'
    >>>

The official validator by mozilla parses it something like this
https://github.com/mozilla/foundation-security-advisories/blob/d43d09d204ab5da014e83b7d1743df289cefee92/check_advisories.py#L183-L208

Further, as it is a helper, we cannot make it Mozilla-specific. All it should do is split the front matter from the markdown.
I could have something like this:

    import yaml

    text="""---
    announced: February 4, 2014
    fixed_in:
    - Firefox 27
    - Firefox ESR 24.3
    - Thunderbird 24.3
    - Seamonkey 2.24
    impact: High
    reporter: Cody Crews
    title: Clone protected content with XBL scopes
    ---
    <h3>Description</h3>
    ---
    other text
    """
    # normalize line endings just in case:
    text = text.replace("\r\n", "\n")
    linezero,_, text = text.partition("---\n")

    if not linezero: # nothing before first ---
        front_matter,_, body = text.partition("---")
        front_matter = yaml.safe_load(front_matter)
    else:
        front_matter = ""
        body = linezero + "---\n" + text

    print(front_matter)
    print(body)

which prints

    {'announced': 'February 4, 2014', 'fixed_in': ['Firefox 27', 'Firefox ESR 24.3', 'Thunderbird 24.3', 'Seamonkey 2.24'], 'impact': 'High', 'reporter': 'Cody Crews', 'title': 'Clone protected content with XBL scopes'}

    <h3>Description</h3>
    ---
    other text

but doesn't look any better to me.

> but doesn't look any better to me.

It feels a bit more readable, maybe?

> The official validator by mozilla parses it something like this
> https://github.com/mozilla/foundation-security-advisories/blob/d43d09d204ab5da014e83b7d1743df289cefee92/check_advisories.py#L183-L208

We could very much reuse it as-is as well. This is under an MPL license so we would have to do it in an orderly fashion with proper license tracking and code separation though.
If we do not copy it, we cannot reuse code from it though.

As it is required by multiple importers, let's move this to its own PR. Opened #443
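
For reference, the requirement discussed in this thread (treat only a leading `---` block as front matter, leave every line untouched, and keep any later `---` in the body) can be sketched as below. This is a hypothetical helper written for illustration; it is not the `split_markdown_front_matter()` from this PR nor whatever was later proposed in #443, and it returns the front matter as raw text so the caller can pick a YAML loader.

```python
def split_front_matter(text):
    """Return (front_matter, body); front_matter is "" when none is found."""
    lines = text.replace("\r\n", "\n").split("\n")
    if lines[0] != "---":
        # The text does not open with a delimiter, so there is no front matter.
        return "", "\n".join(lines)
    for index in range(1, len(lines)):
        if lines[index] == "---":
            # Only the first closing delimiter ends the front matter.
            front_matter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return front_matter, body
    # Opening delimiter without a closing one: treat everything as body.
    return "", "\n".join(lines)


# Sample modelled on the advisory text used earlier in this thread.
text = """---
title: Clone protected content with XBL scopes
fixed_in:
  - Firefox 27
---

<h3>Description</h3>

Text
---
other text
"""
front_matter, body = split_front_matter(text)
print(front_matter)
print(body)
```

Because the split works on whole lines and compares them exactly, indentation inside the front matter survives and a `---` that appears later in the body is left alone.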

pombredanne · 2021-04-03T13:53:53Z

vulnerabilities/importers/mozilla.py

+        Advisory(
+            summary=description,
+            vulnerability_id="",
+            impacted_package_urls=[],


How can you get no impacted packages and fixed packages?

Upstream doesn't provide a list of impacted packages. We might consider all packages up to this one as impacted, but I'm not sure about this. Many other importers do this too, e.g.: https://github.com/nexB/vulnerablecode/blob/f254b0d4ac54b70c648055a7e8eda16c05dce0f9/vulnerabilities/importers/alpine_linux.py#L194

Good point. We need a ticket though, as this may need to be interpreted as "all versions before this version are vulnerable", and it warrants some research and discussion.

Ouch, is this going to be redundant now, just like the suse_backport importer? We now have an improved notion of fixed/resolved/patched packages, due to #436.

@sbs2001 That is scary. I've opened #449 to discuss this further.

I would like to merge this sooner rather than later. ... See also #449 (comment)

Hritik14 · 2021-04-05T13:17:14Z

@pombredanne Thank you for the thorough review. Please check the unresolved conversations and I'll push the updated code.

pombredanne · 2021-04-08T15:05:14Z

@Hritik14 done :) Thank you for having looked at all of these in detail!

Provide a helper for uniform cve search in importers. Based on aboutcode-org#393 (comment) Signed-off-by: Hritik Vijay <[email protected]>
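
A uniform CVE search of this kind could be as small as one compiled regular expression. The sketch below is an assumption about what such a helper might look like (the pattern, the 4-to-7 digit bound, and the upper-casing are choices made for the example), not the implementation merged in #439.

```python
import re

# Assumption: CVE ids look like CVE-YYYY-NNNN with 4 to 7 trailing digits.
CVE_REGEX = re.compile(r"CVE-\d{4}-\d{4,7}", re.IGNORECASE)


def find_all_cve(text):
    """Return every CVE id found in text, upper-cased, in order of appearance."""
    return [match.upper() for match in CVE_REGEX.findall(text)]


print(find_all_cve("Fixes CVE-2019-12243 and cve-2016-1523."))
```

Keeping the pattern in one shared place means every importer recognises the same id format.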

sbs2001 · 2021-04-26T04:55:52Z

@pombredanne re zip files: see #442 (comment) .

@Hritik14 IMHO mocking the whole GitDataSource logic of collecting changes is redundant, since there are thorough tests of that already. There are also some mock DB things which IMHO are not required either. Instead you could just test to_advisories and see whether the correct Advisory objects are returned. It's fast (no need to set up a DB and such) and easy to implement.

Something like https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/tests/test_retiredotnet.py . There are tests which do check what ends up in db, but I think those should be limited and ideally should write small amounts of data.

pombredanne · 2021-06-20T06:45:13Z

@Hritik14 do you mind rebasing?

pombredanne

Let's merge this first and fix it later.
Please enter an issue to track that

We are merging first and will fix issues afterward.

Signed-off-by: Hritik Vijay <[email protected]>

Hritik14 changed the title ~~[WIP] Collect Mozilla~~ Collect Mozilla Mar 21, 2021

sbs2001 requested changes Mar 22, 2021

Hritik14 force-pushed the collect_mozilla branch from 3e2c0f8 to a95a0e9 Compare March 29, 2021 05:57

sbs2001 previously requested changes Mar 30, 2021

Hritik14 mentioned this pull request Mar 30, 2021

Add unspecified scoring system #415

Merged

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Mar 30, 2021

refactor: replace classmethod w/ func and more

1f2be03

Refactors based on requested review aboutcode-org#393 (review) Signed-off-by: Hritik Vijay <[email protected]>

Hritik14 force-pushed the collect_mozilla branch from a95a0e9 to 1f2be03 Compare March 30, 2021 19:07

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Apr 1, 2021

refactor: replace classmethod w/ func and more

941eba1

Refactors based on requested review aboutcode-org#393 (review) Signed-off-by: Hritik Vijay <[email protected]>

Hritik14 force-pushed the collect_mozilla branch from d8fab24 to 0983662 Compare April 1, 2021 10:57

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Apr 1, 2021

refactor: replace classmethod w/ func and more

c04b6c0

Refactors based on requested review aboutcode-org#393 (review) Signed-off-by: Hritik Vijay <[email protected]>

Hritik14 force-pushed the collect_mozilla branch from 0983662 to b0d39b2 Compare April 1, 2021 19:39

Hritik14 added 12 commits April 2, 2021 14:53

Migrate to GitDataSource from PyGithub

cdbb989

[x] Migrate to GitDataSource [x] Update dependencies [x] More verbose comments Signed-off-by: Hritik Vijay <[email protected]>

Edge case: handle large batch and vul w/o cve

27423e7

Signed-off-by: Hritik Vijay <[email protected]>

Test cases for mozilla importer

c0faf67

Signed-off-by: Hritik Vijay <[email protected]>

Mention mozilla importer

59b420a

Signed-off-by: Hritik Vijay <[email protected]>

Add split_markdown_front_matter()

d1f8315

Signed-off-by: Hritik Vijay <[email protected]>

Extract CVE references from markdown data

df9786d

Signed-off-by: Hritik Vijay <[email protected]>

refactor: replace classmethod w/ func and more

aed84bc

Refactors based on requested review aboutcode-org#393 (review) Signed-off-by: Hritik Vijay <[email protected]>

Remove dangling pdb comment

c60ab41

Signed-off-by: Hritik Vijay <[email protected]>

Hritik14 force-pushed the collect_mozilla branch from b0d39b2 to c60ab41 Compare April 2, 2021 09:23

Hritik14 added 2 commits April 2, 2021 14:59

reorder python imports

58bcdb2

I would really write a test case for this someday too Signed-off-by: Hritik Vijay <[email protected]>

Use generic_textual scoring system

5997b26

Signed-off-by: Hritik Vijay <[email protected]>

pombredanne requested changes Apr 3, 2021

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Apr 15, 2021

expose find_all_cve helper

ef94be5

Provide a helper for uniform cve search in importers. Based on aboutcode-org#393 (comment) Signed-off-by: Hritik Vijay <[email protected]>

Hritik14 mentioned this pull request Apr 15, 2021

expose find_all_cve helper #439

Merged

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Apr 19, 2021

expose find_all_cve helper

b5a48a9

Provide a helper for uniform cve search in importers. Based on aboutcode-org#393 (comment) Signed-off-by: Hritik Vijay <[email protected]>

Hritik14 mentioned this pull request May 19, 2021

Add CVE as reference in existing importers #455

Open

pombredanne added this to the v30.0 milestone Feb 2, 2022

Merge branch 'main' into collect_mozilla

e355706

Hritik14 force-pushed the collect_mozilla branch 2 times, most recently from 6d372dc to d192798 Compare February 8, 2022 10:22

pombredanne approved these changes Feb 8, 2022

Hritik14 added 2 commits February 8, 2022 23:01

Blackify and ignore mozilla tests

62a2789

Signed-off-by: Hritik Vijay <[email protected]>

Rename DataSource -> Importer

e6d652c

Signed-off-by: Hritik Vijay <[email protected]>

Hritik14 force-pushed the collect_mozilla branch from d192798 to e6d652c Compare February 8, 2022 17:32

Hritik14 merged commit 60a1906 into aboutcode-org:main Feb 8, 2022
