Adds support to query Semgrep API to ingest SCA vulns #1224

heryxpc · 2023-07-19T20:56:30Z

Adds a new schema and intel job to query the Semgrep Enterprise API and ingest Semgrep Supply Chain (SSC) findings.
The schema connects a Semgrep deployment, identified by a customer-specific ID in Semgrep Enterprise, to SCA findings and their locations through sub-resource relationships. It also connects each finding to the GitHub repository where it was found. Each finding can have a location with the specific lines of code where the vulnerable dependency is used.
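A minimal sketch of how raw findings could be split into the node rows this schema describes: each finding row is scoped to a deployment and carries a repository URL for matching against GitHub repositories, and each location row points back at its finding. The input field names (`id`, `repository_url`, `location`, `path`, `start_line`, `end_line`), the function name `to_graph_rows`, and the location ID format are all hypothetical placeholders, not the real Semgrep API response or the code in this PR. Tying locations to the deployment as well follows commit 86b6b7b below.

```python
from typing import Any, Dict, List, Tuple


def to_graph_rows(
    raw_findings: List[Dict[str, Any]], deployment_id: str,
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split raw findings into finding rows and location rows.

    Every raw field name used here is a hypothetical placeholder.
    """
    findings: List[Dict[str, Any]] = []
    locations: List[Dict[str, Any]] = []
    for raw in raw_findings:
        findings.append({
            "id": raw["id"],
            # Sub-resource of the Semgrep deployment.
            "deployment_id": deployment_id,
            # Used to match the finding to a GitHub repository node.
            "repository_url": raw["repository_url"],
        })
        loc = raw.get("location")
        if loc:  # not every finding carries a usage location
            locations.append({
                "id": f'{loc["path"]}#L{loc["start_line"]}-L{loc["end_line"]}',
                "finding_id": raw["id"],
                # Also scoped to the deployment so it can be cleaned up on its own.
                "deployment_id": deployment_id,
                "path": loc["path"],
                "start_line": loc["start_line"],
                "end_line": loc["end_line"],
            })
    return findings, locations


raw = [
    {"id": "f1", "repository_url": "https://github.com/org/repo",
     "location": {"path": "app/main.py", "start_line": 10, "end_line": 12}},
    {"id": "f2", "repository_url": "https://github.com/org/repo"},
]
findings, locations = to_graph_rows(raw, "123")
print(len(findings), len(locations))  # 2 1
```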

cartography/cli.py

cartography/intel/semgrep/__init__.py

cartography/intel/semgrep/findings.py

ramonpetgrave64 · 2023-07-26T22:22:07Z

cartography/intel/semgrep/findings.py

+    }
+    response = requests.get(deployment_url, headers=headers)
+    response.raise_for_status()
+    try:


I would remove all of the try/except branches. If anything fails, they will throw their own exceptions.

Can I at least keep the RequestException case? I'd like to see the 400s and 500s in the logs early rather than wait until they break the whole sync.

I would only add additional logs if we're able to add context that the exception object does not have. Otherwise adding more logs just makes the code and log trace messier so that it takes longer to debug.

Like Alex said, raise_for_status will immediately break the sync on a 400 or 500.
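A minimal sketch of the suggested shape with no try/except: `raise_for_status()` stops the sync on any 4xx/5xx, and a `requests.HTTPError` message already includes the status code and URL. The endpoint URL and the `{"deployments": [...]}` response shape are assumptions to check against the Semgrep API docs; the injectable `http_get` parameter is hypothetical and only exists so the function can be exercised without network access.

```python
from typing import Any, Callable, Dict, Optional

# Assumed endpoint; verify against the Semgrep API documentation.
DEPLOYMENTS_URL = "https://semgrep.dev/api/v1/deployments"


def get_deployment(
    semgrep_app_token: str,
    http_get: Optional[Callable[..., Any]] = None,
    timeout: int = 60,
) -> Dict[str, Any]:
    """Fetch the first deployment visible to the token, letting HTTP errors propagate."""
    if http_get is None:
        import requests  # only needed for real calls
        http_get = requests.get
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {semgrep_app_token}",
    }
    response = http_get(DEPLOYMENTS_URL, headers=headers, timeout=timeout)
    # No try/except: a 400 or 500 raises here and stops the sync with its own traceback.
    response.raise_for_status()
    deployment = response.json()["deployments"][0]  # assumed response shape
    return {"id": deployment["id"], "name": deployment["name"], "slug": deployment["slug"]}
```

Any context the exception lacks (for example, which sync step was running) can still be added by a caller, which matches the point above about only logging what the exception object does not already carry.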


ramonpetgrave64 · 2023-07-26T22:34:26Z

cartography/intel/semgrep/findings.py

+        sca_vuln["closestSafeDependency"] = dep_fix
+        ref_urls = vuln["advisory"].get("references", {}).get("urls", [])
+        if ref_urls:
+            sca_vuln["ref_urls"] = ",".join(ref_urls)


non-blocker: this node property's value can also be an array.

Does the querybuilder support passing arrays?

Yes. The generated query will look like this: https://github.com/lyft/cartography/blob/f657a0f73c0cc449a27d8a490f2b86d66d31951d/cartography/graph/querybuilder.py#L41

and the Neo4j python driver is smart enough to set the Python list type as a list type on the node. See https://neo4j.com/docs/python-manual/current/data-types/#_core_types for this mapping.
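A minimal sketch of the reviewed snippet with the URLs kept as a list instead of being joined into a comma-separated string, relying on the driver's list mapping described above. The helper name `add_ref_urls` is hypothetical; the `vuln["advisory"]["references"]["urls"]` path is taken from the diff excerpt.

```python
from typing import Any, Dict


def add_ref_urls(sca_vuln: Dict[str, Any], vuln: Dict[str, Any]) -> None:
    """Copy advisory reference URLs onto the node dict as a list property."""
    ref_urls = vuln["advisory"].get("references", {}).get("urls", [])
    if ref_urls:
        # A Python list of strings is stored as a Neo4j LIST<STRING>,
        # so no ",".join(...) is needed before loading.
        sca_vuln["ref_urls"] = list(ref_urls)


node: Dict[str, Any] = {}
add_ref_urls(node, {"advisory": {"references": {"urls": ["https://a", "https://b"]}}})
print(node)  # {'ref_urls': ['https://a', 'https://b']}
```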

Wow, this is the first time I'll see lists as node properties. I'll give it a try!

The load works, but check_nodes doesn't, because the returned list is not hashable here:
https://github.com/lyft/cartography/blob/c78da1a1b70027d2ba3e46ce6225fe27e15811ee/tests/integration/util.py#L25
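A small illustration of the failure, assuming (as the linked helper appears to do) that check_nodes turns each node's attributes into a tuple and collects the tuples into a set. `collect_rows` is a hypothetical stand-in, not the real helper: a tuple is only hashable if every element is, so a list-valued property makes the set construction fail.

```python
from typing import Any, Iterable, Set, Tuple


def collect_rows(rows: Iterable[Iterable[Any]]) -> Set[Tuple[Any, ...]]:
    """Mimic check_nodes' assumed behavior: one tuple per node, gathered into a set."""
    return {tuple(row) for row in rows}


print(collect_rows([["vuln-1", "lodash"]]))  # {('vuln-1', 'lodash')}

try:
    collect_rows([["vuln-1", ["https://a", "https://b"]]])
except TypeError as err:
    print(err)  # unhashable type: 'list'
```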

I forget what this conversation thread was originally about since GitHub's commenting seems to have moved it to a different line.

In general, though, let's use lists in Neo4j nodes as little as possible, since relationship connections are easier to test. Lists do make sense for things where creating a whole new node type would be too heavyweight.

I'd still rather this as a list.

In the meantime, since it doesn't work with check_nodes, use a workaround:

make a new check_nodes_as_list that returns a list of nodes rather than a set, or

write the query yourself to check the nodes.
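A minimal sketch of the first workaround. `check_nodes_as_list` is the name suggested above, but its body here is an assumption modeled on how check_nodes is described in this thread (build a `MATCH ... RETURN` query for the requested properties), not the helper that was eventually written. Because nothing is hashed, list-valued properties such as `ref_urls` come back intact.

```python
from typing import Any, List, Tuple


def check_nodes_as_list(
    neo4j_session: Any, node_label: str, attrs: List[str],
) -> List[Tuple[Any, ...]]:
    """Like check_nodes, but returns a list so list-valued properties are allowed."""
    if not attrs:
        raise ValueError("attrs must contain at least one property name")
    returns = ", ".join(f"n.{attr} AS {attr}" for attr in attrs)
    query = f"MATCH (n:{node_label}) RETURN {returns}"
    result = neo4j_session.run(query)
    # Records support record[key]; tuples containing lists are fine in a list.
    return [tuple(record[attr] for attr in attrs) for record in result]
```

Neo4j does not guarantee row order without `ORDER BY`, so a test would compare `sorted(...)` output or add an `ORDER BY` on a scalar property.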



achantavy

Leaving suggestions on how to button this up. Looks good overall!

Can you please also add schema docs, config docs, and the Readme so that users know how to use it, how to set it up, and that we have Semgrep coverage?


achantavy · 2023-07-28T19:21:09Z

cartography/intel/semgrep/findings.py


achantavy · 2023-07-28T20:20:07Z

cartography/intel/semgrep/findings.py

+    common_job_parameters: Dict[str, Any],
+) -> None:
+    logger.info("Running Semgrep SCA findings sync job.")
+    semgrep_deployment = get_deployment(semgrep_app_token)


Nit and non-blocker: consider splitting to sync_sca_vulns and sync_sca_usages to make it cleaner.
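A minimal sketch of the suggested split. `sync_sca_vulns` and `sync_sca_usages` are the names proposed above; their bodies, the injected `load` callable, and the label strings are hypothetical stand-ins for loading through the PR's schema objects. Unlike the snippet under review, this `sync` receives an already-fetched deployment and already-transformed rows to stay self-contained. `UPDATE_TAG` is the usual key in cartography's `common_job_parameters`.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical loader signature: (neo4j_session, label, rows, **kwargs) -> None.
Loader = Callable[..., None]


def sync_sca_vulns(
    neo4j_session: Any, load: Loader, vulns: List[Dict[str, Any]],
    deployment_id: str, update_tag: int,
) -> None:
    logger.info("Loading %d Semgrep SCA findings.", len(vulns))
    load(neo4j_session, "SemgrepSCAFinding", vulns,
         DEPLOYMENT_ID=deployment_id, UPDATE_TAG=update_tag)


def sync_sca_usages(
    neo4j_session: Any, load: Loader, usages: List[Dict[str, Any]],
    deployment_id: str, update_tag: int,
) -> None:
    logger.info("Loading %d Semgrep SCA usage locations.", len(usages))
    load(neo4j_session, "SemgrepSCALocation", usages,
         DEPLOYMENT_ID=deployment_id, UPDATE_TAG=update_tag)


def sync(
    neo4j_session: Any, load: Loader, deployment: Dict[str, Any],
    vulns: List[Dict[str, Any]], usages: List[Dict[str, Any]],
    common_job_parameters: Dict[str, Any],
) -> None:
    logger.info("Running Semgrep SCA findings sync job.")
    update_tag = common_job_parameters["UPDATE_TAG"]
    sync_sca_vulns(neo4j_session, load, vulns, deployment["id"], update_tag)
    sync_sca_usages(neo4j_session, load, usages, deployment["id"], update_tag)
```

Each half can then be tested on its own, and the top-level sync reads as a short list of steps.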

heryxpc · 2023-08-01T20:23:10Z


Addressed the PR comments and added docs.
I wasn't sure about the README, but I added a reference there and a page that links to the docs once they are deployed.

docs/root/modules/semgrep/config.md

Co-authored-by: Alex Chantavy <[email protected]>


ramonpetgrave64 · 2023-08-02T20:22:09Z

cartography/intel/semgrep/findings.py



…cf#1224) Adds a new Schema and Intel Job to query Semgrep Enterprise API and ingest Semgrep Supply Chain (SSC) findings. The schema connects a Semgrep Deployment with an id specific to a customer in Semgrep Enterprise to an SCA finding and location as a sub resource relationships. It also connects to a Github repository to match findings against where they were found. Each finding can have a location with the specific lines of code where the vulnerable dependency is being used. ![SemgrepCartographyfinal](https://github.com/lyft/cartography/assets/9236431/9a99ecdd-b40f-430e-bff5-fe950f4c713e) --------- Co-authored-by: Alex Chantavy <[email protected]>

heryxpc added 4 commits July 19, 2023 14:54

Adds support to query Semgrep API to ingest SCA vulns

691732a

Add SemgrepSCALocation, unit and integration tests

905453b

Remove optional attributes not returned by Semgrep

0c1ac60

Merge branch 'master' into semgrep-intel

6402a8a

heryxpc marked this pull request as ready for review July 24, 2023 16:09

Reduce line size

c82b200

heryxpc requested review from ramonpetgrave64 and achantavy July 24, 2023 17:58

ramonpetgrave64 requested changes Jul 26, 2023


Address PR comments

338753a

heryxpc requested a review from ramonpetgrave64 July 28, 2023 17:31

heryxpc added 2 commits July 28, 2023 11:35

Update cleanup jobs

f479a77

Add relationship between SCA Location and Deployment to allow independent clean up jobs

86b6b7b

achantavy requested changes Jul 28, 2023


Address more PR comments, include transitivity, add docs

54dc8f0

Linter errors

3923ef2

heryxpc requested a review from achantavy August 1, 2023 20:24

achantavy reviewed Aug 1, 2023


docs/root/modules/semgrep/config.md

heryxpc and others added 2 commits August 1, 2023 16:03

Update docs/root/modules/semgrep/config.md

83ce653

Co-authored-by: Alex Chantavy <[email protected]>

Link Semgrep docs to rendered html

e21ea69

achantavy previously approved these changes Aug 2, 2023


Merge branch 'master' into semgrep-intel

5cbfac7

ramonpetgrave64 requested changes Aug 2, 2023


Address more PR comments

05f7b1f

heryxpc dismissed achantavy’s stale review via 05f7b1f August 2, 2023 22:14

heryxpc requested a review from ramonpetgrave64 August 2, 2023 22:31

Make ref_urls a list

4de9008

achantavy approved these changes Aug 3, 2023


ramonpetgrave64 approved these changes Aug 3, 2023


heryxpc merged commit b0a58a5 into master Aug 3, 2023
4 checks passed

heryxpc deleted the semgrep-intel branch August 3, 2023 19:08
