Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support to query Semgrep API to ingest SCA vulns #1224

Merged
merged 15 commits into from
Aug 3, 2023
Merged

Conversation

heryxpc
Copy link
Contributor

@heryxpc heryxpc commented Jul 19, 2023

Adds a new Schema and Intel Job to query Semgrep Enterprise API and ingest Semgrep Supply Chain (SSC) findings.
The schema connects a Semgrep Deployment with an id specific to a customer in Semgrep Enterprise to an SCA finding and location as a sub resource relationships. It also connects to a Github repository to match findings against where they were found. Each finding can have a location with the specific lines of code where the vulnerable dependency is being used.
SemgrepCartographyfinal

@heryxpc heryxpc marked this pull request as ready for review July 24, 2023 16:09
cartography/cli.py Outdated Show resolved Hide resolved
cartography/intel/semgrep/__init__.py Outdated Show resolved Hide resolved
cartography/intel/semgrep/findings.py Outdated Show resolved Hide resolved
}
response = requests.get(deployment_url, headers=headers)
response.raise_for_status()
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove all of the try/except branches. If anything fails, they will throw their own exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I at least leave the RequestException case? I would like to see the 400's and 500's at the logs early rather than wait until it breaks all the sync.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would only add additional logs if we're able to add context that the exception object does not have. Otherwise adding more logs just makes the code and log trace messier so that it takes longer to debug.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like alex said, raise_for_status would immediately break the sync on a 400 or 500.

cartography/models/semgrep/findings.py Outdated Show resolved Hide resolved
sca_vuln["closestSafeDependency"] = dep_fix
ref_urls = vuln["advisory"].get("references", {}).get("urls", [])
if ref_urls:
sca_vuln["ref_urls"] = ",".join(ref_urls)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocker: this node property's value can also be an array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the querybuilder supports passing arrays?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The generated query will look like this: https://github.com/lyft/cartography/blob/f657a0f73c0cc449a27d8a490f2b86d66d31951d/cartography/graph/querybuilder.py#L41

and the Neo4j python driver is smart enough to set the Python list type as a list type on the node. See https://neo4j.com/docs/python-manual/current/data-types/#_core_types for this mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, first time that I will see list as node properties, will give it a try!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The load works, but the check_nodes didn't, as the list returned is not hashable here:
https://github.com/lyft/cartography/blob/c78da1a1b70027d2ba3e46ce6225fe27e15811ee/tests/integration/util.py#L25

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget what this conversation thread was originally about since GitHub's commenting seems to have moved it to a different line.

In general though let's try to use lists in Neo4j nodes as little as possible since it's easier to test for relationship connections. Using lists does make sense though for things where it's too heavyweight to create a whole new node type.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still rather this as a list.

In the mean time, since it doesn't work with check_nodes, do a workaround:

  • make a new check_nodes_as_list, that returns the list of nodes, rather than a set
  • write the query yourself to check the nodes.

cartography/intel/semgrep/findings.py Outdated Show resolved Hide resolved
cartography/intel/semgrep/findings.py Show resolved Hide resolved
cartography/intel/semgrep/findings.py Outdated Show resolved Hide resolved
tests/unit/cartography/intel/semgrep/test_findings.py Outdated Show resolved Hide resolved
Copy link
Contributor

@achantavy achantavy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving suggestions on how to button this up. Looks good overall!

Can you please also add schema docs, config docs, and the Readme so that users know how to use it, how to set it up, and that we have Semgrep coverage?

tests/unit/cartography/intel/semgrep/test_findings.py Outdated Show resolved Hide resolved
tests/unit/cartography/intel/semgrep/test_findings.py Outdated Show resolved Hide resolved
tests/unit/cartography/intel/semgrep/test_findings.py Outdated Show resolved Hide resolved
tests/unit/cartography/intel/semgrep/test_findings.py Outdated Show resolved Hide resolved
cartography/intel/semgrep/__init__.py Outdated Show resolved Hide resolved
cartography/intel/semgrep/findings.py Outdated Show resolved Hide resolved
cartography/intel/semgrep/findings.py Show resolved Hide resolved
cartography/intel/semgrep/findings.py Show resolved Hide resolved
sca_vuln["closestSafeDependency"] = dep_fix
ref_urls = vuln["advisory"].get("references", {}).get("urls", [])
if ref_urls:
sca_vuln["ref_urls"] = ",".join(ref_urls)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The generated query will look like this: https://github.com/lyft/cartography/blob/f657a0f73c0cc449a27d8a490f2b86d66d31951d/cartography/graph/querybuilder.py#L41

and the Neo4j python driver is smart enough to set the Python list type as a list type on the node. See https://neo4j.com/docs/python-manual/current/data-types/#_core_types for this mapping.

common_job_parameters: Dict[str, Any],
) -> None:
logger.info("Running Semgrep SCA findings sync job.")
semgrep_deployment = get_deployment(semgrep_app_token)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit and non-blocker: consider splitting to sync_sca_vulns and sync_sca_usages to make it cleaner.

@heryxpc
Copy link
Contributor Author

heryxpc commented Aug 1, 2023

Leaving suggestions on how to button this up. Looks good overall!

Can you please also add schema docs, config docs, and the Readme so that users know how to use it, how to set it up, and that we have Semgrep coverage?

PR comments addressed and added docs.
I wasn't sure about the README but added a reference and a page that links to the docs once they are deployed.

@heryxpc heryxpc requested a review from achantavy August 1, 2023 20:24
achantavy
achantavy previously approved these changes Aug 2, 2023
cartography/intel/semgrep/findings.py Outdated Show resolved Hide resolved
sca_vuln["closestSafeDependency"] = dep_fix
ref_urls = vuln["advisory"].get("references", {}).get("urls", [])
if ref_urls:
sca_vuln["ref_urls"] = ",".join(ref_urls)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still rather this as a list.

In the mean time, since it doesn't work with check_nodes, do a workaround:

  • make a new check_nodes_as_list, that returns the list of nodes, rather than a set
  • write the query yourself to check the nodes.

cartography/models/semgrep/deployment.py Outdated Show resolved Hide resolved
cartography/models/semgrep/findings.py Outdated Show resolved Hide resolved
cartography/models/semgrep/findings.py Outdated Show resolved Hide resolved
docs/root/modules/semgrep/schema.md Outdated Show resolved Hide resolved
@heryxpc heryxpc merged commit b0a58a5 into master Aug 3, 2023
4 checks passed
@heryxpc heryxpc deleted the semgrep-intel branch August 3, 2023 19:08
chandanchowdhury pushed a commit to juju4/cartography that referenced this pull request Jun 26, 2024
…cf#1224)

Adds a new Schema and Intel Job to query Semgrep Enterprise API and
ingest Semgrep Supply Chain (SSC) findings.
The schema connects a Semgrep Deployment with an id specific to a
customer in Semgrep Enterprise to an SCA finding and location as a sub
resource relationships. It also connects to a Github repository to match
findings against where they were found. Each finding can have a location
with the specific lines of code where the vulnerable dependency is being
used.

![SemgrepCartographyfinal](https://github.com/lyft/cartography/assets/9236431/9a99ecdd-b40f-430e-bff5-fe950f4c713e)

---------

Co-authored-by: Alex Chantavy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants