-
Notifications
You must be signed in to change notification settings - Fork 340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds support to query Semgrep API to ingest SCA vulns #1224
Conversation
} | ||
response = requests.get(deployment_url, headers=headers) | ||
response.raise_for_status() | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove all of the try/except branches. If anything fails, they will throw their own exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I at least leave the RequestException
case? I would like to see the 400's and 500's at the logs early rather than wait until it breaks all the sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would only add additional logs if we're able to add context that the exception object does not have. Otherwise adding more logs just makes the code and log trace messier so that it takes longer to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like alex said, raise_for_status would immediately break the sync on a 400 or 500.
sca_vuln["closestSafeDependency"] = dep_fix | ||
ref_urls = vuln["advisory"].get("references", {}).get("urls", []) | ||
if ref_urls: | ||
sca_vuln["ref_urls"] = ",".join(ref_urls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocker: this node property's value can also be an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the querybuilder supports passing arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The generated query will look like this: https://github.com/lyft/cartography/blob/f657a0f73c0cc449a27d8a490f2b86d66d31951d/cartography/graph/querybuilder.py#L41
and the Neo4j python driver is smart enough to set the Python list type as a list type on the node. See https://neo4j.com/docs/python-manual/current/data-types/#_core_types for this mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, first time that I will see list as node properties, will give it a try!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The load
works, but the check_nodes
didn't, as the list
returned is not hashable here:
https://github.com/lyft/cartography/blob/c78da1a1b70027d2ba3e46ce6225fe27e15811ee/tests/integration/util.py#L25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget what this conversation thread was originally about since GitHub's commenting seems to have moved it to a different line.
In general though let's try to use lists in Neo4j nodes as little as possible since it's easier to test for relationship connections. Using lists does make sense though for things where it's too heavyweight to create a whole new node type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still rather this as a list.
In the mean time, since it doesn't work with check_nodes
, do a workaround:
- make a new
check_nodes_as_list
, that returns the list of nodes, rather than a set - write the query yourself to check the nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving suggestions on how to button this up. Looks good overall!
Can you please also add schema docs, config docs, and the Readme so that users know how to use it, how to set it up, and that we have Semgrep coverage?
sca_vuln["closestSafeDependency"] = dep_fix | ||
ref_urls = vuln["advisory"].get("references", {}).get("urls", []) | ||
if ref_urls: | ||
sca_vuln["ref_urls"] = ",".join(ref_urls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The generated query will look like this: https://github.com/lyft/cartography/blob/f657a0f73c0cc449a27d8a490f2b86d66d31951d/cartography/graph/querybuilder.py#L41
and the Neo4j python driver is smart enough to set the Python list type as a list type on the node. See https://neo4j.com/docs/python-manual/current/data-types/#_core_types for this mapping.
common_job_parameters: Dict[str, Any], | ||
) -> None: | ||
logger.info("Running Semgrep SCA findings sync job.") | ||
semgrep_deployment = get_deployment(semgrep_app_token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit and non-blocker: consider splitting to sync_sca_vulns and sync_sca_usages to make it cleaner.
PR comments addressed and added docs. |
Co-authored-by: Alex Chantavy <[email protected]>
sca_vuln["closestSafeDependency"] = dep_fix | ||
ref_urls = vuln["advisory"].get("references", {}).get("urls", []) | ||
if ref_urls: | ||
sca_vuln["ref_urls"] = ",".join(ref_urls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still rather this as a list.
In the mean time, since it doesn't work with check_nodes
, do a workaround:
- make a new
check_nodes_as_list
, that returns the list of nodes, rather than a set - write the query yourself to check the nodes.
…cf#1224) Adds a new Schema and Intel Job to query Semgrep Enterprise API and ingest Semgrep Supply Chain (SSC) findings. The schema connects a Semgrep Deployment with an id specific to a customer in Semgrep Enterprise to an SCA finding and location as a sub resource relationships. It also connects to a Github repository to match findings against where they were found. Each finding can have a location with the specific lines of code where the vulnerable dependency is being used. ![SemgrepCartographyfinal](https://github.com/lyft/cartography/assets/9236431/9a99ecdd-b40f-430e-bff5-fe950f4c713e) --------- Co-authored-by: Alex Chantavy <[email protected]>
Adds a new Schema and Intel Job to query Semgrep Enterprise API and ingest Semgrep Supply Chain (SSC) findings.
The schema connects a Semgrep Deployment with an id specific to a customer in Semgrep Enterprise to an SCA finding and location as a sub resource relationships. It also connects to a Github repository to match findings against where they were found. Each finding can have a location with the specific lines of code where the vulnerable dependency is being used.