Ingest Go dependencies using Semgrep API #1368

hanzo · 2024-10-15T21:35:37Z

Summary

Motivation: The library dependency graph is already populated with PythonLibrary nodes via Cartography's Github module, but no other languages are supported. This PR adds GoLibrary nodes to the library dependency graph to bring support for Go up to parity with Python. I concur with the recommendation from @heryxpc that rather than writing code to manually parse go.mod files, we should instead use the dependency data returned by the Semgrep API.

Cartography's Semgrep module is already able to import supply chain vulnerability data from the Findings endpoint of the Semgrep API. Semgrep also provides a List Dependencies endpoint that returns a list of every known dependency for a given ecosystem (e.g. specifying the “gomod” ecosystem returns all dependencies found in go.mod files). The response contains useful information including the transitivity of the dependency and a link to where it’s defined in source code.
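As a rough illustration of how a paged listing like this can be consumed, here is a minimal sketch of a cursor-following loop. The `fetch_page` callable and the `dependencies`, `hasMore` and `cursor` field names are assumptions made for the example; they are not taken from this PR and have not been checked against the Semgrep API reference.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical page shape: {"dependencies": [...], "hasMore": bool, "cursor": str}
Page = Dict[str, Any]


def get_all_dependencies(
    fetch_page: Callable[[str, Optional[str]], Page],
    ecosystem: str,
) -> List[Dict[str, Any]]:
    """Collect every dependency for one ecosystem by following the cursor."""
    all_deps: List[Dict[str, Any]] = []
    cursor: Optional[str] = None
    while True:
        page = fetch_page(ecosystem, cursor)
        all_deps.extend(page.get("dependencies", []))
        if not page.get("hasMore"):
            break
        cursor = page.get("cursor")
    return all_deps


# Stand-in for the HTTP call so the sketch runs without network access.
_FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {"dependencies": [{"name": "a"}, {"name": "b"}], "hasMore": True, "cursor": "c1"},
    "c1": {"dependencies": [{"name": "c"}], "hasMore": False},
}


def fake_fetch_page(ecosystem: str, cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]


deps = get_all_dependencies(fake_fetch_page, "gomod")
```

Injecting `fetch_page` keeps the loop testable without credentials; a real implementation would replace `fake_fetch_page` with an authenticated request.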

The dependency nodes imported from the Semgrep API will be labelled GoLibrary::SemgrepDependency::Dependency and will match the properties of existing PythonLibrary::Dependency nodes as closely as possible. This PR only imports Go dependencies from Semgrep, but I've structured the code to make it easy to import additional languages from Semgrep in the future.

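Before these changes, a project with both Python and Go dependencies has only PythonLibrary nodes in the dependency graph:

[Screenshot: dependency graph containing only PythonLibrary nodes]

After these changes, the graph for the same project contains both PythonLibrary and GoLibrary nodes:

[Screenshot: dependency graph containing both PythonLibrary and GoLibrary nodes]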

Logs from semgrep module before these changes

INFO:cartography.sync:Starting sync with update tag '1730497895'
INFO:cartography.sync:Starting sync stage 'create-indexes'
INFO:cartography.intel.create_indexes:Creating indexes for cartography node types.
INFO:cartography.sync:Finishing sync stage 'create-indexes'
INFO:cartography.sync:Starting sync stage 'semgrep'
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA findings sync job.
INFO:cartography.intel.semgrep.findings:Loading Semgrep deployment info {'id': ...} into the graph...
INFO:cartography.intel.semgrep.findings:Retrieving Semgrep SCA vulns for deployment 'X'.
INFO:cartography.intel.semgrep.findings:Processed page 0 of Semgrep SCA vulnerabilities.
...
INFO:cartography.intel.semgrep.findings:Processed page X of Semgrep SCA vulnerabilities.
INFO:cartography.intel.semgrep.findings:Retrieved X Semgrep SCA vulns in X pages.
INFO:cartography.intel.semgrep.findings:Loading X Semgrep SCA vulns info into the graph.
INFO:cartography.intel.semgrep.findings:Loading X Semgrep SCA usages info into the graph.
INFO:cartography.graph.statement:Completed semgrep_sca_risk_analysis statement #1
...
INFO:cartography.graph.statement:Completed semgrep_sca_risk_analysis statement #X
INFO:cartography.graph.job:Finished job semgrep_sca_risk_analysis
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA findings cleanup job.
INFO:cartography.graph.statement:Completed SemgrepSCAFinding statement #1
...
INFO:cartography.graph.statement:Completed SemgrepSCAFinding statement #X
INFO:cartography.graph.job:Finished job SemgrepSCAFinding
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA Locations cleanup job.
INFO:cartography.graph.statement:Completed SemgrepSCALocation statement #1
...
INFO:cartography.graph.statement:Completed SemgrepSCALocation statement #X
INFO:cartography.graph.job:Finished job SemgrepSCALocation
INFO:cartography.sync:Finishing sync stage 'semgrep'
INFO:cartography.sync:Finishing sync with update tag '1730497895'

Logs from semgrep module after these changes

INFO:cartography.sync:Starting sync with update tag '1730505324'
INFO:cartography.sync:Starting sync stage 'create-indexes'
INFO:cartography.intel.create_indexes:Creating indexes for cartography node types.
INFO:cartography.sync:Finishing sync stage 'create-indexes'
INFO:cartography.sync:Starting sync stage 'semgrep'
INFO:cartography.intel.semgrep.deployment:Loading Semgrep deployment info {'id': ...} into the graph...
INFO:cartography.intel.semgrep.dependencies:Running Semgrep dependencies sync job.
INFO:cartography.intel.semgrep.dependencies:Retrieving Semgrep dependencies for deployment 'X'.
INFO:cartography.intel.semgrep.dependencies:Processed page 0 of Semgrep dependencies.
...
INFO:cartography.intel.semgrep.dependencies:Processed page X of Semgrep dependencies.
INFO:cartography.intel.semgrep.dependencies:Retrieved X Semgrep dependencies in X pages.
INFO:cartography.intel.semgrep.dependencies:Loading X GoLibrary objects into the graph.
INFO:cartography.intel.semgrep.dependencies:Running Semgrep Go Library cleanup job.
INFO:cartography.graph.statement:Completed GoLibrary statement #1
...
INFO:cartography.graph.statement:Completed GoLibrary statement #X
INFO:cartography.graph.job:Finished job GoLibrary
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA findings sync job.
INFO:cartography.intel.semgrep.findings:Retrieving Semgrep SCA vulns for deployment 'lyft'.
INFO:cartography.intel.semgrep.findings:Processed page 0 of Semgrep SCA vulnerabilities.
...
INFO:cartography.intel.semgrep.findings:Processed page X of Semgrep SCA vulnerabilities.
INFO:cartography.intel.semgrep.findings:Retrieved X Semgrep SCA vulns in X pages.
INFO:cartography.intel.semgrep.findings:Loading X Semgrep SCA vulns info into the graph.
INFO:cartography.intel.semgrep.findings:Loading X Semgrep SCA usages info into the graph.
INFO:cartography.graph.statement:Completed semgrep_sca_risk_analysis statement #1
...
INFO:cartography.graph.statement:Completed semgrep_sca_risk_analysis statement #X
INFO:cartography.graph.job:Finished job semgrep_sca_risk_analysis
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA findings cleanup job.
INFO:cartography.graph.statement:Completed SemgrepSCAFinding statement #1
...
INFO:cartography.graph.statement:Completed SemgrepSCAFinding statement #X
INFO:cartography.graph.job:Finished job SemgrepSCAFinding
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA Locations cleanup job.
INFO:cartography.graph.statement:Completed SemgrepSCALocation statement #1
...
INFO:cartography.graph.statement:Completed SemgrepSCALocation statement #X
INFO:cartography.graph.job:Finished job SemgrepSCALocation
INFO:cartography.sync:Finishing sync stage 'semgrep'
INFO:cartography.sync:Finishing sync with update tag '1730497895'

Checklist

Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:

- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and after your changes.
- [x] Include console log trace showing what happened before and after your changes.

If you are changing a node or relationship:

- [x] Update the schema and readme.

If you are implementing a new intel module:

- [x] Use the NodeSchema data model.

hanzo · 2024-10-22T19:01:29Z

cartography/intel/semgrep/findings.py

@@ -282,6 +286,12 @@ def sync(
    load_semgrep_sca_vulns(neo4j_sesion, vulns, deployment_id, update_tag)
    load_semgrep_sca_usages(neo4j_sesion, usages, deployment_id, update_tag)
    run_scoped_analysis_job('semgrep_sca_risk_analysis.json', neo4j_sesion, common_job_parameters)
+
+    # fetch and load dependencies for the Go ecosystem


Now that this sync() function is being used for more than just findings, should I move it to a separate file, e.g. common.py? Or I could create a separate sync() function in dependencies.py, but it would need to duplicate a lot of this function's code.

Good question. I would say an approach like the GitHub ingestion is good: multiple resource syncs called from start_semgrep_ingestion.

The only code you need to repeat is the cleanup and merge_module_sync_metadata process.

The other thing is that the findings API requires the deployment ID, currently retrieved here:

cartography/cartography/intel/semgrep/findings.py

Lines 275 to 278 in 810e391

    semgrep_deployment = get_deployment(semgrep_app_token)
    deployment_id = semgrep_deployment["id"]
    deployment_slug = semgrep_deployment["slug"]
    load_semgrep_deployment(neo4j_sesion, semgrep_deployment, update_tag)

I could update start_semgrep_ingestion to fetch the deployment ID and pass it as an argument to the other sync functions?
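A minimal sketch of that option, with the deployment fetched once and passed on explicitly. The function names come from this thread, but the signatures, return values, and stub bodies are hypothetical and only serve to make the example runnable.

```python
from typing import Dict, List

calls: List[str] = []  # records the call order for the demo


def get_deployment(semgrep_app_token: str) -> Dict[str, str]:
    # Stand-in for the real API call; returns fixed, made-up values.
    return {"id": "123", "slug": "example-org"}


def sync_dependencies(deployment_id: str, update_tag: int) -> None:
    # Hypothetical signature: the ID arrives as an explicit argument.
    calls.append(f"dependencies:{deployment_id}")


def sync_findings(deployment_id: str, deployment_slug: str, update_tag: int) -> None:
    # Hypothetical signature: both values arrive as explicit arguments.
    calls.append(f"findings:{deployment_id}:{deployment_slug}")


def start_semgrep_ingestion(semgrep_app_token: str, update_tag: int) -> None:
    # Fetch the deployment once, then hand the values to each resource sync.
    deployment = get_deployment(semgrep_app_token)
    sync_dependencies(deployment["id"], update_tag)
    sync_findings(deployment["id"], deployment["slug"], update_tag)


start_semgrep_ingestion("fake-token", 1730505324)
```

With explicit arguments, a sync function cannot be called without the values it needs, at the cost of a longer parameter list.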

@heryxpc I've done some refactoring, what do you think? I don't love the way it's set up now but couldn't figure out a cleaner approach, open to suggestions

I've taken another pass at organizing the code. This way is much cleaner, but it feels a little brittle that sync_deployment sets values in common_job_parameters that are required by sync_findings and sync_dependencies, so it would break if you called the sync functions in a different order.

cartography/intel/semgrep/__init__.py

hanzo · 2024-10-29T18:40:02Z

cartography/intel/semgrep/deployment.py

I named this deployment.py instead of deployments.py to match the existing file models/deployment.py.

The contents of this file have been moved here from intel/semgrep/findings.py without changes

hanzo · 2024-11-01T21:00:22Z

cartography/intel/semgrep/dependencies.py

+
+        # We could call a different endpoint to get all repo IDs and store a mapping of repo ID to URL,
+        # but it's much simpler to just extract the URL from the definedAt field.
+        repo_url = raw_dep["definedAt"]["url"].split("/blob/", 1)[0]


I considered what might cause this string split to give the wrong result, but I think it's very unlikely. Even if a repo stored its go.mod file inside a directory named /blob/ (which would be really strange), the URL returned from Semgrep would be something like https://github.com/org/repo/blob/sha/blob/go.mod#L112, so repo_url would still be set to https://github.com/org/repo as expected.
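The reasoning above can be checked with a few lines of Python. The helper name `repo_url_from_defined_at` is made up for this sketch; the split expression is the one from the diff, and the URLs are the illustrative ones from the comment.

```python
def repo_url_from_defined_at(url: str) -> str:
    """Return everything before the first '/blob/' segment of the URL."""
    return url.split("/blob/", 1)[0]


typical = "https://github.com/org/repo/blob/sha/go.mod#L112"
nested = "https://github.com/org/repo/blob/sha/blob/go.mod#L112"

print(repo_url_from_defined_at(typical))  # https://github.com/org/repo
print(repo_url_from_defined_at(nested))   # https://github.com/org/repo
```

One case the split does not cover is an organization or repository literally named `blob`: for `https://github.com/org/blob/blob/sha/go.mod` the first `/blob/` is the repository name itself, so the result would be `https://github.com/org`.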

hanzo · 2024-11-04T18:51:25Z

cartography/intel/semgrep/findings.py

+
+    deployment_id = common_job_parameters.get("DEPLOYMENT_ID")
+    deployment_slug = common_job_parameters.get("DEPLOYMENT_SLUG")
+    if not deployment_id or not deployment_slug:


I don't like this mechanism for getting the required parameters; I would love to hear suggestions for improvement.
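To make the ordering concern concrete, here is a small sketch of the shared-dictionary pattern. The key names match the diff above; the stub bodies, the stand-in values, and what happens when a key is missing are assumptions, since the quoted hunk stops at the `if` line.

```python
from typing import Any, Dict, List

log: List[str] = []  # collects messages so the behaviour is easy to inspect


def sync_deployment(common_job_parameters: Dict[str, Any]) -> None:
    # Stand-in values; the real function would fetch these from Semgrep.
    common_job_parameters["DEPLOYMENT_ID"] = "123"
    common_job_parameters["DEPLOYMENT_SLUG"] = "example-org"


def sync_findings(common_job_parameters: Dict[str, Any]) -> None:
    deployment_id = common_job_parameters.get("DEPLOYMENT_ID")
    deployment_slug = common_job_parameters.get("DEPLOYMENT_SLUG")
    if not deployment_id or not deployment_slug:
        # Assumed handling: the PR's actual behaviour is not shown in the hunk.
        log.append("skipped: missing deployment info")
        return
    log.append(f"synced findings for {deployment_slug}")


params: Dict[str, Any] = {"UPDATE_TAG": 1730505324}
sync_findings(params)    # called before sync_deployment: nothing to work with
sync_deployment(params)  # populates the shared dictionary
sync_findings(params)    # now succeeds
```

The first call shows the brittleness: correctness depends on `sync_deployment` having run earlier, and nothing in the function signatures expresses that requirement.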

achantavy

Left nits but nothing blocking. Thanks for doing this and for including the screenshots and tests!!

achantavy · 2024-10-31T01:49:38Z

cartography/intel/semgrep/dependencies.py

+            logger.warning(f"Failed to retrieve Semgrep dependencies for page {page}. Retrying...")
+            retries += 1
+            if retries >= _MAX_RETRIES:
+                raise e


Just wondering, why not just raise?

Only because this is copied from

cartography/cartography/intel/semgrep/findings.py

Line 88 in 3e7941b

raise e

I'll update both lines to raise
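For reference, a self-contained sketch of a retry loop that ends with a bare `raise`. The warning text and `_MAX_RETRIES` name mirror the diff above; the wrapper function, the value of `_MAX_RETRIES`, and the choice of `ConnectionError` are assumptions for the example.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)
_MAX_RETRIES = 3  # assumed value for the sketch


def fetch_with_retries(fetch: Callable[[], Dict[str, Any]], page: int) -> Dict[str, Any]:
    """Call fetch() until it succeeds or _MAX_RETRIES attempts have failed."""
    retries = 0
    while True:
        try:
            return fetch()
        except ConnectionError:
            logger.warning(f"Failed to retrieve Semgrep dependencies for page {page}. Retrying...")
            retries += 1
            if retries >= _MAX_RETRIES:
                # A bare raise re-raises the exception currently being handled,
                # so no "as e" binding is needed.
                raise


attempts = {"count": 0}


def flaky() -> Dict[str, Any]:
    # Fails twice, then succeeds, to exercise the retry path.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"ok": True}


result = fetch_with_retries(flaky, page=0)
```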

cartography/intel/semgrep/dependencies.py

achantavy · 2024-11-04T21:50:34Z

cartography/intel/semgrep/dependencies.py

+    deployment_id: str,
+    update_tag: int,
+) -> None:
+    logger.info(f"Loading {len(dependencies)} Semgrep dependencies into the graph.")


[non-block] I'm not a huge fan of metaprogramming here, but I won't block on this.

That aside, if we decide to keep this bit of metaprogramming, it'd be good to log the label of the dependency_schema object so that the log message shows what asset is getting written to the graph.

This is copied from

cartography/cartography/intel/semgrep/findings.py

Line 224 in 3e7941b

logger.info(f"Loading {len(vulns)} Semgrep SCA vulns info into the graph.")

I'll update all of these log lines to use the label of the schema object

Updated logs:

INFO:cartography.intel.semgrep.dependencies:Retrieved X Semgrep dependencies in Y pages.
INFO:cartography.intel.semgrep.dependencies:Loading X GoLibrary objects into the graph.
INFO:cartography.intel.semgrep.dependencies:Running Semgrep Go Library cleanup job.
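A small sketch of how a log line like the second one can be built from the schema object's label. `FakeNodeSchema` and `load_message` are hypothetical stand-ins written for this example; only the `label` attribute and the message format follow the discussion above.

```python
import logging
from dataclasses import dataclass
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class FakeNodeSchema:
    # Hypothetical stand-in for a node schema object; only the label matters here.
    label: str


def load_message(schema: FakeNodeSchema, items: List[Dict[str, Any]]) -> str:
    """Build the log line from the schema label instead of a hard-coded name."""
    return f"Loading {len(items)} {schema.label} objects into the graph."


message = load_message(FakeNodeSchema(label="GoLibrary"), [{"id": "a"}, {"id": "b"}])
logger.info(message)
```

Because the label is read from the schema, the same load function can report a different node type without any change to the logging code.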

cartography/models/semgrep/dependencies.py

achantavy · 2024-11-04T21:55:54Z

cartography/models/semgrep/dependencies.py

+
+@dataclass(frozen=True)
+# (:SemgrepDependency)<-[:REQUIRES]-(:GitHubRepository)
+class SemgrepDependencyToGithubRepoRel(CartographyRelSchema):


[Non-block]
Since this rel is specifically for a Go library to GitHub repo, I'd prefer to add some copypasta to make it separate. This is a style choice and I won't block on it, though, since we can decouple it in the future if we decide that another Semgrep dependency needs a different relationship definition than a Go library.

From what I've seen, all semgrep dependencies share the same relationship with the github repo where they were detected by semgrep. I'd be inclined to leave it as is for now and we can decouple it later if we discover that a different relationship is needed

olivia-hong

will defer approval to the cartography experts but overall lgtm!

### Summary

Small followup to #1368, noticed that the schema doc is not being rendered correctly:

![image](https://github.com/user-attachments/assets/e3f34e94-a7c0-4b39-b243-460fe39acb57)

I'm not sure how to test that it will render correctly after this change, but this format matches the existing format used by the github schema: https://github.com/cartography-cncf/cartography/blob/f11d7b2ff87331ed736a5de2547cde812face1a0/docs/root/modules/github/schema.md?plain=1#L210-L218

---------

Signed-off-by: Hans Wernetti <[email protected]>

### Summary

This PR adds support to ingest dependencies from Semgrep for the NPM ecosystem, as well as introducing a CLI flag allowing users to specify which ecosystems to ingest.

### Related issues or links

#1368 added support for ingesting dependencies from Semgrep (only for the `gomod` ecosystem)

### Demo

Before these changes, a project with both Go and NPM dependencies will only have GoLibrary nodes in the dependency graph:

<img width="1036" alt="image" src="https://github.com/user-attachments/assets/31d97626-be70-4c80-9a5b-71c26056a53b">

After these changes, for the same project the graph contains both GoLibrary and NpmLibrary nodes:

<img width="1039" alt="image" src="https://github.com/user-attachments/assets/d09cc265-ccd6-463e-bd01-2b3e7c6d1778">

<details>
<summary>Logs from semgrep module before these changes</summary>

```
INFO:cartography.sync:Starting sync stage 'semgrep'
INFO:cartography.intel.semgrep.deployment:Loading Semgrep deployment info {'id': ...} into the graph...
INFO:cartography.intel.semgrep.dependencies:Running Semgrep dependencies sync job.
INFO:cartography.intel.semgrep.dependencies:Retrieving Semgrep dependencies for deployment 'X'.
INFO:cartography.intel.semgrep.dependencies:Processed page 0 of Semgrep dependencies.
...
INFO:cartography.intel.semgrep.dependencies:Processed page X of Semgrep dependencies.
INFO:cartography.intel.semgrep.dependencies:Retrieved X Semgrep dependencies in X pages.
INFO:cartography.intel.semgrep.dependencies:Loading X GoLibrary objects into the graph.
INFO:cartography.intel.semgrep.dependencies:Running Semgrep Go Library cleanup job.
INFO:cartography.graph.statement:Completed GoLibrary statement #1
...
INFO:cartography.graph.statement:Completed GoLibrary statement #X
INFO:cartography.graph.job:Finished job GoLibrary
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA findings sync job.
...
INFO:cartography.sync:Finishing sync stage 'semgrep'
INFO:cartography.sync:Finishing sync with update tag '1730497895'
```

</details>

<details>
<summary>Logs from semgrep module after these changes</summary>

```
INFO:cartography.intel.semgrep.deployment:Loading SemgrepDeployment {'id': ...} into the graph.
INFO:cartography.intel.semgrep.dependencies:Running Semgrep dependencies sync job.
INFO:cartography.intel.semgrep.dependencies:Retrieving Semgrep gomod dependencies for deployment 'X'.
INFO:cartography.intel.semgrep.dependencies:Processed page 0 of Semgrep gomod dependencies.
INFO:cartography.intel.semgrep.dependencies:Processed page X of Semgrep gomod dependencies.
INFO:cartography.intel.semgrep.dependencies:Retrieved X Semgrep gomod dependencies in X pages.
INFO:cartography.intel.semgrep.dependencies:Loading X GoLibrary objects into the graph.
INFO:cartography.intel.semgrep.dependencies:Running Semgrep Dependencies cleanup job for GoLibrary.
INFO:cartography.graph.statement:Completed GoLibrary statement #1
INFO:cartography.graph.statement:Completed GoLibrary statement #2
INFO:cartography.graph.statement:Completed GoLibrary statement #3
INFO:cartography.graph.job:Finished job GoLibrary
INFO:cartography.intel.semgrep.dependencies:Retrieving Semgrep npm dependencies for deployment 'X'.
INFO:cartography.intel.semgrep.dependencies:Processed page 0 of Semgrep npm dependencies.
...
INFO:cartography.intel.semgrep.dependencies:Processed page X of Semgrep npm dependencies.
INFO:cartography.intel.semgrep.dependencies:Retrieved X Semgrep npm dependencies in X pages.
INFO:cartography.intel.semgrep.dependencies:Loading X NpmLibrary objects into the graph.
INFO:cartography.intel.semgrep.dependencies:Running Semgrep Dependencies cleanup job for NpmLibrary.
INFO:cartography.graph.statement:Completed NpmLibrary statement #1
INFO:cartography.graph.statement:Completed NpmLibrary statement #2
INFO:cartography.graph.statement:Completed NpmLibrary statement #3
INFO:cartography.graph.job:Finished job NpmLibrary
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA findings sync job.
...
INFO:cartography.sync:Finishing sync stage 'semgrep'
INFO:cartography.sync:Finishing sync with update tag '1731969699'
```

</details>

### Checklist

Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:

- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and after your changes.
- [x] Include console log trace showing what happened before and after your changes.

If you are changing a node or relationship:

- [x] Update the [schema](https://github.com/lyft/cartography/tree/master/docs/root/modules) and [readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

If you are implementing a new intel module:

- [x] Use the NodeSchema [data model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Hans Wernetti <[email protected]>

hanzo changed the title ~~[WIP] Ingest dependencies using Semgrep API~~ [WIP] Ingest Go dependencies using Semgrep API Oct 21, 2024

hanzo commented Oct 22, 2024

hanzo force-pushed the semgrep-dependencies branch from f64769b to 13db598 Compare October 29, 2024 18:35

hanzo commented Oct 29, 2024

cartography/intel/semgrep/__init__.py (outdated, resolved)

hanzo commented Oct 29, 2024

cartography/intel/semgrep/deployment.py (resolved)

hanzo commented Oct 29, 2024

cartography/intel/semgrep/__init__.py (outdated, resolved)

hanzo and others added 9 commits October 29, 2024 11:44

Ingest dependencies using Semgrep API

defe865

Signed-off-by: Hans Wernetti <[email protected]>

rm

eb79a6a

Signed-off-by: Hans Wernetti <[email protected]>

basic ingestion working

dadaa2a

Signed-off-by: Hans Wernetti <[email protected]>

add data models and parameterize functions

d549c02

Signed-off-by: Hans Wernetti <[email protected]>

remove manual indices

ed0573d

Signed-off-by: Hans Wernetti <[email protected]>

cleanup

0cc44a3

Signed-off-by: Hans Wernetti <[email protected]>

Add CNCF to docs (#1369)

6f995fc

### Summary

> Describe your changes.

Now that cartography has been donated to the CNCF, time to update the docs

Signed-off-by: Alex Chantavy <[email protected]>
Signed-off-by: Hans Wernetti <[email protected]>

0.95.0rc1 (#1370)

8cdc540

First release after CNCF donation, let's see if the CI tooling works

Signed-off-by: Hans Wernetti <[email protected]>

refactor sync functions into separate files

cdbd8cf

Signed-off-by: Hans Wernetti <[email protected]>

hanzo force-pushed the semgrep-dependencies branch from 13db598 to cdbd8cf Compare October 29, 2024 18:44

hanzo added 6 commits October 29, 2024 11:45

Merge branch 'master' into semgrep-dependencies

6ba5000

fix test

681d7af

Signed-off-by: Hans Wernetti <[email protected]>

test fix

d71ed6c

Signed-off-by: Hans Wernetti <[email protected]>

move slug to common params

6a7926e

Signed-off-by: Hans Wernetti <[email protected]>

add sync_deployment method

9000aaa

Signed-off-by: Hans Wernetti <[email protected]>

better warnings

55e4e04

Signed-off-by: Hans Wernetti <[email protected]>

achantavy reviewed Oct 31, 2024

cartography/intel/semgrep/deployment.py (outdated, resolved)

hanzo added 6 commits October 31, 2024 15:38

move deployment test to separate file

3250dd9

Signed-off-by: Hans Wernetti <[email protected]>

rm unused import

3164bcf

Signed-off-by: Hans Wernetti <[email protected]>

undo test changes

53c5e0d

Signed-off-by: Hans Wernetti <[email protected]>

move test functions to common.py

e58ddaf

Signed-off-by: Hans Wernetti <[email protected]>

refactor tests, start on deps tests

61bffe8

Signed-off-by: Hans Wernetti <[email protected]>

tests

6a7985b

Signed-off-by: Hans Wernetti <[email protected]>

hanzo added 5 commits November 1, 2024 11:37

tweak test

feeab74

Signed-off-by: Hans Wernetti <[email protected]>

rename test

4276faf

Signed-off-by: Hans Wernetti <[email protected]>

add back create_dependency_nodes

ea638fd

Signed-off-by: Hans Wernetti <[email protected]>

fix test

6f44fbd

Signed-off-by: Hans Wernetti <[email protected]>

add specifier property

0daf3ec

Signed-off-by: Hans Wernetti <[email protected]>

hanzo commented Nov 1, 2024

hanzo changed the title ~~[WIP] Ingest Go dependencies using Semgrep API~~ Ingest Go dependencies using Semgrep API Nov 1, 2024

hanzo marked this pull request as ready for review November 1, 2024 21:03

hanzo added 3 commits November 1, 2024 14:38

update schema

ce2c15a

Signed-off-by: Hans Wernetti <[email protected]>

rename var

c31b1d6

Signed-off-by: Hans Wernetti <[email protected]>

Merge branch 'master' into semgrep-dependencies

edf7c59

hanzo commented Nov 4, 2024

achantavy previously approved these changes Nov 4, 2024

address review feedback

aa4a403

Signed-off-by: Hans Wernetti <[email protected]>

hanzo dismissed achantavy’s stale review via aa4a403 November 5, 2024 18:15

olivia-hong reviewed Nov 5, 2024

achantavy approved these changes Nov 5, 2024

achantavy merged commit 5029b00 into master Nov 5, 2024
5 checks passed

achantavy deleted the semgrep-dependencies branch November 5, 2024 18:53

hanzo mentioned this pull request Nov 5, 2024

Fix Semgrep schema doc #1377

Merged

hanzo mentioned this pull request Nov 19, 2024

Ingest NPM dependencies using Semgrep API (v0.96.0rc2) #1385

Merged

5 tasks

olivia-hong left a comment

	semgrep_deployment = get_deployment(semgrep_app_token)
	deployment_id = semgrep_deployment["id"]
	deployment_slug = semgrep_deployment["slug"]
	load_semgrep_deployment(neo4j_session, semgrep_deployment, update_tag)
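
The snippet above reads `id` and `slug` straight from the deployment dict, so a response missing either key would surface as a bare `KeyError`. One way to fail with a clearer message is sketched below. The helper name `extract_deployment_params` is hypothetical and not part of Cartography; the only thing taken from the snippet is that the deployment is a dict carrying `id` and `slug`.

```python
from typing import Any, Dict, Tuple


def extract_deployment_params(deployment: Dict[str, Any]) -> Tuple[Any, Any]:
    """Return (id, slug) from a deployment dict, or raise a descriptive error.

    Hypothetical helper for illustration; values are returned unchanged
    because the snippet does not say what types the API uses.
    """
    missing = [key for key in ("id", "slug") if key not in deployment]
    if missing:
        raise ValueError(
            "Semgrep deployment response is missing: " + ", ".join(missing)
        )
    return deployment["id"], deployment["slug"]
```

With this in place, the two subscript lines could collapse to `deployment_id, deployment_slug = extract_deployment_params(semgrep_deployment)`.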