Ingest Go dependencies using Semgrep API (#1368)
### Summary

Motivation: The library dependency graph is already populated with `PythonLibrary` nodes via Cartography's GitHub module, but no other languages are supported. This PR adds `GoLibrary` nodes to the library dependency graph to bring Go support up to parity with Python.

I concur with the recommendation from @heryxpc that, rather than writing code to parse go.mod files manually, we should use the dependency data returned by the Semgrep API. Cartography's Semgrep module can already import supply chain vulnerability data from the [Findings](https://semgrep.dev/api/v1/docs/#tag/Finding/operation/semgrep_app.core_exp.findings.handlers.issue.openapi_list_recent_issues) endpoint of the Semgrep API. Semgrep also provides a [List Dependencies](https://semgrep.dev/api/v1/docs/#tag/SupplyChainService/operation/semgrep_app.products.sca.handlers.dependency.list_dependencies_conexxion) endpoint that returns every known dependency for a given ecosystem (e.g. specifying the “gomod” ecosystem returns all dependencies found in go.mod files). The response includes useful information such as the transitivity of each dependency and a link to where it is defined in source code.

The dependency nodes imported from the Semgrep API will be labelled `GoLibrary::SemgrepDependency::Dependency` and will match the properties of existing `PythonLibrary::Dependency` nodes as closely as possible. This PR only imports Go dependencies from Semgrep, but I've structured the code so that importing additional languages from Semgrep is easy to add in the future.
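The following is a minimal Python sketch of the ecosystem-based approach. It pages through a dependency listing and turns Go records into node properties. It is not the module's actual code. `ECOSYSTEM_LABELS`, the injected page fetcher, the record field names (`name`, `version`, `transitivity`, `definedAt`) and the `name|version` id scheme are all hypothetical. No real HTTP call to the Semgrep API is made; the fetcher is passed in so the logic runs offline.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical mapping from a Semgrep ecosystem name to the graph label for its
# dependencies. Only "gomod" is covered; another language would be one entry.
ECOSYSTEM_LABELS: Dict[str, str] = {
    "gomod": "GoLibrary",
}

Record = Dict[str, Any]
# A fetcher takes (ecosystem, cursor) and returns (records, next_cursor);
# next_cursor is None on the last page. The real Semgrep request/response
# shapes are NOT modeled here.
PageFetcher = Callable[[str, Optional[str]], Tuple[List[Record], Optional[str]]]


def fetch_all_dependencies(fetch_page: PageFetcher, ecosystem: str) -> List[Record]:
    """Collect every page of dependency records by following the cursor."""
    results: List[Record] = []
    cursor: Optional[str] = None
    while True:
        records, cursor = fetch_page(ecosystem, cursor)
        results.extend(records)
        if cursor is None:
            return results


def transform_dependencies(records: List[Record], ecosystem: str) -> List[Record]:
    """Turn raw records into node-property dicts (field names are assumptions)."""
    label = ECOSYSTEM_LABELS[ecosystem]
    nodes: List[Record] = []
    for rec in records:
        name, version = rec["name"], rec["version"]
        nodes.append({
            "id": f"{name}|{version}",  # hypothetical id scheme
            "name": name,
            "version": version,
            "transitivity": rec.get("transitivity"),
            # Link to where the dependency is declared, if the record has one.
            "url": (rec.get("definedAt") or {}).get("url"),
            "labels": [label, "SemgrepDependency", "Dependency"],
        })
    return nodes


if __name__ == "__main__":
    # Two fake pages of sample records standing in for API responses.
    pages: Dict[Optional[str], Tuple[List[Record], Optional[str]]] = {
        None: ([{"name": "golang.org/x/net", "version": "0.17.0",
                 "transitivity": "DIRECT",
                 "definedAt": {"url": "https://example.com/go.mod#L5"}}], "page-1"),
        "page-1": ([{"name": "github.com/pkg/errors", "version": "0.9.1",
                     "transitivity": "TRANSITIVE"}], None),
    }
    deps = fetch_all_dependencies(lambda eco, cur: pages[cur], "gomod")
    for node in transform_dependencies(deps, "gomod"):
        print(node["id"], node["transitivity"], node["url"])
```

Keying the label on the ecosystem name reflects the goal of supporting more languages later. In this sketch, adding one would mean a new `ECOSYSTEM_LABELS` entry plus any properties specific to that ecosystem.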
Before these changes, a project with both Python and Go dependencies will only have `PythonLibrary` nodes in the dependency graph:

<img width="1019" alt="image" src="https://github.com/user-attachments/assets/9e291012-103e-4dae-a2bb-2da5205421b7">

After these changes, for the same project the graph contains both `PythonLibrary` and `GoLibrary` nodes:

<img width="1015" alt="image" src="https://github.com/user-attachments/assets/f945e489-6a3e-4edf-85d4-424bacd763b2">

<details>
<summary>Logs from semgrep module before these changes</summary>

```
INFO:cartography.sync:Starting sync with update tag '1730497895'
INFO:cartography.sync:Starting sync stage 'create-indexes'
INFO:cartography.intel.create_indexes:Creating indexes for cartography node types.
INFO:cartography.sync:Finishing sync stage 'create-indexes'
INFO:cartography.sync:Starting sync stage 'semgrep'
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA findings sync job.
INFO:cartography.intel.semgrep.findings:Loading Semgrep deployment info {'id': ...} into the graph...
INFO:cartography.intel.semgrep.findings:Retrieving Semgrep SCA vulns for deployment 'X'.
INFO:cartography.intel.semgrep.findings:Processed page 0 of Semgrep SCA vulnerabilities.
...
INFO:cartography.intel.semgrep.findings:Processed page X of Semgrep SCA vulnerabilities.
INFO:cartography.intel.semgrep.findings:Retrieved X Semgrep SCA vulns in X pages.
INFO:cartography.intel.semgrep.findings:Loading X Semgrep SCA vulns info into the graph.
INFO:cartography.intel.semgrep.findings:Loading X Semgrep SCA usages info into the graph.
INFO:cartography.graph.statement:Completed semgrep_sca_risk_analysis statement #1
...
INFO:cartography.graph.statement:Completed semgrep_sca_risk_analysis statement #X
INFO:cartography.graph.job:Finished job semgrep_sca_risk_analysis
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA findings cleanup job.
INFO:cartography.graph.statement:Completed SemgrepSCAFinding statement #1
...
INFO:cartography.graph.statement:Completed SemgrepSCAFinding statement #X
INFO:cartography.graph.job:Finished job SemgrepSCAFinding
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA Locations cleanup job.
INFO:cartography.graph.statement:Completed SemgrepSCALocation statement #1
...
INFO:cartography.graph.statement:Completed SemgrepSCALocation statement #X
INFO:cartography.graph.job:Finished job SemgrepSCALocation
INFO:cartography.sync:Finishing sync stage 'semgrep'
INFO:cartography.sync:Finishing sync with update tag '1730497895'
```

</details>

<details>
<summary>Logs from semgrep module after these changes</summary>

```
INFO:cartography.sync:Starting sync with update tag '1730505324'
INFO:cartography.sync:Starting sync stage 'create-indexes'
INFO:cartography.intel.create_indexes:Creating indexes for cartography node types.
INFO:cartography.sync:Finishing sync stage 'create-indexes'
INFO:cartography.sync:Starting sync stage 'semgrep'
INFO:cartography.intel.semgrep.deployment:Loading Semgrep deployment info {'id': ...} into the graph...
INFO:cartography.intel.semgrep.dependencies:Running Semgrep dependencies sync job.
INFO:cartography.intel.semgrep.dependencies:Retrieving Semgrep dependencies for deployment 'X'.
INFO:cartography.intel.semgrep.dependencies:Processed page 0 of Semgrep dependencies.
...
INFO:cartography.intel.semgrep.dependencies:Processed page X of Semgrep dependencies.
INFO:cartography.intel.semgrep.dependencies:Retrieved X Semgrep dependencies in X pages.
INFO:cartography.intel.semgrep.dependencies:Loading X GoLibrary objects into the graph.
INFO:cartography.intel.semgrep.dependencies:Running Semgrep Go Library cleanup job.
INFO:cartography.graph.statement:Completed GoLibrary statement #1
...
INFO:cartography.graph.statement:Completed GoLibrary statement #X
INFO:cartography.graph.job:Finished job GoLibrary
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA findings sync job.
INFO:cartography.intel.semgrep.findings:Retrieving Semgrep SCA vulns for deployment 'lyft'.
INFO:cartography.intel.semgrep.findings:Processed page 0 of Semgrep SCA vulnerabilities.
...
INFO:cartography.intel.semgrep.findings:Processed page X of Semgrep SCA vulnerabilities.
INFO:cartography.intel.semgrep.findings:Retrieved X Semgrep SCA vulns in X pages.
INFO:cartography.intel.semgrep.findings:Loading X Semgrep SCA vulns info into the graph.
INFO:cartography.intel.semgrep.findings:Loading X Semgrep SCA usages info into the graph.
INFO:cartography.graph.statement:Completed semgrep_sca_risk_analysis statement #1
...
INFO:cartography.graph.statement:Completed semgrep_sca_risk_analysis statement #X
INFO:cartography.graph.job:Finished job semgrep_sca_risk_analysis
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA findings cleanup job.
INFO:cartography.graph.statement:Completed SemgrepSCAFinding statement #1
...
INFO:cartography.graph.statement:Completed SemgrepSCAFinding statement #X
INFO:cartography.graph.job:Finished job SemgrepSCAFinding
INFO:cartography.intel.semgrep.findings:Running Semgrep SCA Locations cleanup job.
INFO:cartography.graph.statement:Completed SemgrepSCALocation statement #1
...
INFO:cartography.graph.statement:Completed SemgrepSCALocation statement #X
INFO:cartography.graph.job:Finished job SemgrepSCALocation
INFO:cartography.sync:Finishing sync stage 'semgrep'
INFO:cartography.sync:Finishing sync with update tag '1730497895'
```

</details>

### Checklist

Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:
- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and after your changes.
- [x] Include console log trace showing what happened before and after your changes.
If you are changing a node or relationship:
- [x] Update the [schema](https://github.com/lyft/cartography/tree/master/docs/root/modules) and [readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

If you are implementing a new intel module:
- [x] Use the NodeSchema [data model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

### TODO
- [ ] Clean up TODO comments in code
- [ ] Add/update files like cartography/data/jobs/scoped_analysis/semgrep_sca_risk_analysis.json?

---------

Signed-off-by: Hans Wernetti
Signed-off-by: Alex Chantavy
Co-authored-by: Alex Chantavy