Feature: purl2sym: Expand support for multiple tools #313

Closed
12 tasks done
pombredanne opened this issue Feb 29, 2024 · 1 comment

Comments

pombredanne (Member) commented Feb 29, 2024

This is to extend the ScanCode.io pipeline in #312 to support multiple tools, including two pure Python tools and two native tools, as well as to collect symbols from OpenSSL, BusyBox, and uClibc. This includes updating the PurlDB scanning architecture to accommodate multiple ScanCode.io worker systems (whole machines) and exposing a queue API from which ScanCode.io instances can pick up scanning jobs to run symbol collection. A new API endpoint will accept a PURL and return a list of all known PURLs and download URLs for all versions of that PURL, and another will report the status of pending symbol collections.
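The pull-based queue described above can be illustrated with a small in-memory sketch. This is not PurlDB's implementation: `ScanJob`, `ScanJobQueue`, `submit`, `pick_job`, `finish` and `pending` are hypothetical names, and a real deployment would back the queue with the database and an HTTP API rather than a Python `deque`. It only shows the idea that each ScanCode.io worker asks for the next pending job, so adding a machine means adding another caller.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ScanJob:
    # Hypothetical job record: one PURL plus the addon pipelines to run on it.
    purl: str
    pipelines: list
    status: str = "pending"
    worker: str = ""


class ScanJobQueue:
    """Hypothetical in-memory stand-in for a shared scan job queue.

    Workers pull jobs instead of having them pushed, so scaling out
    only requires more workers calling pick_job().
    """

    def __init__(self):
        self._pending = deque()  # jobs not yet picked, oldest first
        self._jobs = []          # every submitted job, for status reporting

    def submit(self, purl, pipelines):
        job = ScanJob(purl=purl, pipelines=list(pipelines))
        self._pending.append(job)
        self._jobs.append(job)
        return job

    def pick_job(self, worker):
        # Hand the oldest pending job to the requesting worker, or None if idle.
        if not self._pending:
            return None
        job = self._pending.popleft()
        job.status = "running"
        job.worker = worker
        return job

    def finish(self, job):
        job.status = "done"

    def pending(self):
        # Status view: PURLs whose symbol collection has not started yet.
        return [job.purl for job in self._jobs if job.status == "pending"]
```

With two submitted jobs, a first `pick_job("worker-1")` returns the oldest one and `pending()` then lists only the second PURL.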

pombredanne converted this from a draft issue Feb 29, 2024
keshav-space moved this to In Progress in 04-purl2sym Apr 3, 2024
keshav-space (Member) commented May 1, 2024

We now support the native tools Universal Ctags and xgettext for symbol and string collection. We also support the Python tools Pygments and tree-sitter.

The simplest way to test this is to follow the steps below after installing PurlDB (https://github.com/nexB/purldb?tab=readme-ov-file#installation):

  1. Go to /api/collect/ and add a PURL for indexing, e.g. /api/collect/?purl=pkg:generic/[email protected]&addon_pipelines={TOOL_PIPELINE}
  2. Once indexing has completed, go to /api/resources/ and filter resources by PURL to see the symbols and strings for each resource in the extra_data field, e.g. /api/resources/?purl=pkg:generic/[email protected]
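The two steps can also be scripted. The sketch below uses only the Python standard library; the base URL `http://localhost:8000` and the names `collect_url`, `resources_url` and `iter_extra_data` are hypothetical. It also assumes that /api/resources/ returns Django REST Framework style paginated JSON (a `results` list plus a `next` link) with `path` and `extra_data` keys on each resource, so check that against your PurlDB instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local PurlDB instance; adjust to your deployment.
PURLDB_URL = "http://localhost:8000"


def collect_url(base_url, purl, pipeline):
    # Step 1: request indexing of a PURL with one addon pipeline.
    query = urlencode({"purl": purl, "addon_pipelines": pipeline})
    return f"{base_url}/api/collect/?{query}"


def resources_url(base_url, purl):
    # Step 2: list the resources of the indexed PURL.
    return f"{base_url}/api/resources/?{urlencode({'purl': purl})}"


def _fetch_json(url):
    with urlopen(url) as response:
        return json.load(response)


def iter_extra_data(url, fetch_json=_fetch_json):
    # Assumed DRF-style pagination: follow "next" until it is null,
    # yielding (path, extra_data) for every resource on every page.
    while url:
        page = fetch_json(url)
        for resource in page.get("results", []):
            yield resource.get("path"), resource.get("extra_data", {})
        url = page.get("next")
```

`urlencode` percent-encodes the PURL (for example `:` becomes `%3A` and `@` becomes `%40`), which the server decodes back to the original value. The `fetch_json` parameter lets the pagination loop be exercised without a running server.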

Replace {TOOL_PIPELINE} in the request above with the desired pipeline from the table below.

Pipeline                      Tool             Output
collect_symbols_ctags         Universal Ctags  symbols
collect_strings_gettext       Gettext          string literals
collect_symbols_pygments      Pygments         symbols, string literals and comments
collect_symbols_tree_sitter   Tree-Sitter      symbols and string literals
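To choose a pipeline by the kind of output needed, the table can be expressed as data. This is a hypothetical helper, not part of PurlDB: `PIPELINES` and `pipelines_for` are illustrative names, and the tool and output entries are copied from the table.

```python
# Hypothetical mapping of addon pipeline name -> (tool, kinds of output),
# transcribed from the table of pipelines.
PIPELINES = {
    "collect_symbols_ctags": ("Universal Ctags", {"symbols"}),
    "collect_strings_gettext": ("Gettext", {"string literals"}),
    "collect_symbols_pygments": ("Pygments", {"symbols", "string literals", "comments"}),
    "collect_symbols_tree_sitter": ("Tree-Sitter", {"symbols", "string literals"}),
}


def pipelines_for(output):
    # Sorted names of the pipelines whose output includes the requested kind.
    return sorted(
        name for name, (_tool, outputs) in PIPELINES.items() if output in outputs
    )
```

For instance, only `collect_symbols_pygments` collects comments, while three of the four pipelines collect string literals.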

Tip

Follow this tutorial to index the symbols and strings for a PURL/package using the different addon pipelines: https://aboutcode.readthedocs.io/projects/PURLdb/en/latest/how-to-guides/tutorial_symbol_and_string_collection.html

github-project-automation bot moved this from In Progress to Done in 04-purl2sym May 1, 2024