Scan queue updates #285

Closed
6 of 7 tasks
JonoYang opened this issue Feb 7, 2024 · 4 comments
Comments

@JonoYang
Member

JonoYang commented Feb 7, 2024

We are revamping the scan queue. In the new process, scancode.io workers get jobs from the purldb by calling an API endpoint (api/scan_queue/get_next_download_url) that returns the next available scan queue entry to work on. When a worker finishes a scan, it posts the results back to api/scan_queue/submit_scan_results.

TODO:

  • Guard indexing endpoints with special API key
  • Update scan queue to create new scan queue entries whenever we reindex a package, rather than reusing existing entry
  • Create scancode.io management command that gets jobs from api/scan_queue/get_next_download_url and sends results back
    • Get download url
    • Create project, get input, run pipeline
    • Return results

Consider:

  • POST scancode io worker info before getting download url so we can tell which worker took what job
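
The worker steps listed in the TODO (get download url, run the pipeline, return results) can be sketched roughly as below. This is only an illustration: the three callables stand in for the real HTTP calls and the scancode.io pipeline run, and the field names `entry_id` and `download_url` are hypothetical, since the response shape is not specified here.

```python
from typing import Callable, Optional


def run_one_job(
    get_next_job: Callable[[], Optional[dict]],
    run_scan: Callable[[str], dict],
    submit_results: Callable[[str, dict], None],
) -> bool:
    """Process a single scan queue entry; return False when no job is available."""
    # Stand-in for a request to api/scan_queue/get_next_download_url
    job = get_next_job()
    if not job:
        return False
    # Hypothetical field names, used for illustration only
    entry_id = job["entry_id"]
    download_url = job["download_url"]
    # Stand-in for: create project, get input, run pipeline
    results = run_scan(download_url)
    # Stand-in for a POST to api/scan_queue/submit_scan_results
    submit_results(entry_id, results)
    return True


# Usage with in-memory stand-ins instead of real HTTP calls
queue = [{"entry_id": "1", "download_url": "https://example.com/pkg.tar.gz"}]
submitted = {}


def fake_get_next_job():
    return queue.pop(0) if queue else None


def fake_run_scan(url):
    return {"scanned": url}


def fake_submit(entry_id, results):
    submitted[entry_id] = results


# Keep taking jobs until the queue reports that nothing is left
while run_one_job(fake_get_next_job, fake_run_scan, fake_submit):
    pass
```

Passing the HTTP and scan steps in as callables keeps the loop testable without a running purldb instance. If worker info is posted first, as suggested under "Consider", it could be done inside `get_next_job`.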
@JonoYang JonoYang self-assigned this Feb 7, 2024
@404-geek
Contributor

404-geek commented Feb 13, 2024

Hi @JonoYang, I would like to work on this issue if it is not yet completed, and I would appreciate your guidance on some parts of it.

@JonoYang
Member Author

@404-geek Thanks for your interest in the issue, but I'm currently working on this.

JonoYang added a commit that referenced this issue Feb 14, 2024
JonoYang added a commit that referenced this issue Feb 15, 2024
JonoYang added a commit that referenced this issue Feb 16, 2024
    * Create test for ScannableURI API

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Feb 16, 2024
JonoYang added a commit that referenced this issue Feb 17, 2024
JonoYang added a commit that referenced this issue Feb 17, 2024
JonoYang added a commit that referenced this issue Feb 17, 2024
    * Create test for ScannableURI API

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Feb 17, 2024
JonoYang added a commit that referenced this issue Feb 17, 2024
JonoYang added a commit that referenced this issue Feb 20, 2024
JonoYang added a commit that referenced this issue Feb 20, 2024
JonoYang added a commit that referenced this issue Feb 20, 2024
    * Create test for ScannableURI API

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Feb 20, 2024
JonoYang added a commit that referenced this issue Feb 20, 2024
JonoYang added a commit that referenced this issue Feb 22, 2024
JonoYang added a commit that referenced this issue Feb 22, 2024
    * Update tests

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Feb 22, 2024
JonoYang added a commit that referenced this issue Feb 22, 2024
JonoYang added a commit that referenced this issue Feb 22, 2024
    * Create test for ScannableURI API

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Feb 22, 2024
JonoYang added a commit that referenced this issue Feb 22, 2024
JonoYang added a commit that referenced this issue Feb 22, 2024
JonoYang added a commit that referenced this issue Feb 22, 2024
    * Update tests

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Feb 24, 2024
    * Use flot to build matchcode-toolkit
    * Bump version and update CHANGELOG.rst

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Feb 24, 2024
Update scan_and_fingerprint_package pipeline #49 #285
JonoYang added a commit that referenced this issue Feb 27, 2024
JonoYang added a commit that referenced this issue Feb 27, 2024
JonoYang added a commit that referenced this issue Feb 27, 2024
    * Create test for ScannableURI API

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Feb 27, 2024
JonoYang added a commit that referenced this issue Feb 27, 2024
JonoYang added a commit that referenced this issue Feb 27, 2024
JonoYang added a commit that referenced this issue Feb 27, 2024
    * Update tests

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Feb 29, 2024
JonoYang added a commit that referenced this issue Mar 5, 2024
    * Guard scan_queue API endpoint

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Mar 7, 2024
JonoYang added a commit that referenced this issue Mar 7, 2024
JonoYang added a commit that referenced this issue Mar 8, 2024
JonoYang added a commit that referenced this issue Mar 18, 2024
JonoYang added a commit that referenced this issue Mar 18, 2024
JonoYang added a commit that referenced this issue Mar 18, 2024
    * Add tests for validate_uuid
    * Test for missing scan_status in update_status

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Mar 18, 2024
JonoYang added a commit that referenced this issue Mar 19, 2024
JonoYang added a commit that referenced this issue Mar 19, 2024
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Mar 19, 2024
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Mar 19, 2024
Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Mar 19, 2024
JonoYang added a commit that referenced this issue Mar 19, 2024
JonoYang added a commit that referenced this issue Mar 19, 2024
JonoYang added a commit that referenced this issue Mar 19, 2024
JonoYang added a commit that referenced this issue Mar 19, 2024
@404-geek
Contributor

Hi @JonoYang, is this issue related to the following GSoC project idea?

https://github.com/nexB/aboutcode/wiki/GSOC-2024-Project-Ideas/#purldb-add-improved-scan-queue-with-multiple-scio-instances

I ask because I was thinking of implementing an actual queue system using a message broker or Redis Queue, while keeping the API pipeline separate, which could help the communication between the services scale.

@JonoYang
Member Author

This has been completed. The scan queue has been updated so that ScanCode.io workers get the next package, along with the pipelines to run on it, from the new API endpoint /api/scan_queue/get_next_download_url/. These are the default pipelines run on a Package: https://github.com/nexB/purldb/blob/main/minecode/model_utils.py#L30 . The ScanCode.io worker then fetches the package, runs the specified pipelines, and sends the results back to PurlDB at the /api/scan_queue/update_status/ endpoint.
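
As a rough sketch of what calls to these two endpoints might look like from a client, the helpers below only build the requests and do not send them. Several details are assumptions rather than confirmed PurlDB behavior: the `Authorization: Token <key>` header format, the JSON body encoding, the base URL, and the `scannable_uri_uuid` field name (`scan_status` is taken from the commit messages above).

```python
import json
import urllib.request


def build_get_next_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the next scan queue entry."""
    url = f"{base_url.rstrip('/')}/api/scan_queue/get_next_download_url/"
    # Assumption: the API key is sent as a token in the Authorization header
    return urllib.request.Request(
        url, headers={"Authorization": f"Token {api_key}"}, method="GET"
    )


def build_update_status_request(
    base_url: str, api_key: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request reporting results to PurlDB."""
    url = f"{base_url.rstrip('/')}/api/scan_queue/update_status/"
    # Assumption: the endpoint accepts a JSON body
    data = json.dumps(payload).encode("utf-8")
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Hypothetical base URL, key, and payload values, for illustration only
get_req = build_get_next_request("http://localhost:8001/", "example-key")
post_req = build_update_status_request(
    "http://localhost:8001",
    "example-key",
    {"scannable_uri_uuid": "hypothetical-uuid", "scan_status": "scanned"},
)
```

To actually send a request, it can be passed to `urllib.request.urlopen`; the real payload fields should be checked against the PurlDB source before relying on this.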

Relevant PRs:

To test this feature:

  • Deploy the main branch of PurlDB
  • Create a PurlDB API key for the ScanCode.io worker using the management command create-scan-queue-worker-user
    • docker compose -f docker-compose_purldb. exec web bash
    • python manage_purldb.py create-scan-queue-worker-user <username>
  • Deploy this branch of ScanCode.io (https://github.com/nexB/scancode.io/tree/update-docker-compose). In the .env file, set the PurlDB integration environment variables to the PurlDB instance deployed earlier, using the API key generated.
  • Run the ScanCode.io scan queue worker by running docker compose -f docker-compose.purldb-scan-queue-worker.yml up
  • Add some Packages to the PurlDB using the /api/packages/index_packages/ endpoint
  • The scan queue worker should then fetch and scan the packages from the scan queue
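
For the step that adds Packages through /api/packages/index_packages/, a request body could be prepared along these lines. The payload shape (a `packages` list of objects with a `purl` key) is an assumption made for illustration and should be verified against the PurlDB API documentation; the helper name is hypothetical.

```python
import json


def build_index_packages_payload(purls):
    """Return a JSON body listing the given package URLs (assumed payload shape)."""
    packages = []
    for purl in purls:
        # Package URLs always start with the "pkg:" scheme
        if not purl.startswith("pkg:"):
            raise ValueError(f"Not a package URL: {purl!r}")
        packages.append({"purl": purl})
    # Assumption: the endpoint expects a top-level "packages" list
    return json.dumps({"packages": packages})


body = build_index_packages_payload(["pkg:npm/less@1.0.32"])
```

The resulting string can be sent as the body of a POST request to the endpoint, using the same authentication as the rest of the PurlDB API.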
