Improve indexing queue with multiple instances of ScanCode.io #236
Comments
See also #14 |
There are two possible designs:
|
An idea on how to delegate work to workers: there would be a new API endpoint that is accessible only to workers. When called, this endpoint returns the download URL and uuid for a package scan request. The worker then downloads the package at that download URL and scans it. When the scan is complete, the worker sends a POST request containing the completed scan to the server. |
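The worker side of this idea could look roughly like the sketch below. It is only an illustration under stated assumptions, not the PurlDB or ScanCode.io implementation: `ScanJob`, `process_next_job` and the four callables are hypothetical names, and the real endpoint paths, authentication and scan output format are not specified here. The HTTP calls are passed in as plain functions so the cycle can be exercised without a server.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScanJob:
    """One scan request: the request uuid and the package download URL."""
    uuid: str
    download_url: str


def process_next_job(
    fetch_job: Callable[[], Optional[ScanJob]],
    download: Callable[[str], bytes],
    scan: Callable[[bytes], dict],
    submit: Callable[[str, dict], None],
) -> Optional[str]:
    """Run one fetch -> download -> scan -> submit cycle.

    Returns the uuid of the processed job, or None if no job was available.
    All four callables are hypothetical stand-ins for the real HTTP and
    scanning code.
    """
    job = fetch_job()  # assumed: call to the worker-only API endpoint
    if job is None:
        return None
    package = download(job.download_url)  # fetch the package itself
    results = scan(package)  # assumed: run the actual scan here
    submit(job.uuid, results)  # assumed: POST the completed scan back
    return job.uuid


# In-memory stand-ins for the server, used only to show the flow.
pending = [ScanJob("1234", "https://example.com/pkg.tar.gz")]
completed = {}

first = process_next_job(
    fetch_job=lambda: pending.pop(0) if pending else None,
    download=lambda url: b"fake package bytes",
    scan=lambda data: {"size": len(data)},
    submit=lambda job_uuid, results: completed.__setitem__(job_uuid, results),
)
```

Because each worker only pulls one job at a time and reports back by uuid, several workers could run this same loop against one server, provided the server hands each request to a single worker.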
For historical reference we considered these other ideas, but this may not be the actual design:
|
This is done. We now have
To test this feature:
|
We should improve the PurlDB architecture to use multiple ScanCode.io workers instead of one. The current PurlDB setup has a single dedicated ScanCode.io instance acting as a worker that performs fingerprinting for the PurlDB at indexing time; the PurlDB calls this ScanCode.io instance when needed. This does not scale well and is a serious limiting factor for indexing, so we need to implement an alternative design.
The original design is in:
The high level solution could cover:
Expose new scan queue to many workers #49 #290
Create purldb scan worker scancode.io#1078