Implement crawler.teardown (exists in JS version) #651
Labels:
- enhancement: New feature or request.
- t-tooling: Issues with this label are in the ownership of the tooling team.
Implement a way to stop the crawler in an obvious and controlled manner from within the user function. It should properly shut down all resources and immediately stop the crawler from sending any further requests. It should mirror the JS version.
Use case:
The user wants to stop the crawler from within the user function.
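For illustration, here is a sketch of how the requested API might be used, assuming the Python method mirrors the JS `crawler.teardown()` name (the import path matches the crawlee version current at the time of this issue and may differ in yours; the stop condition is hypothetical):

```python
import asyncio

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

crawler = BeautifulSoupCrawler()

@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    context.log.info(f'Processing {context.request.url}')
    if 'stop-marker' in context.request.url:  # hypothetical stop condition
        # Proposed API (does not exist yet): shut down all resources and
        # stop the crawler from sending any further requests.
        await crawler.teardown()
        return
    await context.enqueue_links()

asyncio.run(crawler.run(['https://crawlee.dev']))
```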
Examples of current workarounds for the user:
Add a flag, check it at the beginning of the user function, and short-circuit the user function's evaluation:

```python
if finished:
    return
...
```

Drawback: currently queued requests are still being sent, but not processed.
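Spelled out as a fuller sketch, reusing the crawler setup from the example above (the stop condition is again hypothetical):

```python
finished = False  # module-level flag, flipped once the stop condition is hit

@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    global finished
    if finished:
        # Short-circuit: skip all work for this request. The request was still
        # dequeued and fetched before the handler ran, hence the drawback above.
        return
    if 'stop-marker' in context.request.url:  # hypothetical stop condition
        finished = True
```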
Call some private internals:

```python
await crawler._pool.abort()
```

Drawback: this relies on internal API, and the remaining tasks will still finish.
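In handler form (same setup as above; `_pool` is a private attribute, so this may break across versions):

```python
@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    if 'stop-marker' in context.request.url:  # hypothetical stop condition
        # Abort the internal autoscaled pool. Tasks already in flight
        # still run to completion, which is the drawback noted above.
        await crawler._pool.abort()
```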
Drop the request provider:

```python
await request_provider.drop()
```

Drawback: a bunch of errors, as existing tasks might still try to access the request provider.
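In context, assuming the request queue is opened explicitly and passed to the crawler (the `request_provider` constructor parameter name matches the crawlee version current at the time of this issue; the stop condition is hypothetical):

```python
import asyncio

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
from crawlee.storages import RequestQueue

async def main() -> None:
    request_provider = await RequestQueue.open()
    crawler = BeautifulSoupCrawler(request_provider=request_provider)

    @crawler.router.default_handler
    async def handler(context: BeautifulSoupCrawlingContext) -> None:
        if 'stop-marker' in context.request.url:  # hypothetical stop condition
            # Dropping the queue removes the storage out from under in-flight
            # tasks, which then raise errors when they next touch it.
            await request_provider.drop()

    await crawler.run(['https://crawlee.dev'])

asyncio.run(main())
```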
Example of how this is solved in Scrapy:
https://docs.scrapy.org/en/2.11/faq.html#how-can-i-instruct-a-spider-to-stop-itself
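For comparison, the Scrapy pattern from that FAQ entry is to raise the `CloseSpider` exception from a callback (the spider and stop condition below are illustrative):

```python
import scrapy
from scrapy.exceptions import CloseSpider

class MySpider(scrapy.Spider):
    name = 'my_spider'
    start_urls = ['https://example.com']

    def parse(self, response):
        if 'stop-marker' in response.url:  # hypothetical stop condition
            # Requests a clean, controlled shutdown of the spider.
            raise CloseSpider('stop condition met')
```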