Implement crawler.teardown (exists in JS version) #651

Open
Pijukatel opened this issue Nov 5, 2024 · 1 comment
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@Pijukatel
Contributor

Pijukatel commented Nov 5, 2024

Implement a way to stop the crawler from the user function in an obvious and controlled manner. It should properly shut down all resources and immediately stop the crawler from sending any further requests. It should mirror the JS version.

Use case:
The user wants to stop the crawler from within the user function.

Examples of current workarounds available to the user:

  1. Add a flag check at the beginning of the user function to short-circuit its evaluation.
    if finished:
        return
    ...
    Drawback: Currently queued requests are still sent, but not processed.

  2. Call private internals:
    await crawler._pool.abort()
    Drawback: Relies on a private API. Remaining tasks will still finish.

  3. Drop the request provider:
    await request_provider.drop()
    Drawback: A bunch of errors, as existing tasks might still try to access request_provider.
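
To make the drawback of workaround 1 concrete, here is a minimal standard-library sketch. It does not use the crawler library at all: `fetch`, `handler`, and `toy_crawl` are hypothetical stand-ins, and the crawl loop is sequential for simplicity. It only illustrates that a flag inside the user function cannot prevent the remaining queued requests from being sent.

```python
import asyncio

fetched: list[str] = []    # URLs for which a (simulated) request was sent
processed: list[str] = []  # URLs whose handler body actually ran
finished = False           # the user-level flag from workaround 1


async def fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP request; records that it was sent."""
    fetched.append(url)
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


async def handler(url: str, body: str) -> None:
    """User function that short-circuits once the flag is set."""
    global finished
    if finished:
        return
    processed.append(url)
    if url.endswith("/2"):
        finished = True  # the user decides the crawl is done


async def toy_crawl(urls: list[str]) -> None:
    """Hypothetical crawl loop; it knows nothing about the user's flag."""
    for url in urls:
        body = await fetch(url)
        await handler(url, body)


asyncio.run(toy_crawl([f"https://example.com/{i}" for i in range(1, 6)]))
print(len(fetched), len(processed))  # prints "5 2"
```

All five requests are sent even though only two are processed, which is the wasted traffic described in the drawback.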

Example of how this is solved in Scrapy:
https://docs.scrapy.org/en/2.11/faq.html#how-can-i-instruct-a-spider-to-stop-itself
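
The Scrapy FAQ entry linked above has the callback raise the `CloseSpider` exception. A similar pattern could be sketched with the standard library alone, as below. Everything here is an assumption for illustration: `StopCrawl` and `ToyCrawler` are hypothetical names, not part of Scrapy or of this project, and the loop is sequential, so there are no in-flight tasks to cancel.

```python
import asyncio


class StopCrawl(Exception):
    """Hypothetical signal raised by the user function to stop the crawl."""


class ToyCrawler:
    """Illustrative only; not the implementation of any real crawler."""

    def __init__(self, handler):
        self.handler = handler
        self.fetched = []              # URLs for which a request was sent
        self.resources_closed = False  # stand-in for real resource cleanup

    async def _fetch(self, url):
        self.fetched.append(url)
        await asyncio.sleep(0)
        return f"<html>{url}</html>"

    async def run(self, urls):
        try:
            for url in urls:
                body = await self._fetch(url)
                await self.handler(url, body)
        except StopCrawl:
            pass  # stop dequeuing; the remaining URLs are never requested
        finally:
            self.resources_closed = True  # always release resources


async def handler(url, body):
    # The user function decides to stop after the second page.
    if url.endswith("/2"):
        raise StopCrawl


crawler = ToyCrawler(handler)
asyncio.run(crawler.run([f"https://example.com/{i}" for i in range(1, 6)]))
print(len(crawler.fetched), crawler.resources_closed)  # prints "2 True"
```

A concurrent crawler would additionally have to cancel or drain tasks that are already running, which this sketch leaves out.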

@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Nov 5, 2024
@vdusek vdusek added the enhancement New feature or request. label Nov 5, 2024
@janbuchar
Collaborator

This has been discussed in #506
