Statistics.state.requestsRetries only updates after request is fully handled #2732

Open
1 task
metalwarrior665 opened this issue Nov 1, 2024 · 0 comments
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@metalwarrior665
Member

Which package is this bug report for? If unsure which one to select, leave blank

None

Issue description

The requestsRetries statistic is only updated once a request is fully handled (succeeds or fails). This creates the false impression that requests are erroring less often than they really are. The problem is made worse by the fact that, after an error, a request is pushed to the end of the queue, so for a long time the stats show requestsRetries: 0 alongside a significant requestsFinished count. At the end of Crawler.run, the retry count is correct if all the requests were finished, but imagine what happens if the crawler is stopped via .teardown or crashes.

I don't see any use case for adding the retries only once the request is handled, so I consider this a bug.
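To make the effect concrete, here is a minimal stand-alone simulation. It does not use Crawlee; the `simulate` function, the counter names `retriesCountedWhenHandled` and `retriesCountedOnError`, and the deterministic failure rule (even ids fail on their first attempt only) are all hypothetical and exist only for illustration. It is a sketch of the behavior described above, assuming failed requests are re-enqueued at the end of the queue, not a reproduction of the library's internals.

```javascript
// Hypothetical simulation of the two counting strategies; not Crawlee code.
function simulate(requestCount) {
    const queue = [];
    for (let id = 0; id < requestCount; id++) {
        queue.push({ id, retryCount: 0 });
    }

    const stats = {
        requestsFinished: 0,
        // Retries added only once the request is fully handled (behavior reported here).
        retriesCountedWhenHandled: 0,
        // Retries added as soon as an attempt fails (behavior I would expect).
        retriesCountedOnError: 0,
    };
    const snapshots = [];

    while (queue.length > 0) {
        const request = queue.shift();
        // Assumed failure rule: even ids fail on their first attempt only.
        const fails = request.id % 2 === 0 && request.retryCount === 0;
        if (fails) {
            request.retryCount++;
            stats.retriesCountedOnError++;
            queue.push(request); // failed request goes to the end of the queue
        } else {
            stats.requestsFinished++;
            stats.retriesCountedWhenHandled += request.retryCount;
        }
        snapshots.push({ ...stats }); // record the stats after every attempt
    }
    return snapshots;
}

const snapshots = simulate(6);
const afterFirstPass = snapshots[5]; // every request has been attempted once
const finalStats = snapshots[snapshots.length - 1];
console.log(afterFirstPass);
console.log(finalStats);
```

After the first pass through the queue, half of the requests are finished and the count-on-completion counter is still 0, while the count-on-error counter already shows 3. The two only agree once the queue is fully drained, which never happens if the run is cut short.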

See also run on Apify https://console.apify.com/view/runs/cbHWZTetxgehJuK3b

Code sample

import { CheerioCrawler, log } from 'crawlee';

const crawler = new CheerioCrawler({
    requestHandler: async ({ crawler, request }) => {
        // Read the live statistics on every attempt to see when they change.
        const { requestsRetries } = crawler.stats.state;
        const { requestRetryHistogram } = crawler.stats;
        log.info(`${request.url}: Crawler retry count: ${requestsRetries}, Request retry histogram: ${JSON.stringify(requestRetryHistogram)}`);

        // Fail roughly half of all attempts so that retries actually happen.
        if (Math.random() < 0.5) {
            throw new Error('50% Random error');
        }
    },
});

// Enqueue 100 distinct URLs.
const requests = [];
for (let i = 0; i < 100; i++) {
    requests.push({ url: `https://www.example.com/${i}` });
}
await crawler.run(requests);

Package version

3.11.5

Node.js version

20.12.2

Operating system

No response

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

@metalwarrior665 metalwarrior665 added the bug Something isn't working. label Nov 1, 2024
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Nov 1, 2024