solar.lowtechmagazine.com is very unstable #283
Comments
Diving a bit deeper into the third execution failure above, I've found that it is probably only a slightly different occurrence of #266.
Remember this server is a solar-powered installation with a backup battery. I noticed that our scrapes always seem to be running when it's around 25% battery (coincidence?). Maybe this is irrelevant, but it's not a fast server. Any way to slow the crawl down so it doesn't overwhelm the server? Not sure if this would help, but it's a guess...
Thank you for the suggestion, I had this in mind as well. I doubt there is a link, because the logs say that the problem is not a timeout while loading the page (which would indicate an upstream server issue) but a page crash or a new window timeout, meaning an issue with the Brave browser on the scraper. I will keep this in mind anyway.
Looks like Browsertrix crawler 1.0 might have improved the situation.
I confirm that the issue seems to be gone: I ran the recipe many times recently without any problem. The last runs produced a noticeably smaller ZIM, but there is no significant message in the logs about pages which might be missing (143 links are broken on the solar.lowtechmagazine.com domain, but the ones I randomly checked are really not available online, so this is "normal").
In addition to the #266 issue, which has often been encountered on the solar.lowtechmagazine.com recipe, it became clear that the situation is even worse with the last ZIM update I tried to perform, which failed in several different ways.
All executions mentioned below ran with
First execution (which ran on athena18) succeeded from the Zimfarm perspective, but the ZIM was only 53.61 MB instead of about 300-310 MB.
Second execution (which ran on athena18) failed with a new occurrence of #266 (see #266 (comment) for new details I found out).
Third execution (which ran on athena18) succeeded from the Zimfarm perspective, but the ZIM was only 63.57 MB instead of about 300-310 MB.
Fourth execution (which ran on ondemand) succeeded in producing a ZIM of about 300-310 MB. We still have 1388 failed pages (out of 2779), but we do not have any "Page crashed" message in the log. Looking at a few failed pages, these look like real issues (invalid href links) in the source website.
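Since two of the runs above "succeeded" while producing a ZIM far below the usual size, a sanity check on the output might help flag them. Below is a minimal sketch of such a check, not something Zimfarm or the scraper does as far as this thread shows. The function names, the 300 MB lower bound (taken from the "about 300-310 MB" figure above) and the 10% tolerance are all assumptions chosen for illustration.

```python
# Hypothetical post-run sanity check; names and thresholds are illustrative only.

EXPECTED_MIN_MB = 300.0  # assumption: lower end of the usual "about 300-310 MB" size
TOLERANCE = 0.10         # assumption: accept ZIMs up to 10% below the expected minimum


def zim_size_ok(size_mb, expected_min_mb=EXPECTED_MIN_MB, tolerance=TOLERANCE):
    """Return True if the ZIM size is not suspiciously below the expected size."""
    return size_mb >= expected_min_mb * (1.0 - tolerance)


def failed_ratio(failed_pages, total_pages):
    """Return the fraction of pages reported as failed by the crawl."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return failed_pages / total_pages


if __name__ == "__main__":
    # Sizes reported for the first, third and (approximately) fourth executions.
    for size in (53.61, 63.57, 305.0):
        print(size, "MB ->", "ok" if zim_size_ok(size) else "suspiciously small")
    # Failed pages reported for the fourth execution.
    print("failed ratio:", round(failed_ratio(1388, 2779), 3))
```

With these assumed thresholds, the first and third executions would be flagged as suspiciously small, while the fourth would pass even though about half of its pages are reported as failed, so the two indicators are worth reading together.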