Whois tool dies frequently #21
I have two main questions. In what context does the tool die? (I'm guessing that it's in the middle of a large number of automated requests, but I'm not entirely sure.) Is https://whois-dev.toolforge.org/ a good replacement?
Thanks for investigating this! Whois-dev's API appears to output JSON data with the correct MIME type, but all my requests to it are marked by Firefox as slower than expected. Otherwise, the dev version looks fine.
whois-dev gets a high number of requests now (>20000 per day), and it appears that it dies in proportion.
Rate-limiting seems like one way to mitigate this (assuming the number of users, as opposed to the number of requests, is not too large). Fortunately, it looks like there is a ready-made implementation of rate limiting for Flask. An alternative might be to ask for more resources from the Toolforge infrastructure: https://phabricator.wikimedia.org/T245426 .
Rate-limiting by IP is not possible on Toolforge, as IP information is intentionally stripped by the reverse proxy. It should be possible to collect user agent information though ( |
@AntiCompositeNumber Thank you for the suggestion. I'm trying that. It turns out there were unusually slow queries.
I don't know the root cause, but I know they are rare, well below 1%. I'm now trying to add timeouts.
I am not sure whether adjusting lighttpd config settings could help here. You might try to limit the load using [Edit] you can also restrict the load using attributes I am a lighttpd developer, so if you have other suggestions, please let me know. For example, the upcoming lighttpd 1.4.60 will include |
@gstrauss Thank you for the advice. I was wondering why those timeout parameters mentioned in the official lighttpd documentation didn't work, without bothering to check the version I was dealing with. It turns out that it was lighttpd 1.4.53, so that's why. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd has |
Separate from those two configuration options is the |
Looking at the logs for ASNBlock, I am now getting 408 Request Timeout and 429 Too Many Requests errors from the whois-dev API. What rate limit should I use?
@AntiCompositeNumber Currently, no more than 60 requests per minute are allowed. That number is somewhat arbitrary and can be relaxed depending on the needs, but I think there should be some limit.
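For illustration, here is a minimal sketch of what a limit of 60 requests per minute could look like as a sliding window. It is not the code running on whois-dev. The class name `SlidingWindowLimiter` is hypothetical, and keying by User-Agent is an assumption based on the earlier comment that IP information is stripped by the Toolforge reverse proxy.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key.

    Hypothetical helper for illustration; not thread-safe as written.
    """

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key):
        """Return True if the request may proceed, False if it is over the limit."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # a server would answer 429 Too Many Requests here
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    # Assumption: the key is the client's User-Agent string.
    print(limiter.allow("ExampleBot/1.0"))
```

A client that stays at or below one request per second would never see a rejection under this scheme, while a burst of more than 60 requests within any 60-second span would.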
Currently set to 1/sec, can tune if that still hits the ratelimit. utils.Throttle is not concurrency-aware, but with WHOIS result caching implemented only one of the processes is making the bulk of requests. See whym/whois-gateway#21
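As a rough sketch of what a concurrency-aware throttle at 1 request per second could look like, the class below reserves time slots under a lock. `LockedThrottle` is a hypothetical name and is not the `utils.Throttle` mentioned above. It only coordinates threads inside a single process; separate processes would need some shared state, which is not shown.

```python
import threading
import time


class LockedThrottle:
    """Space calls at least `interval` seconds apart across threads.

    Hypothetical helper for illustration only.
    """

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = None  # earliest time the next call may proceed

    def wait(self):
        """Block until this caller's slot arrives; return the seconds waited."""
        # Reserve a slot while holding the lock, then sleep outside it
        # so other threads can queue up for later slots.
        with self._lock:
            now = self._clock()
            slot = now if self._next_slot is None else max(now, self._next_slot)
            self._next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
        return delay


if __name__ == "__main__":
    throttle = LockedThrottle(interval=1.0)
    for _ in range(3):
        throttle.wait()  # call this before each API request
        print("request at", round(time.monotonic(), 1))
```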
I tried some different values for listen-backlog ( |
What are you trying to rate limit? I don't think the configuration settings you're trying will help much. Should the PHP keep a database of requests and timestamps and limit the number of requests within a given minute, or else return |
Part of the reason must be the unusually slow queries mentioned in #21 (comment) . It doesn't happen with the raw whois command, but it happens consistently with the Python ipwhois library, both for the CGI version and the Flask version. I added timeouts to the Flask version, and that seems to have made the service mostly free of downtime. However, this is ad hoc. Perhaps I can revisit RDAP for faster responses, but it had its own issues. (#5)
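For readers wondering what "adding timeouts" can look like, here is one generic way to bound a slow call, using a worker thread from the standard library. This is a sketch under assumptions, not the change actually made to the tool: `call_with_timeout` and `slow_lookup` are hypothetical names, and `slow_lookup` is a stand-in rather than the ipwhois API.

```python
import concurrent.futures
import time

# Small shared pool; the size is an arbitrary choice for this sketch.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout, *args, **kwargs):
    """Return func(*args, **kwargs), or None if it takes longer than `timeout` seconds."""
    future = _pool.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The worker thread is not killed; we only stop waiting for it.
        # cancel() has an effect only if the call has not started yet.
        future.cancel()
        return None


def slow_lookup(ip, delay):
    """Stand-in for a slow WHOIS query (hypothetical, NOT the ipwhois API)."""
    time.sleep(delay)
    return {"query": ip}


if __name__ == "__main__":
    print(call_with_timeout(slow_lookup, 1.0, "192.0.2.1", 0.0))
    print(call_with_timeout(slow_lookup, 0.05, "192.0.2.1", 0.5))
```

One caveat with this approach is that a timed-out lookup still occupies a worker until it finishes, so a burst of slow queries can exhaust the pool. A web handler could map the `None` result to a 408 response.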
https://whois.toolforge.org/ has been unstable for weeks. It dies a couple of times a day, although usually it gets restarted after a while of unavailability.
Here is what appears to be the relevant part of the error messages: errors.txt
(The linked file contains a larger portion from the log file.)
Here is my tentative plan. Coincidentally, I have been preparing a Flask version of the tool inspired by @wiki-ST47's fork, and it's ready to be tested. ( #20 ) If it serves automated traffic to the JSON endpoint well (and assuming that traffic is the cause), the new version could be the solution.
I don't know what is happening from the logs above, so the replacement might or might not solve it. If there is an identifiable cause, I can work on it (and work on the switch independently, perhaps later).