Too-many-redirects error for BBC News site, perhaps due to revisit records #21
Comments
Hah, OpenWayback is also having trouble, but returns a
http://192.168.45.25:8081/qa-access/20180228215703/https://www.bbc.co.uk/news/
Because the first request, for
http://192.168.45.25:8081/qa-access/20180228215703mp_/https://www.bbc.co.uk/news/
gets redirected to:
http://192.168.45.25:8081/qa-access/20180228215703mp_/http://www.bbc.co.uk/news
and the looping starts.
Any idea what a properly-indexed version should look like!? The
I found an older issue indicating that the to be clear, if I go direct to the not-
http://192.168.45.25:8081/qa-access/20180126120357/http://www.bbc.co.uk/news
and if I go to other instances (different HTML) it works. So, we seem to have a mixture of slash-to-no-slash redirects and HTML responses for the same URL key (depending on whether the original URL was
Turns out the issue was likely caused by the pywb 'self-redirect' check not running, because the XmlQuery CDX sets the status code to '0'. Changing the self-redirect check to run whenever the status code is not 2xx, 4xx or 5xx should catch this case.
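To make the proposed change concrete, here is a minimal sketch of the idea in Python. It is not pywb's actual code: the function names `needs_self_redirect_check` and `is_self_redirect` are hypothetical, and the URL comparison (ignore the scheme and any trailing slash) is an assumption based on the https-with-slash to http-without-slash redirect reported above.

```python
from urllib.parse import urlsplit


def needs_self_redirect_check(status):
    """Hypothetical helper: run the check unless the status is clearly 2xx, 4xx or 5xx.

    A status of '0' (as reported for the XmlQuery CDX) or '-' therefore
    still triggers the check, as does any real 3xx status.
    """
    return not str(status).startswith(("2", "4", "5"))


def _redirect_key(url):
    """Reduce a URL to (host, path, query), ignoring scheme and trailing slash.

    This is an assumed, simplified normalisation for illustration only.
    """
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)


def is_self_redirect(request_url, location):
    """Hypothetical helper: True if the redirect target is effectively the same page."""
    return _redirect_key(request_url) == _redirect_key(location)


if __name__ == "__main__":
    requested = "https://www.bbc.co.uk/news/"
    target = "http://www.bbc.co.uk/news"
    for status in ("0", "302", "200"):
        check = needs_self_redirect_check(status)
        print(status, check, check and is_self_redirect(requested, target))
```

With a check like this, a capture whose index entry carries status '0' would still be tested for self-redirection, so replay could skip it and move on to another capture instead of looping.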
…es not start with 2, 4, 5, to more aggressively check invalid status codes, should fix ukwa/ukwa-pywb#21
On our production APIs, I visit:
http://192.168.45.25:8081/qa-access/20180228215703/http://www.bbc.co.uk/news
I end up in a loop of requests:
But the response is:
This appears to happen for revisit records, because if I go to the datestamps for other records playback works.
In this case, we have this record in the CDX server:
and scrolling back quite a lot, the corresponding response record, based on digest:
I'm guessing that pywb is going back far enough to find the record, so maybe this is a problem with the way we populated our CDX index?
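For readers unfamiliar with how a revisit is matched to its original capture, the lookup described above can be sketched roughly as follows. This is an illustration, not pywb's or OpenWayback's implementation: the record fields (`timestamp`, `mime`, `digest`), the function name `find_original_response` and the sample values (including the placeholder digests and the earlier timestamp) are all assumptions, and the records are assumed to share one URL key already.

```python
def find_original_response(records, revisit):
    """Return the latest non-revisit record with the same digest at or before the revisit.

    `records` is a list of dicts with hypothetical keys 'timestamp' (14-digit
    string), 'mime' and 'digest'. Returns None when no match exists.
    """
    candidates = [
        r for r in records
        if r["mime"] != "warc/revisit"
        and r["digest"] == revisit["digest"]
        and r["timestamp"] <= revisit["timestamp"]
    ]
    # 14-digit timestamps sort correctly as plain strings.
    return max(candidates, key=lambda r: r["timestamp"], default=None)


if __name__ == "__main__":
    # Placeholder index entries; digests and the first two timestamps are invented.
    index = [
        {"timestamp": "20170101000000", "mime": "text/html", "digest": "DIGEST-A"},
        {"timestamp": "20170601000000", "mime": "text/html", "digest": "DIGEST-B"},
        {"timestamp": "20180228215703", "mime": "warc/revisit", "digest": "DIGEST-A"},
    ]
    print(find_original_response(index, index[2]))
```

If the matching response record is itself a redirect back to the same page, resolving the revisit correctly would still leave the replay looping, which is consistent with the self-redirect explanation given in the comments.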