Pywb failing to handle self-redirects from OutbackCDX #865
Comments
Thank you for the excellent bug report, and for noting the pywb version dependence. I can't access the WARC file on Slack because my org isn't a member; I only have guest access.
We see the same issue with OutbackCDX 0.11.1 and pywb 2.7.4. Redirects result in "Not found".
@wumpus Unfortunately the WARC was too big to attach here. Happy to share it with you another way if you'd like.
I see Ilya has been assigned by Tessa, and I know he has access to the IIPC Slack, so it's in good hands.
We also ran into this issue recently. We use pywb 2.8.3. We checked with 2.6.9, and there it worked. Are there any plans to fix this?
Seeing the same error with pywb 2.8.3 and OutbackCDX 1.0.0.
OutbackCDX has a partial workaround for this. If you run it with the --omit-self-redirects command-line option (or pass omitSelfRedirects=true in the query string), it will try to use the CDX redirect field to detect self-redirects and hide them. Unfortunately, pywb's cdx-indexer and webrecorder/cdxj-indexer don't populate the redirect field, so if you used them to build your indexes, this workaround won't help you: without the redirect field populated, OutbackCDX has no way to detect self-redirects. (For reference, we use jwarc for CDX indexing, plus some weird extra logic to handle our legacy pre-WARC collections.)
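To make the workaround concrete, here is a minimal sketch (not OutbackCDX's actual implementation) of two pieces: building an OutbackCDX lookup URL with omitSelfRedirects=true, and filtering captures whose redirect field points back to the same page. It assumes the classic 11-field CDX layout ("CDX N b a m s k r M S V g", where the 7th field is the redirect target and "-" means empty). The base URL, the index name "myindex", and the helper names are hypothetical. The crude_key normalization is a deliberately simple stand-in for real URL canonicalization (SURT), so it will not match OutbackCDX's or pywb's exact rules.

```python
from urllib.parse import urlencode, urljoin, urlsplit

# Field positions in the classic 11-field CDX format ("CDX N b a m s k r M S V g").
ORIGINAL = 2   # a: original URL
STATUS = 4     # s: HTTP status code
REDIRECT = 6   # r: redirect target ("-" when not populated)


def build_query_url(base: str, collection: str, url: str) -> str:
    """Build an OutbackCDX lookup URL with the omitSelfRedirects flag set.
    `base` and `collection` are hypothetical values for your deployment."""
    params = urlencode({"url": url, "omitSelfRedirects": "true"})
    return f"{base.rstrip('/')}/{collection}?{params}"


def crude_key(url: str) -> str:
    """Rough stand-in for canonicalization (NOT SURT): ignores scheme,
    case, a leading 'www.', and a trailing slash."""
    parts = urlsplit(url.lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    query = f"?{parts.query}" if parts.query else ""
    return host + parts.path.rstrip("/") + query


def is_self_redirect(fields: list) -> bool:
    """True if a 3xx capture's redirect field resolves to (roughly) its own URL."""
    if len(fields) < 11:
        return False
    status, target = fields[STATUS], fields[REDIRECT]
    if not status.startswith("3") or target in ("", "-"):
        # No populated redirect field: nothing to compare against.
        return False
    # Resolve relative Location values against the capture's own URL.
    resolved = urljoin(fields[ORIGINAL], target)
    return crude_key(resolved) == crude_key(fields[ORIGINAL])


def omit_self_redirects(cdx_lines: list) -> list:
    """Drop captures detected as self-redirects; keep everything else."""
    return [line for line in cdx_lines if not is_self_redirect(line.split())]
```

Because is_self_redirect returns False whenever the redirect field is "-", an index built without that field passes through unchanged, which mirrors the limitation described in the comment above.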
Describe the bug
pywb throws a LiveResourceException when OutbackCDX returns a self-redirect (3xx) capture. As a result, pywb displays a blank page with the text "Not found".
Steps to reproduce the bug
An example WARC file is attached in this Slack thread:
https://iipc.slack.com/archives/C2NR32PNF/p1691445882952669
Try accessing "http://2020.org.nz/" via the redirect record; pywb displays the "Not found" message.
Expected behavior
pywb should process the self-redirect record from OutbackCDX and load the record that the self-redirect points to.
Screenshots
Pywb logs
OutbackCDX logs
Environment