This repository has been archived by the owner on Sep 17, 2020. It is now read-only.

WARC Indexing / Replay Problems #87

Open
ghost opened this issue Nov 8, 2019 · 1 comment

Comments


ghost commented Nov 8, 2019

Hi,
Loving Webrecorder tools - great work! Thank you!
Using the latest Webrecorder Player for Windows, I tried uploading a large-ish (3.2 GB) WARC file created with a local install of Webrecorder running on Ubuntu.
We've run into a few problems and are not sure whether they are caused by Webrecorder, the player, or both.
Sometimes indexing stops and the player hangs. The main problem, though, is that when indexing does reach 100% and the collection opens, some content is missing: pages that replay fine in Webrecorder's replay mode are absent in the player. We've also looked at the generated cache file and noticed that some domains and URLs we know are part of the exported collection are missing from it.
Any ideas?
I can share the WARC file in question, if you need it.
Thanks for your help,
Tom

ikreymer added a commit to Rhizome-Conifer/conifer that referenced this issue Nov 9, 2019
- limit pages and bookmarks to 10000
- add settings to limit bookmarks and pages separately
- include page and bookmark creation in progress bar
previously, page/bookmark creation took a long time but was not included in the progress update
should fix #768, likely webrecorder/webrecorder-player#87, webrecorder/webrecorder-player#78, webrecorder/webrecorder-player#86
ikreymer added a commit to Rhizome-Conifer/conifer that referenced this issue Nov 9, 2019
* upload improvements:
- limit pages and bookmarks to 10000
- add settings to limit bookmarks and pages separately
- include page and bookmark creation in progress bar, using 80-90% for page indexing and 90-100% for bookmark creation.
- optimize: use zscan_iter() for iterating over pages, add polyfill for fakeredis to still use zrange
- fix tests
- bump version to 4.8.4

previously, page/bookmark creation took a long time but was not included in the progress update
should fix #768, likely webrecorder/webrecorder-player#87, webrecorder/webrecorder-player#78, webrecorder/webrecorder-player#86
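The commit above describes two mechanisms: reserving the last stretch of the progress bar for page indexing (80-90%) and bookmark creation (90-100%), and iterating pages with redis-py's `zscan_iter()` rather than loading them all at once with `zrange()`, capped at 10000 entries. The Python sketch below illustrates both under stated assumptions; it is not Conifer's actual code. The names `overall_progress`, `iter_page_keys`, `MAX_PAGES`, `MAX_BOOKMARKS` and the stub client are hypothetical, and a simple `hasattr` check stands in for the fakeredis polyfill the commit mentions.

```python
# Hypothetical sketch, not Conifer's implementation: maps per-stage
# progress onto the overall upload bar and iterates a sorted set of pages
# with zscan_iter() when the client has it, falling back to zrange().

PAGE_STAGE = (80, 90)       # page indexing fills 80-90% of the bar
BOOKMARK_STAGE = (90, 100)  # bookmark creation fills 90-100%
MAX_PAGES = 10000           # hypothetical setting names; the commit only
MAX_BOOKMARKS = 10000       # says pages and bookmarks are limited to 10000


def overall_progress(stage, done, total):
    """Return the overall percentage after `done` of `total` items in `stage`."""
    start, end = stage
    if total <= 0:
        # nothing to process in this stage: treat it as complete
        return end
    fraction = min(done, total) / total
    return start + (end - start) * fraction


def iter_page_keys(client, key, limit=MAX_PAGES):
    """Yield at most `limit` members of the sorted set stored at `key`."""
    if hasattr(client, "zscan_iter"):
        # redis-py's zscan_iter() scans incrementally, yielding
        # (member, score) tuples; ordering is not guaranteed
        members = (member for member, _score in client.zscan_iter(key))
    else:
        # fallback for clients lacking zscan_iter: fetch everything at once
        members = iter(client.zrange(key, 0, -1))
    for count, member in enumerate(members):
        if count >= limit:
            break
        yield member


class ZRangeOnlyClient:
    """Hypothetical stub standing in for a client without zscan_iter()."""

    def __init__(self, members):
        self._members = list(members)

    def zrange(self, key, start, end):
        # the stub ignores key/start/end and returns every member
        return list(self._members)


if __name__ == "__main__":
    pages = list(iter_page_keys(ZRangeOnlyClient(["a", "b", "c"]), "pages", limit=2))
    print(pages)                                 # ['a', 'b']
    print(overall_progress(PAGE_STAGE, 5, 10))   # 85.0
```

Scanning with `zscan_iter()` keeps memory flat for collections with very many pages, while the cap bounds how much work the final progress stages can take.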
@ikreymer
Member

Thanks for sharing the WARC! Please try the 1.8.0 release. We've made some improvements to indexing large WARCs, so it should work much better.
