The README, as far as I can see, focuses on the command line. I have a dozen WACZ files created to archive Facebook posts of political leaders during COVID, and I'm trying to understand how people could use them in the future beyond uploading them into the browser plugin (which doesn't seem to handle big files very well). Can we use py-wacz to interact with and explore the contents of a WACZ file? Is there any documentation on using archives?
py-wacz doesn't offer any tools for exploring archived content. ReplayWeb.page is the primary tool that we make for end users exploring web archives.
You may be interested in warcio (ours) or the Internet Archive's warc Python library (older). WACZ files are basically extra data wrapped around WARC files in a ZIP, so if you want a more data-driven approach you might start there. Additionally, the pages.jsonl file inside the pages directory of the WACZ contains extracted text and metadata you may find useful.
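As a rough starting point, here is a minimal sketch that uses only the Python standard library to open a WACZ as a ZIP, read the page list, and find the WARC files inside. The function names `read_pages` and `list_warcs` are made up for illustration and are not part of py-wacz. The sketch assumes the page list lives at `pages/pages.jsonl`, that each page entry has a `url` field, and that the first line of the file may be a header without one.

```python
import json
import zipfile


def read_pages(wacz_path, pages_name="pages/pages.jsonl"):
    """Return the page records from a WACZ file's page list.

    Hypothetical helper: assumes one JSON object per line and that
    real page entries carry a "url" field.
    """
    pages = []
    with zipfile.ZipFile(wacz_path) as wacz:
        with wacz.open(pages_name) as fh:
            for raw in fh:
                line = raw.decode("utf-8").strip()
                if not line:
                    continue
                record = json.loads(line)
                # Skip lines without a URL, such as a leading header
                # that describes the file rather than a page.
                if "url" in record:
                    pages.append(record)
    return pages


def list_warcs(wacz_path):
    """Return the names of WARC files stored inside the ZIP."""
    with zipfile.ZipFile(wacz_path) as wacz:
        return [
            name
            for name in wacz.namelist()
            if name.endswith((".warc", ".warc.gz"))
        ]
```

Calling `read_pages("my-archive.wacz")` (a placeholder filename) should give a list of dictionaries you can load into a spreadsheet or a data frame. Each name returned by `list_warcs` can be opened with `ZipFile.open` and the resulting stream handed to a WARC reader such as warcio to get at the individual records.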
Thanks. I'll see what I can do with warcio. For what it's worth, we tried uploading the WACZ files into an Archive-It repository so we could use Archives Unleashed tools, but never managed to get it working. Webrecorder is an essential tool for capturing content on sites that block the Internet Archive, but we need to start developing documentation on what to do with the data once it is created. I'm working on a paper that will try to make a start, and if anyone from your project is interested in collaborating, I'd be happy for the help.