Python in the read me file #41

jburnford · 2024-02-01T15:39:16Z

The README, as far as I can see, focuses on the command line. I have a dozen WACZ files created to archive the Facebook posts of political leaders during COVID, and I'm trying to understand how people could use them in the future beyond loading them into the browser plugin (which doesn't seem to handle large files very well). Can we use py-wacz to interact with and explore the contents of a WACZ file? Is there any documentation on working with these archives?

Shrinks99 · 2024-02-29T20:28:32Z

py-wacz doesn't offer any tools for exploring archived content. ReplayWeb.page is the primary tool we make for end users to explore web archives.

You may be interested in warcio (ours) or the Internet Archive's older `warc` Python library. WACZ files are essentially WARC files wrapped in a ZIP together with some extra data, so if you want a more data-driven approach, you might start there. The `pages/pages.jsonl` file inside the WACZ also contains page metadata and extracted text that you may find useful.
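A minimal sketch of that data-driven approach, assuming the standard WACZ layout (WARC files under `archive/`, the page list in `pages/pages.jsonl`, whose first line is a header record rather than a page). The function names `list_pages` and `iter_responses` are hypothetical helpers, not part of py-wacz or warcio; only `zipfile`, `json`, and warcio's `ArchiveIterator` are real APIs. Page entries may or may not include a `text` field, depending on how the archive was created.

```python
import json
import zipfile


def list_pages(wacz_path):
    """Yield page entries from pages/pages.jsonl inside a WACZ (a ZIP file).

    Hypothetical helper; assumes the standard WACZ directory layout.
    """
    with zipfile.ZipFile(wacz_path) as zf:
        with zf.open("pages/pages.jsonl") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                entry = json.loads(line)
                # The first line is a header (format/id/title), not a page,
                # so keep only entries that carry a URL.
                if "url" in entry:
                    yield entry


def iter_responses(wacz_path):
    """Yield (url, content_type, payload_bytes) for each WARC response record.

    Hypothetical helper built on warcio's ArchiveIterator.
    """
    # Imported here so list_pages() works with only the standard library.
    from warcio.archiveiterator import ArchiveIterator

    with zipfile.ZipFile(wacz_path) as zf:
        for name in zf.namelist():
            # In a WACZ, the WARC files live under archive/
            if not name.startswith("archive/"):
                continue
            if not name.endswith((".warc", ".warc.gz")):
                continue
            with zf.open(name) as stream:
                for record in ArchiveIterator(stream):
                    if record.rec_type != "response":
                        continue
                    url = record.rec_headers.get_header("WARC-Target-URI")
                    ctype = None
                    if record.http_headers is not None:
                        ctype = record.http_headers.get_header("Content-Type")
                    # Read the payload before the iterator advances.
                    yield url, ctype, record.content_stream().read()
```

For example, `for page in list_pages("posts.wacz"): print(page.get("ts"), page["url"])` would print a quick index of captured pages (the filename is a placeholder). Iterating the WARCs directly from the ZIP avoids unpacking large archives to disk first.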

Hopefully this answers your question? :)

jburnford · 2024-02-29T22:05:21Z

Thanks. I'll see what I can do with warcio. For what it's worth, we tried uploading the WACZ files into an Archive-It repository so we could use the Archives Unleashed tools, but never managed to get it working. Webrecorder is an essential tool for capturing content on sites that block the Internet Archive, but we need to start developing documentation on what to do with the data once it is created. I'm working on a paper that will try to make a start, and if anyone from your project is interested in collaborating, I'd be happy to have the help.

Shrinks99 added the `question` label (Further information is requested) on Feb 29, 2024.
