A plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.

q-m/scrapy-webarchive

Scrapy Webarchive

Scrapy Webarchive is a plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.

Features

  • Save web crawls in WACZ format, with multiple storage backends supported (local and cloud).
  • Crawl against existing WACZ archives instead of the live web.
  • Integrate seamlessly with Scrapy’s spider request and response cycle.
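
To make the WACZ format mentioned above more concrete: a WACZ file is a ZIP archive that, per the WACZ specification, typically holds a datapackage.json manifest alongside archive/, indexes/ and pages/ directories. The sketch below uses only the Python standard library, not this plugin's API. The helper names summarize_wacz and build_demo_wacz are hypothetical, and the demo archive contains empty placeholder files rather than real crawl data.

```python
import io
import zipfile
from collections import defaultdict


def summarize_wacz(source):
    """Group the member names of a WACZ (ZIP) file by top-level directory.

    `source` may be a filesystem path or a binary file-like object.
    Members that sit directly in the archive root are grouped under "(root)".
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            top, sep, _ = name.partition("/")
            groups[top if sep else "(root)"].append(name)
    return dict(groups)


def build_demo_wacz():
    """Build a tiny in-memory ZIP that mimics the WACZ layout.

    Assumption: the file names follow the layout described in the WACZ
    specification; the contents are empty placeholders, not valid WARC data.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("datapackage.json", "{}")
        archive.writestr("archive/data.warc.gz", b"")
        archive.writestr("indexes/index.cdx", "")
        archive.writestr("pages/pages.jsonl", "")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Print each top-level group with the members it contains.
    for group, names in summarize_wacz(build_demo_wacz()).items():
        print(group, names)
```

Passing the path of a .wacz file exported by a crawl to summarize_wacz, instead of the in-memory demo, should give a quick overview of what the export contains.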

Compatibility

  • Python 3.8, 3.9, 3.10, 3.11 and 3.12

Documentation

Documentation is available online at developers.thequestionmark.org/scrapy-webarchive/
