WAIL 1.2.0-Beta2
Pre-release
Heritrix No Longer The Primary Crawler
WAIL has transitioned away from using Heritrix as the primary preservation method.
The Page Only, Page + Same Domain Links and Page + All Links crawls now use a full Chrome browser.
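To make the difference between the three crawl options concrete, here is a minimal sketch of how the set of links to follow could be chosen for each one. This is not WAIL's or node-warc's actual code: `CrawlMode`, `linksToFollow`, and the exact-hostname rule for "same domain" are all assumptions made for illustration.

```typescript
// Hypothetical illustration only: these names are not part of WAIL or node-warc.
type CrawlMode = "page-only" | "page-same-domain" | "page-all-links";

// Given the page being preserved and the links found on it, return the
// additional URLs a crawl in the given mode would visit.
function linksToFollow(pageUrl: string, outlinks: string[], mode: CrawlMode): string[] {
  // Page Only: preserve the page itself and nothing else.
  if (mode === "page-only") return [];

  const pageHost = new URL(pageUrl).hostname;
  const seen = new Set<string>();
  const result: string[] = [];

  for (const link of outlinks) {
    let resolved: URL;
    try {
      // Resolve relative links against the page URL.
      resolved = new URL(link, pageUrl);
    } catch {
      continue; // skip links that cannot be parsed
    }
    // Only http(s) links can be loaded in the browser as pages.
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") continue;
    // Assumption: "same domain" means an exact hostname match.
    if (mode === "page-same-domain" && resolved.hostname !== pageHost) continue;

    resolved.hash = ""; // fragments point at the same document
    if (!seen.has(resolved.href)) {
      seen.add(resolved.href);
      result.push(resolved.href);
    }
  }
  return result;
}
```

For example, with the page `https://example.com/a` and the links `/b`, `https://other.org/c` and `mailto:x@example.com`, the same-domain mode in this sketch yields only `https://example.com/b`, while the all-links mode also yields `https://other.org/c`.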
You can monitor the progress of these crawls by going to the WAIL Crawls section of the application.
For those wondering, WAIL does not use WARCreate, Browsertrix, brozzler, Webrecorder, or a derivation of any of these tools.
The special sauce powering these crawls has been open sourced and made available through node-warc.
WAIL's browser crawls pass Mat Kelly's The Archival Acid Test.
See #85
Heritrix-based crawls are still available, though!
WAIL provides auto-configured Heritrix crawls via the Heritrix Depth 1, Heritrix Depth 2 and Heritrix Depth 3 crawl options.
These will launch Heritrix crawls at the respective depth.
Beta2 changes
- WAIL runs Flash now. The Flash plugin used by Google Chrome is packaged for each supported OS 😍
- Increased the wait after page load and the scroll time (yes, WAIL interacts with the page) in order to get more of that juicy JavaScript-only content