WAIL 1.2.0-Beta2

Pre-release
@N0taN3rd N0taN3rd released this 21 Jul 06:39
· 13 commits to wail-electron since this release

Heritrix No Longer The Primary Crawler

WAIL has transitioned away from using Heritrix as its primary preservation method.
The Page Only, Page + Same Domain Links, and Page + All Links crawls now use a full Chrome browser.
You can monitor the progress of these crawls in the WAIL Crawls section of the application.
For those wondering, WAIL does not use WARCreate, Browsertrix, brozzler, or Webrecorder, nor a derivative of any of these tools.
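
As a rough illustration of how the three browser crawl options differ in scope, here is a minimal sketch. It is not WAIL's or node-warc's actual code: the `CrawlScope` type and the `linksToFollow` function are hypothetical names, and it assumes "same domain" means "same hostname as the seed page".

```typescript
// Illustrative sketch only; NOT taken from WAIL or node-warc.
// Assumption: "same domain" is interpreted as "same hostname as the seed".
type CrawlScope = "page-only" | "page-same-domain" | "page-all-links";

// Given the seed URL and the links found on it, return the absolute URLs
// that a crawl with the given scope would visit in addition to the seed.
function linksToFollow(seed: string, links: string[], scope: CrawlScope): string[] {
  // Page Only: preserve the seed page and nothing else.
  if (scope === "page-only") return [];

  const seedHost = new URL(seed).hostname;

  // Resolve relative links against the seed and keep only web pages.
  const resolved = links
    .map((href) => new URL(href, seed))
    .filter((u) => u.protocol === "http:" || u.protocol === "https:");

  // Page + Same Domain Links keeps links on the seed's host;
  // Page + All Links keeps every resolved web link.
  const kept =
    scope === "page-same-domain"
      ? resolved.filter((u) => u.hostname === seedHost)
      : resolved;

  return kept.map((u) => u.href);
}
```

For a seed of `https://example.com/a/` whose page links to `/b` and `https://other.org/c`, the same-domain scope would keep only the first link, while the all-links scope would keep both.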

The special sauce powering these crawls has been open sourced and is available through node-warc (npm package).

WAIL's browser crawls pass Mat Kelly's The Archival Acid Test. See #85.

Heritrix-based crawls are still available, though!!
WAIL provides auto-configured Heritrix crawls via the Heritrix Depth 1, Heritrix Depth 2, and Heritrix Depth 3 crawl options.
Each option launches a Heritrix crawl at the corresponding depth.

Beta2 changes

  • WAIL runs Flash now. The Flash plugin used by Google Chrome is packaged for each supported OS 😍
  • Increased the wait after page load and the scroll time (yes, WAIL interacts with the page)
    in order to get more of that juicy JavaScript-only content