Heritrix
Detailed information about currently running and previous crawls is shown here.
For each crawl, the view lists its seed URL(s), the crawl status, the timestamp of the last update, the number of URLs from the seed page(s) that have been processed or are awaiting processing, and the actions that can be performed.
The actions provided per crawl are shown in the image below.
A crawl's configuration can be displayed via the view config option. Crawls are started, restarted, and terminated through this menu, and they can also be deleted from it. Deleting a crawl through this option permanently removes it from the file system but does not affect the WARCs produced by the crawl.
A crawl's job directory can be re-scanned by clicking the Rescan Job Directory button, located on the bottom toolbar.
The web UI provided by Heritrix can be viewed in the default browser by clicking the Launch Web UI button, also located on the bottom toolbar.
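For readers who want to script the same operations outside WAIL, Heritrix 3 exposes a REST API that accepts `action` values such as `rescan` (on the engine) and `launch` or `terminate` (on a job). The sketch below only builds the requests and does not send them. It is not a description of how WAIL itself talks to Heritrix: the host, port, and job name `example-job` are assumptions, and `engine_request` is a hypothetical helper introduced for illustration.

```python
from urllib.parse import urlencode, quote
from urllib.request import Request

# Assumed address of a locally running Heritrix 3 engine; adjust to your setup.
ENGINE_URL = "https://localhost:8443/engine"


def engine_request(action, job_name=None, base_url=ENGINE_URL):
    """Build (but do not send) a POST request for a Heritrix 3 REST action.

    Without job_name the request targets the engine itself (e.g. "rescan").
    With job_name it targets that job (e.g. "launch", "terminate").
    """
    if job_name is None:
        url = base_url
    else:
        # Percent-encode the job name so it is safe as a single path segment.
        url = f"{base_url}/job/{quote(job_name, safe='')}"
    body = urlencode({"action": action}).encode("ascii")
    return Request(url, data=body, method="POST",
                   headers={"Accept": "application/xml"})


# "example-job" is a placeholder job name, not one taken from WAIL.
rescan = engine_request("rescan")
terminate = engine_request("terminate", job_name="example-job")

print(rescan.get_method(), rescan.full_url, rescan.data)
print(terminate.get_method(), terminate.full_url, terminate.data)
```

Actually sending these requests requires the credentials Heritrix was started with (it uses HTTP Digest authentication) and, for a default install, accepting its self-signed certificate. Both are left out here because they depend on the local configuration.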
WAIL is a project of the Web Science / Digital Libraries Research Group at Old Dominion University.
If you wish to find out more about the group you can
This work is supported by the National Endowment for the Humanities (NEH), through Digital Humanities grants HD-51670-13 and HK-50181-14.