Much update!
- Confirmed and added support for OS X and Linux thanks to michellemorales and j-setiawan.
- Updated documentation to the current state of things. Still work to be done there.
- Removed 'bad file' functionality as it wasn't working as intended and wasn't important anyway. That's what error logs are for.
- Resolving `<base>` tags to grab links that wouldn't have been recognized before. Thanks lxml!
- Added an optional (on by default) check for file size. Won't download any files larger than 500 MB, assuming the site returns a `Content-Length` header.
- Added Firefox (on Ubuntu) as an option for browser spoofing.
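
For anyone wondering what resolving `<base>` tags changes in practice, here is a minimal sketch. It is not spidy's actual code: spidy relies on lxml for this, while the sketch uses only the standard library so it runs without extra dependencies. The names `LinkCollector` and `extract_links` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects <a href> values and the first <base href> found in a page."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base":
            # Only the first <base href> in a document is honored.
            if self.base_href is None:
                self.base_href = href
        elif tag == "a":
            self.hrefs.append(href)


def extract_links(html, page_url):
    """Return absolute link URLs, honoring a <base href> if one is present."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # A relative <base href> is itself resolved against the page URL.
    base = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    return [urljoin(base, href) for href in parser.hrefs]
```

Without the `<base>` handling, a relative link such as `page.html` would be resolved against the page's own URL and could point at the wrong host or directory.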

`spidy.zip` contains just `crawler.py` and `config/`, while the source code archives contain all files.
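
The file-size check mentioned in the notes above boils down to something like the sketch below. This is an illustration, not the code in `crawler.py`: the names `MAX_FILE_SIZE` and `is_too_large` are hypothetical, and treating 500 MB as 500 × 10^6 bytes is an assumption. It takes the response headers as a plain dict, so it works with whatever HTTP library produced them.

```python
# Assumed limit: 500 MB counted as 500 * 10^6 bytes (the exact unit is a guess).
MAX_FILE_SIZE = 500 * 1000 * 1000


def is_too_large(headers, limit=MAX_FILE_SIZE):
    """Return True only if Content-Length is present, numeric, and over the limit."""
    value = None
    for name, header_value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "content-length":
            value = header_value
            break
    if value is None:
        # No Content-Length header: the size is unknown, so the check cannot apply.
        return False
    try:
        size = int(value)
    except (TypeError, ValueError):
        # Malformed header: treat it the same as a missing one.
        return False
    return size > limit
```

A crawler would typically call this on the headers of a HEAD request, or of a streamed GET, before reading the body. As the note says, a site that omits `Content-Length` slips past the check.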