spidy Web Crawler Release 1.4 · rivermont/spidy

Much update!

Confirmed and added support for OS X and Linux thanks to michellemorales and j-setiawan.
Updated documentation to the current state of things. Still work to be done there.
Removed 'bad file' functionality as it wasn't working as intended and wasn't important anyway. That's what error logs are for.
Now resolves <base> tags, so links that wouldn't have been recognized before get picked up. Thanks lxml!
Added an optional (on by default) file size check. Files larger than 500 MB won't be downloaded, provided the site returns a Content-Length header.
Added Firefox (on Ubuntu) as an option for browser spoofing.
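
For anyone curious what resolving a <base> tag means in practice, here is a rough sketch. It is not spidy's code: spidy relies on lxml for this, while the sketch swaps in the standard library's `html.parser` and `urljoin` so it runs without extra packages. The names `LinkCollector` and `resolve_links` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects <a href> values and remembers the first <base href>, if any."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base":
            # Only the first <base> element on a page counts.
            if self.base_href is None:
                self.base_href = href
        elif tag == "a":
            self.hrefs.append(href)


def resolve_links(html, page_url):
    """Return absolute URLs for every link, honoring <base href> when present."""
    parser = LinkCollector()
    parser.feed(html)
    # A <base href> may itself be relative, so resolve it against the page URL.
    base = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    return [urljoin(base, href) for href in parser.hrefs]
```

Without the <base> step, a relative link like `page.html` would be joined to the page's own URL and could point at the wrong place.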
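
The file size check can be pictured roughly like this. Again a minimal sketch rather than the actual implementation: `MAX_FILE_SIZE` and `is_too_large` are hypothetical names, the limit is assumed to be 500 decimal megabytes, and `headers` is a plain dict standing in for whatever header object the HTTP library returns.

```python
# Assumed limit: 500 MB counted as decimal megabytes (hypothetical constant).
MAX_FILE_SIZE = 500 * 1000 * 1000


def is_too_large(headers, limit=MAX_FILE_SIZE):
    """Return True only when Content-Length is present and exceeds the limit."""
    value = headers.get("Content-Length")
    if value is None:
        # No header means the size is unknown, so the check cannot apply.
        return False
    try:
        return int(value) > limit
    except ValueError:
        # A malformed header is treated the same as a missing one.
        return False
```

This also shows why the check depends on the site: when no Content-Length header comes back, there is nothing to compare against and the download goes ahead.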

spidy.zip contains just crawler.py and config/, while the source code archives contain all files.
