Releases: rivermont/spidy
spidy Web Crawler Release 1.4
Much update!
- Confirmed and added support for macOS and Linux, thanks to michellemorales and j-setiawan.
- Updated the documentation to reflect the current state of things. There's still work to be done there.
- Removed 'bad file' functionality as it wasn't working as intended and wasn't important anyway. That's what error logs are for.
- Resolving `<base>` tags to grab links that wouldn't have been recognized before. Thanks, lxml! (See the first sketch after this list.)
- Added an optional (on by default) check for file size. Won't download any files larger than 500 MB, assuming the site returns a `Content-Length` header. (See the second sketch after this list.)
- Added Firefox (on Ubuntu) as an option for browser spoofing.
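For illustration, here is a minimal sketch of how `<base>` resolution can work with lxml. The function and variable names are assumptions for the example, not spidy's actual code.

```python
# Hypothetical sketch of <base>-aware link extraction (not spidy's real API).
from urllib.parse import urljoin

from lxml import etree


def get_links(page_url, html):
    """Return absolute links from a page, honoring any <base href> tag."""
    tree = etree.HTML(html)
    # If the page declares <base href="...">, relative links resolve
    # against that URL instead of the page's own URL.
    base_hrefs = tree.xpath('//base/@href')
    base_url = base_hrefs[0] if base_hrefs else page_url
    return [urljoin(base_url, href) for href in tree.xpath('//a/@href')]
```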
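Likewise, a minimal sketch of the file-size check, assuming the `requests` library; `MAX_FILE_SIZE` and the function name are illustrative:

```python
# Hypothetical sketch of the Content-Length size check (names illustrative).
import requests

MAX_FILE_SIZE = 500 * 1024 * 1024  # 500 MB


def small_enough_to_save(url):
    """Return True if the file fits under the cap, or if no size is reported."""
    response = requests.head(url, allow_redirects=True)
    length = response.headers.get('Content-Length')
    # Sites that omit Content-Length can't be size-checked up front.
    return length is None or int(length) <= MAX_FILE_SIZE
```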
`spidy.zip` contains just `crawler.py` and `config/`, while the source code archives contain all files.
spidy Web Crawler Release 1.3
Final 1.3.0 release. Added error handling back in - no changes needed.
Optimized all file creation and loading. Everything is now saved with UTF-8 encoding, allowing for foreign characters and EMOJI in pages.
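As a rough sketch of what explicit UTF-8 handling looks like (function names and paths are illustrative, not spidy's actual code):

```python
# Hypothetical sketch of UTF-8-safe saving and loading (names illustrative).
def save_page(path, text):
    # An explicit encoding keeps non-ASCII characters (and emoji) intact
    # regardless of the platform's default locale.
    with open(path, 'w', encoding='utf-8') as file:
        file.write(text)


def load_lines(path):
    with open(path, 'r', encoding='utf-8') as file:
        return file.read().splitlines()
```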
spidy Web Crawler Release 1.3-alpha
Optimized all file creation and loading. Everything is now saved with UTF-8 encoding, allowing for foreign characters and EMOJI in pages.
In alpha while the error-handling system is being slightly redesigned. Still functional, however!
spidy Web Crawler Release 1.2
Added domain restrictions. Crawling can now be limited to a certain domain, such as `wsj.com`, `https://www.wsj.com`, or `https://www.wsj.com/article`. This can be set when entering configuration settings or in the config files.
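A minimal sketch of what such a restriction check might look like; `restriction` and the function name are assumptions for the example, not spidy's configuration keys:

```python
# Hypothetical sketch of a domain-restriction check (names illustrative).
from urllib.parse import urlparse


def allowed(url, restriction):
    """Accept bare domains ('wsj.com') as well as full URL prefixes."""
    if restriction.startswith('http'):
        # Prefix form, e.g. 'https://www.wsj.com/article'
        return url.startswith(restriction)
    # Bare-domain form: match the host part of the URL (and subdomains).
    host = urlparse(url).netloc
    return host == restriction or host.endswith('.' + restriction)
```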
Also more bugfixes and new MIME types, because those are cool.
spidy Web Crawler Release 1.0
The first official release of spidy!
A GUI is in the works, as well as many more awesome features.
`spidy.zip` contains only the files necessary to run the crawler, while the source code downloads contain all the things.