
huydx/lfruit



Lfruit


Let's say you want a crawler that you can tell:

I have a website, url.com. Go crawl it and download for me every PDF file that url.com, and the sites that link to url.com, have.

Lfruit is that tool. You can specify whatever file format you want.


When lfruit is useful

  • You want to download every PDF lecture from a university professor's website
  • You want to download every GIF file from a URL
  • You want to download every MP3 file from a music website

Installation & Usage

Just

gem install 'lfruit'

And then execute:

  $ lfruit --url=yoururl.com \
           --pattern=(regex for the files you want to download) \
           --exclude_pattern=(regex for the files you don't want to download) \
           --location=(folder to download into) \
           --parallel_num=(how many threads to use) \
           --limit=(how many times the crawler will run)
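
To illustrate how --pattern and --exclude_pattern could work together, here is a minimal Ruby sketch. It assumes both options are regular expressions matched against the full URL of each candidate file, which this README does not confirm. The helper `select_downloads` and the sample links are hypothetical and are not part of lfruit.

```ruby
# Hypothetical helper, not lfruit's actual code: keep the URLs that match
# the include pattern and do not match the optional exclude pattern.
def select_downloads(urls, pattern:, exclude_pattern: nil)
  include_re = Regexp.new(pattern)
  exclude_re = exclude_pattern && Regexp.new(exclude_pattern)

  urls.select do |url|
    url.match?(include_re) && !(exclude_re && url.match?(exclude_re))
  end
end

# Sample links, made up for illustration.
links = [
  "http://url.com/lectures/week1.pdf",
  "http://url.com/lectures/week1-draft.pdf",
  "http://url.com/images/logo.gif"
]

# Keep PDF files, but skip anything with "draft" in its URL.
selected = select_downloads(links, pattern: '\.pdf\z', exclude_pattern: 'draft')
puts selected
```

Under these assumptions, a pattern such as `\.pdf\z` selects by file extension, and the exclude pattern narrows the result further. On the command line, quote the regexes so the shell does not interpret them.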

Contributing

  1. Fork it ( https://github.com/huydx/lfruit/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

About

crawler by file extension

License

MIT (see LICENSE and LICENSE.txt)