Wpull

Wpull is a Wget-compatible (or remake/clone/replacement/alternative) web downloader and crawler.

Notable Features:

Written in Python: lightweight, modifiable, robust, & scriptable
Graceful stopping; on-disk database resume
PhantomJS & youtube-dl integration (experimental)

Install

Wpull uses Python 3.
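A minimal sketch for checking whether a Python 3 interpreter is available before installing. It assumes the interpreter is invoked as `python3`, which varies by platform, and it checks only the major version. The installation guide linked below covers the specific version caveats.

```shell
#!/bin/sh
# Sketch: succeed only if a "python3" command exists and reports major version 3.
# Assumes the interpreter is named python3; this name varies by platform.
has_python3() {
    command -v python3 >/dev/null 2>&1 || return 1
    python3 -c 'import sys; sys.exit(0 if sys.version_info[0] == 3 else 1)'
}

if has_python3; then
    echo "Python 3 found"
else
    echo "Python 3 not found; install it before running pip3" >&2
fi
```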

Once Python is installed, download Wpull from PyPI using pip:

pip3 install wpull

For detailed installation instructions and potential caveats, please see https://wpull.readthedocs.io/en/master/install.html.

Example Commands

To download the About page of Google.com:

wpull google.com/about

To archive a website:

wpull billy.blogsite.example \
    --warc-file blogsite-billy \
    --no-check-certificate \
    --no-robots --user-agent "InconspicuousWebBrowser/1.0" \
    --wait 0.5 --random-wait --waitretry 600 \
    --page-requisites --recursive --level inf \
    --span-hosts-allow linked-pages,page-requisites \
    --escaped-fragment --strip-session-id \
    --sitemaps \
    --reject-regex "/login\.php" \
    --tries 3 --retry-connrefused --retry-dns-error \
    --timeout 60 --session-timeout 21600 \
    --delete-after --database blogsite-billy.db \
    --quiet --output-file blogsite-billy.log
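The archive command writes several files whose names share one prefix. A hypothetical wrapper (not part of Wpull) can derive them from a single argument, so an interrupted crawl can be re-run with exactly the same `--warc-file`, `--database`, and `--output-file` values. This assumes that the on-disk database resume listed under Notable Features relies on pointing Wpull at the same database file. The sketch uses only a subset of the flags from the example above; the function name `archive_site` and the `DRY_RUN` variable are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical wrapper, not shipped with Wpull: builds a reduced version of
# the archive command with all output files named after one prefix.
# Set DRY_RUN=1 to print the command instead of executing it.
archive_site() {
    url="$1"
    prefix="$2"
    # Replace this function's positional parameters with the full command.
    set -- wpull "$url" \
        --warc-file "$prefix" \
        --recursive --level inf --page-requisites \
        --wait 0.5 --random-wait \
        --tries 3 --timeout 60 \
        --delete-after --database "$prefix.db" \
        --quiet --output-file "$prefix.log"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 archive_site billy.blogsite.example blogsite-billy` prints the command, and running `archive_site billy.blogsite.example blogsite-billy` again after an interruption passes the same `blogsite-billy.db` path to Wpull.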

To see all options:

wpull --help

Documentation

Documentation is located at https://wpull.readthedocs.io/. Please have a look at it before using Wpull's advanced features.

Help

Need help? Please see our Help page, which contains frequently asked questions and support information.

The issue tracker is located at https://github.com/chfoo/wpull/issues.

Dev

Contributions and feedback are greatly appreciated.

Credits

This project contains third-party source code licensed under different terms:

wpull.backport.logging
wpull.thirdparty.robotexclusionrulesparser
wpull.thirdparty.dammit

We would like to acknowledge the authors of GNU Wget as Wpull uses algorithms from Wget.



Folders and files

Latest commit

History

Repository files navigation

Wpull

Install

Example Commands

Documentation

Help

Dev

Credits

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages