
Advanced Crawling with Wapiti

Here are a few tips and tricks to control Wapiti's crawler like a pro.

Attack a Specific URL or Form

As shown in the Basic Usage section, you can use the --scope parameter to restrict the crawler to a specific section of a website.

If you just want to attack a single URL, the url scope is what you want:

wapiti -u "http://target.tld/my/potential/vulnerable/script.php?id=3" --scope url

Now let's say the resource you want to test is not a simple URL but a form that receives data through POST. There is a dedicated option for that: --data.

wapiti -u "http://target.tld/my/potential/vulnerable/form.php" --scope page --data "user=1&password=1"

Note: The --data option only applies to POST requests and is used only for the reference URL (given with -u or --url). It can only accept URL-encoded data.
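
If a field value contains spaces or other special characters, it must be URL-encoded before being passed to --data. For example (the form fields here are only hypothetical placeholders):

wapiti -u "http://target.tld/my/potential/vulnerable/form.php" --scope page --data "user=1&comment=hello%20world%21"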

Crawling JavaScript-based Websites

Modern websites that rely heavily on JavaScript frameworks and libraries (e.g., AngularJS, jQuery) can be challenging to crawl with traditional HTML parsing techniques.

Fortunately, Wapiti offers both a semi-automated and a fully automated way to crawl such websites.

Semi-automated

In this mode, you run Wapiti with the --mitm-port option. This option takes a port number on which Wapiti will launch an intercepting proxy based on mitmproxy.

wapiti -u "http://target.tld" --mitm-port 8080

Once Wapiti is launched, you need to manually configure your browser to use the HTTP proxy on the specified port.

Everything you crawl with your browser (classic HTTP requests as well as XHR requests) will be intercepted and attacked by Wapiti.
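
If you want to check that requests really go through the proxy before you start browsing, you can send a test request from the command line (assuming the proxy listens locally on port 8080, as in the example above):

curl --proxy http://127.0.0.1:8080 "http://target.tld/"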

To use this mode with websites that have an SSL/TLS certificate, you first need to extract the local mitmproxy certificate authority. You can refer to the mitmproxy documentation for instructions on how to do this. After that, you'll need to load the certificate into your browser.
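
On a typical installation, mitmproxy generates its certificate authority files in the ~/.mitmproxy directory the first time the proxy starts, so after launching Wapiti in this mode you should find a file such as:

ls ~/.mitmproxy/mitmproxy-ca-cert.pem

You can then import that certificate into your browser's list of trusted authorities (in Firefox this is usually done under Settings > Privacy & Security > Certificates > View Certificates).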

Fully Automated (Headless)

With the --headless option, you can activate a headless crawler that works automatically without manual interaction.

Supported values for this option are:

  • no: The default value, uses the traditional crawler.
  • hidden: The headless crawler is used, but you won’t see it working.
  • visible: The headless crawler is used, and you will see it browsing the webpages.

This feature relies on Firefox. For it to work, you must have both Firefox and geckodriver installed.
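
A quick way to check that both are available in your PATH:

firefox --version
geckodriver --version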

Example:

wapiti -u "http://target.tld" --headless visible

Tuning the Headless Crawler

The headless mode also relies on the mitmproxy component. Wapiti sets up the intercepting proxy, then launches a headless Firefox configured to use it.

You normally don't need to care about which port Wapiti picks for this proxy, but you can set it explicitly by passing the --mitm-port option alongside --headless.

wapiti -u "http://target.tld" --headless hidden --mitm-port 8080

Using a headless browser can slow down crawling. To ensure every XHR request has time to execute properly, Wapiti waits a few seconds on each webpage before moving on to the next one.

You can adjust this delay with the --wait option:

wapiti -u "http://target.tld" --headless hidden --wait 5

Tip: Increase the wait time if you're crawling complex web pages with many XHR requests, but note that this will slow down the overall scan.
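
For example, a scan of a JavaScript-heavy site could combine the options seen above (the values are only illustrative):

wapiti -u "http://target.tld" --headless hidden --mitm-port 8080 --wait 10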