
Fetching pages with WebDriver

dgtlmoon edited this page Mar 1, 2024 · 37 revisions

Many modern web pages use JavaScript to fill in their content. These more dynamic pages sometimes need a real Chrome browser to fetch the content, although many will work with the built-in basic fetcher.

The backend can be configured to fetch pages via Chrome (ChromeDriver) using the built-in WebDriver network interface. This is mainly useful when the pages you are watching rely on JavaScript to render their content (the basic fetcher does not execute any JS!). The easiest way to enable it is to uncomment the following in your local docker-compose.yml and restart your docker-compose.

Note: Raspberry Pi needs a different Selenium/WebDriver runner. Edit your docker-compose.yml and use seleniarm/standalone-chromium:4.0.0-20211213 instead of selenium/standalone-chrome-debug:3.141.59 (see the Raspberry Pi notes below).

    browser-chrome:
        hostname: browser-chrome
        image: selenium/standalone-chrome-debug:3.141.59
        volumes:
            # Workaround to avoid the browser crashing inside a docker container
            # See https://github.com/SeleniumHQ/docker-selenium#quick-start
            - /dev/shm:/dev/shm
        restart: unless-stopped

If you are using docker (instead of docker-compose), the following will get changedetection.io and the Chrome WebDriver up and running:

docker run -d \
  --name selenium \
  --restart unless-stopped \
  -p 4444:4444 \
  --shm-size="2g" \
  selenium/standalone-chrome-debug:3.141.59

docker run -d \
  --name changedetectionio \
  --restart unless-stopped \
  --link selenium \
  -p 5000:5000 \
  -e WEBDRIVER_URL="http://selenium:4444/wd/hub" \
  -v datastore-volume:/datastore \
  dgtlmoon/changedetection.io

Then visit /settings, open the [Fetching] tab, and enable the WebDriver/Chrome option.


The URL for the WebDriver interface is set with the WEBDRIVER_URL environment variable (default: http://browser-chrome:4444/wd/hub).
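If pages are not being fetched, it can help to confirm that the WebDriver endpoint is reachable at the configured URL. The following is a minimal sketch, not part of changedetection.io: the helper names `webdriver_status_url` and `webdriver_is_ready` are hypothetical, and it assumes the server answers the standard WebDriver `/status` command with a JSON body containing `value.ready`.

```python
import json
import os
import urllib.request

# Default taken from the text above; everything else here is illustrative.
DEFAULT_WEBDRIVER_URL = "http://browser-chrome:4444/wd/hub"


def webdriver_status_url(env=None):
    """Build the status URL from WEBDRIVER_URL (hypothetical helper)."""
    env = os.environ if env is None else env
    base = env.get("WEBDRIVER_URL", DEFAULT_WEBDRIVER_URL)
    return base.rstrip("/") + "/status"


def webdriver_is_ready(status_url, timeout=5.0):
    """Return True if the endpoint answers and reports itself ready."""
    try:
        with urllib.request.urlopen(status_url, timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, DNS failure, timeout, or a non-JSON reply.
        return False
    if not isinstance(body, dict):
        return False
    value = body.get("value")
    # Assumption: the server reports readiness under value.ready.
    return isinstance(value, dict) and bool(value.get("ready", False))
```

Calling `webdriver_is_ready(webdriver_status_url())` from a machine or container that can resolve the WebDriver hostname should return True once the browser container is up. From the Docker host, with port 4444 published as in the docker run example, the base URL would be http://localhost:4444/wd/hub instead.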

Raspberry Pi notes

Known to work with Raspberry Pi 4. Use seleniarm/standalone-chromium:4.0.0-20211213 as the image: value. Note that this currently only supports the 64-bit version of Raspbian OS.

  • It's a good idea to set the FETCH_WORKERS environment variable to a low value such as 2 or 3, because having 10 Chrome sessions open at once may be too heavy for your Raspberry Pi.

Microsoft Windows - Running ChromeDriver natively (without Docker)

You need to install WebDriver/ChromeDriver for Chrome. It will "listen" for instructions from your changedetection.py and drive the browser to fetch the results.

This recipe is best suited to installs made with the python/pip installer: https://github.com/dgtlmoon/changedetection.io/wiki/Microsoft-Windows#method-1-with-python-pip-install. Instructions for docker-compose based installs are welcome, please add them :)

Tested with Chrome version 99.0

  1. Install Chrome Web Browser https://www.google.com/chrome/
  2. Install the ChromeDriver release from https://chromedriver.chromium.org/downloads that matches the version of Chrome you installed.
  3. Unpack and run ChromeDriver. It will talk to and drive Chrome to get the results.
  4. Set a system-wide or cmd-level environment variable so changedetection.io knows where to find ChromeDriver. From the command line: set WEBDRIVER_URL=http://localhost:9515 (ChromeDriver listens on port 9515 by default).
  5. Run the app from the same window using changedetection.py.
  6. Don't forget to enable Chrome fetching for the sites that require JavaScript, or for any site you prefer to fetch via Chrome. The user interface should show "http://localhost:9515" if you set the environment variable correctly.

You are not meant to open this URL in your browser; it exists only so that your changedetection install can communicate with ChromeDriver.
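To illustrate why the endpoint is not a web page: WebDriver clients talk to it with JSON over HTTP. The sketch below builds (but does not send) the kind of "New Session" request defined by the W3C WebDriver specification. The helper name `new_session_request` is hypothetical, and this is not necessarily how changedetection.io itself creates sessions.

```python
import json
import urllib.request


def new_session_request(webdriver_url="http://localhost:9515"):
    """Build (but do not send) a W3C WebDriver 'New Session' request."""
    payload = {
        "capabilities": {
            # Ask the driver for a Chrome session; no other options assumed.
            "alwaysMatch": {"browserName": "chrome"}
        }
    }
    return urllib.request.Request(
        webdriver_url.rstrip("/") + "/session",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would ask ChromeDriver to launch a browser, which is why a plain GET from your own browser to the same address shows nothing useful.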