This is the part of the FidEx project that contains record and replay on web archives, and fidelity checking functionality that leverages the layout tree.

USC-NSL/FidEx-fidelity

This documentation is still a work in progress.

FidEx record and replay, and fidelity checking code

Prerequisites

To set up the environment, prepare the following:

  • Python packages:
# A virtualenv is recommended; pywb will run in it later
pip install -r requirements.txt
  • Node packages:
npm install
  • pywb and webrecorder:
    • pywb:
      • Download and install pywb from here
    • webrecorder:
      • npm install installs puppeteer, which downloads a Chrome test binary. To enable recording of webpages, install the webrecorder extension from here

Record and replay

The record and replay code lives mainly in the record_replay folder. It drives a browser with the webrecorder extension to record a page and uses pywb to replay it. The shared flags and the workflow steps are as follows:

Shared flags:

cd record_replay && node record.js --help
Usage: record [options] <url>

Options:
  -d --dir <directory>             Directory to save page info (default: "pageinfo/test")
  --download <downloadPath>        Directory to save downloads (default: "downloads")
  -f --file <filename>             Filename prefix (default: "dimension")
  -a --archive <Archive>           Archive list to record the page (default: "test")
  -m, --manual                     Manual control for finishing loading the page
  -i, --interaction                Interact with the page
  -w, --write                      Collect writes to the DOM
  -s, --screenshot                 Collect screenshot and other measurements
  --remove                         Remove recordings after finishing loading the page
  --scroll                         Scroll to the bottom.
  -c, --chrome_data <chrome_data>  Directory of Chrome data
  --headless                       If run in headless mode
  -p --proxy <proxy>               Proxy server to use. Note that is chrome is installed with extensions that controls
                                   proxy, this could not work.
  -e --exetrace                    Enable execution trace for both js run and network fetches
  -h, --help                       display help for command
  • Record the page with webrecorder (record.js)

    node record.js [flags] <url>

    record.js automatically downloads the WARC file after the page has loaded.

  • Upload the warc file to pywb

    wb-manager add <col> <recorded_warc_file>
  • Replay the page with pywb (replay.js)

    node replay.js [flags] <url>
  • E2E (autorecord.py)

    • To run the workflow end to end (record -> add to pywb -> replay), use autorecord.py.
    • autorecord.record_replay runs the record and replay steps in one go.
  • Interaction

    • The interaction-related code is in chrome_ctx/interaction.js. Both record.js and replay.js can be run with the -i flag to trigger the interaction.
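
As a rough illustration of the three-step workflow above, the following sketch assembles the record, add, and replay commands in Python. It is not the code in autorecord.py, whose interface is not documented here. The names build_pipeline and run_pipeline are hypothetical, as are the parameters replay_url (the URL under which pywb serves the archived page) and warc_path (wherever record.js saved the downloaded WARC file). Passing "live" and "archive" to -f is also an assumption, chosen to match the default prefixes of the fidelity check. Only flags listed in the help output are used.

```python
import subprocess


def build_pipeline(url, replay_url, warc_path,
                   collection="test", page_dir="pageinfo/test"):
    """Return the commands for record -> add to pywb -> replay, in order.

    replay_url and warc_path are assumptions of this sketch: the caller
    must know where pywb serves the page and where the WARC file landed.
    """
    record = ["node", "record.js", "-d", page_dir, "-f", "live",
              "-a", collection, url]
    add = ["wb-manager", "add", collection, str(warc_path)]
    replay = ["node", "replay.js", "-d", page_dir, "-f", "archive",
              replay_url]
    return [record, add, replay]


def run_pipeline(commands, cwd="record_replay"):
    """Run each command in order and stop at the first failure.

    Using one working directory for all steps is a simplification;
    wb-manager may need to run from the pywb collections directory.
    """
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling build_pipeline only constructs the command lists, so they can be inspected or logged before run_pipeline executes anything.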

Fidelity check

Once record and replay are done, the layout tree, screenshot, DOM, and other optionally tracked information are stored in a directory (specified with the -d flag).

To run the fidelity check, call the corresponding functions in the fidelity_check folder, passing the directory path and the filename prefixes of the two runs (the defaults are "live" and "archive").
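
Before running a check, it can help to confirm that the output directory holds matching files for both prefixes. The sketch below pairs filenames that differ only in their prefix. It is not part of fidelity_check: the function name pair_by_prefix is hypothetical, and the example filenames are invented for illustration, since the actual output file names are not listed in this document.

```python
def pair_by_prefix(filenames, left="live", right="archive"):
    """Pair filenames that share a suffix but differ in prefix.

    Returns {suffix: (left_name, right_name)} for every suffix that
    appears under both prefixes; unmatched files are left out.
    """
    by_suffix = {}
    for name in filenames:
        for prefix in (left, right):
            if name.startswith(prefix):
                suffix = name[len(prefix):]
                by_suffix.setdefault(suffix, {})[prefix] = name
    return {
        suffix: (found[left], found[right])
        for suffix, found in by_suffix.items()
        if left in found and right in found
    }
```

For a real directory, the input could be os.listdir applied to the path given with -d. A suffix missing from the result means one of the two runs did not produce that file.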
