
OpenWayback Replay API

Patrick T. Rourke edited this page Feb 3, 2016 · 9 revisions

Introduction

The OpenWayback URL for a single archived web page at a specific date and time looks like this (an example, followed by the general pattern):

http://webarchive.archivedomain.tld/all/20000101000000/subjectdomain.tld
http://[wayback server hostname]/[access point]/[yyyymmddhhmmss]/[access_url]
  • The access point is the first field of the OpenWayback URL after the hostname (all in the example above); access point names are configured in the OpenWayback configuration file.

  • The date is the second field of the OpenWayback URL after the hostname, written as a fourteen-digit timestamp in the format yyyymmddhhmmss; in requests, it may be truncated.

  • The access URL (the URL of the archived site) is the third and last field of the OpenWayback URL after the hostname. Because the access URL may itself contain slashes, always count the fields of the OpenWayback URL from the left: everything after the fifth slash (counting the two in http://) is part of the access URL.
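As a sketch of the left-to-right field counting, the Python function below splits a replay URL at the first five slashes only, so any slashes inside the access URL stay with it. The names parse_wayback_url and WaybackUrl are hypothetical, and the sketch assumes the URL contains all three fields after the hostname; it does not validate the date.

```python
from typing import NamedTuple


class WaybackUrl(NamedTuple):
    """Fields of an OpenWayback replay URL (names are illustrative)."""
    host: str
    access_point: str
    date: str
    access_url: str


def parse_wayback_url(url: str) -> WaybackUrl:
    """Split a replay URL into its fields, counting slashes from the left."""
    # At most five splits: 'http:', '', host, access point, date, and the rest.
    # Anything after the fifth slash, slashes included, is the access URL.
    parts = url.split("/", 5)
    if len(parts) != 6 or parts[1] != "":
        raise ValueError(f"not a replay URL with all three fields: {url!r}")
    _scheme, _empty, host, access_point, date, access_url = parts
    return WaybackUrl(host, access_point, date, access_url)
```

Because str.split is capped at five splits, an access URL such as subjectdomain.tld/about/index.html (or even one that starts with its own http://) comes back intact.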

Direct Page Requests

The simplest requests are for a specific access URL for a specific date.

  • If there is an archive of the requested access URL for the requested date, that archive is returned to the browser by OpenWayback with an HTTP 200 response.

    • http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld
    • Archived page displayed
    • URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld
    • HTTP response: 200
  • If there is no archive of the requested access URL for the requested date, OpenWayback responds with an HTTP 302 redirect to the archive whose date is closest to the requested date (whether earlier or later):

    • http://webarchive.archivedomain.tld/all/200101081200/subjectdomain.tld
      • Archived page displayed: returns the page whose date most closely matches 2001-01-08 12:00
      • URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld
      • HTTP response: 302
  • If there is no archive of the requested access URL for any date, OpenWayback returns an HTTP 404 response with a page indicating that the site is not found in the archive:

    • http://webarchive.archivedomain.tld/all/200101081200/nonexistentdomain.tld
      • Error page displayed
      • URL in location bar: http://webarchive.archivedomain.tld/all/200101081200/nonexistentdomain.tld
      • HTTP response: 404
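The three outcomes above can be modeled as a small decision function. This is a hypothetical sketch of the observable behavior, not OpenWayback's code: it assumes the capture timestamps for the access URL are already known and that all timestamps are full 14-digit strings.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple

FMT = "%Y%m%d%H%M%S"  # 14-digit yyyymmddhhmmss timestamp


def replay_response(requested: str, captures: Sequence[str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, capture timestamp served) for a direct page request."""
    if not captures:
        return 404, None  # access URL not archived at any date
    if requested in captures:
        return 200, requested  # exact capture: served as-is
    # No exact match: redirect to the nearest capture, earlier or later,
    # measuring distance in real time rather than by subtracting digit strings.
    wanted = datetime.strptime(requested, FMT)
    nearest = min(captures, key=lambda ts: abs(datetime.strptime(ts, FMT) - wanted))
    return 302, nearest
```

For example, with captures ["20010109214000", "20050315080000", "20091129093000"], a request for 20010108120000 yields (302, "20010109214000"), matching the second case above.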

Fuzzy Date Requests for Specific Access URLs

  • If the date part of the URL is truncated, OpenWayback matches the capture closest to the middle of the range implied by the truncated date and redirects the request to that capture with an HTTP 302 response.

    • http://webarchive.archivedomain.tld/all/2000/subjectdomain.tld

      • Archived page displayed: Returns the capture of the page whose archival date most closely matches 2000-07-01
      • URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld
      • HTTP response: 302
    • http://webarchive.archivedomain.tld/all/200010/subjectdomain.tld

      • Archived page displayed: Returns the capture of the page whose archival date most closely matches 2000-10-15
      • URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld
      • HTTP response: 302
    • http://webarchive.archivedomain.tld/all/subjectdomain.tld

      • Archived page displayed: Returns the most recent capture of the page
      • URL in location bar: http://webarchive.archivedomain.tld/all/200911290930/subjectdomain.tld
      • HTTP response: 302
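One way to compute "the middle of the range implied by the request" is sketched below: pad the truncated date to the first second of its range, find the last second of the range, and take the halfway point. The function name range_midpoint is hypothetical, and this illustrates the rule as described here rather than OpenWayback's own rounding, so its result can fall up to a day away from the round dates quoted in the examples.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H%M%S"             # full 14-digit OpenWayback timestamp
LOW = "00000101000000"           # earliest value for each missing position
STEP = {8: timedelta(days=1), 10: timedelta(hours=1),
        12: timedelta(minutes=1), 14: timedelta(seconds=1)}


def range_midpoint(partial: str) -> str:
    """Return the 14-digit timestamp halfway through the range `partial` implies."""
    if len(partial) not in (4, 6, 8, 10, 12, 14) or not partial.isdigit():
        raise ValueError(f"unsupported timestamp: {partial!r}")
    start = datetime.strptime(partial + LOW[len(partial):], FMT)
    if len(partial) == 4:            # a whole year
        nxt = start.replace(year=start.year + 1)
    elif len(partial) == 6:          # a whole month, rolling over December
        nxt = (start.replace(year=start.year + 1, month=1) if start.month == 12
               else start.replace(month=start.month + 1))
    else:                            # a day, hour, minute or single second
        nxt = start + STEP[len(partial)]
    end = nxt - timedelta(seconds=1)  # last second inside the range
    # strftime drops the half second left over by the division.
    return (start + (end - start) / 2).strftime(FMT)
```

For example, range_midpoint("2000") returns "20000701235959"; the capture nearest that instant would then be chosen as for a direct page request.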

First and last capture requests

There are special requests that will return either the first or the last capture from the archive.

  • The single digit 1 in the date field of the OpenWayback URL returns the first capture of the requested page.

    • http://webarchive.archivedomain.tld/all/1/subjectdomain.tld
      • Archived page displayed: Returns the first capture of the requested page
      • URL in location bar: http://webarchive.archivedomain.tld/all/200101092140/subjectdomain.tld
      • HTTP response: 302
  • The single digit 2 in the date field of the OpenWayback URL returns the most recent capture of the requested page.

    • http://webarchive.archivedomain.tld/all/2/subjectdomain.tld
      • Archived page displayed: Returns the most recent capture of the page
      • URL in location bar: http://webarchive.archivedomain.tld/all/200911290930/subjectdomain.tld
      • HTTP response: 302
  • If there is no archive of the page for any date, a request with 1, 2, or any truncated date in the date field gets the same response as a request for a specific date: an error page and an HTTP 404 response.
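A minimal sketch of the first/last rule follows, assuming the capture timestamps for the page are known and are full 14-digit strings (which sort chronologically as plain strings). The function name first_or_last and its base argument (the hostname plus access point) are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_or_last(base: str, date_field: str, access_url: str,
                  captures: Sequence[str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect URL) for a date field of "1" or "2"."""
    if date_field not in ("1", "2"):
        raise ValueError("expected '1' (first capture) or '2' (latest capture)")
    if not captures:
        return 404, None  # page never archived: same error as any other request
    ordered = sorted(captures)  # equal-length digit strings sort chronologically
    target = ordered[0] if date_field == "1" else ordered[-1]
    return 302, f"{base}/{target}/{access_url}"
```

The redirect URL it builds is what the location bar shows after the 302 response.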

Capture Date List Requests for Date Ranges

  • Requesting a specific access URL with an asterisk as the only character in the date field of the OpenWayback URL returns a page listing the capture dates for that URL. Depending on configuration, OpenWayback shows a calendar for a single year (the year of the latest capture), calendars for multiple years, or a table of capture dates. The page returned is often called a "calendar page," even when it is a table of captures rather than a calendar:

    • http://webarchive.archivedomain.tld/all/*/subjectdomain.tld
      • Capture date page displayed: Returns list of capture dates
      • URL in location bar: http://webarchive.archivedomain.tld/all/*/subjectdomain.tld
      • HTTP response: 200
  • Adding the asterisk wildcard character after a year will return a list of capture dates for that year:

    • http://webarchive.archivedomain.tld/all/2000*/subjectdomain.tld
      • Capture date page displayed: Returns list of capture dates in the year 2000
      • URL in location bar: http://webarchive.archivedomain.tld/all/2000*/subjectdomain.tld
      • HTTP response: 200
  • Two dates (full 14-digit or truncated), earliest first, separated by a hyphen and followed by an asterisk, specify a date range; a request with a date range returns a list of capture dates within that range:

    • http://webarchive.archivedomain.tld/all/2000-2012*/subjectdomain.tld
      • Capture date page displayed: Returns list of capture dates in the years 2000 to 2012
      • URL in location bar: http://webarchive.archivedomain.tld/all/2000-2012*/subjectdomain.tld
      • HTTP response: 200
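The date-field forms above (*, 2000*, 2000-2012*) can be turned into inclusive bounds by padding the start with the earliest possible digits and the end with the latest. The sketch below does this and then filters a list of capture timestamps; the function names are hypothetical, and it assumes captures are full 14-digit strings so that plain string comparison orders them by date.

```python
from typing import Iterable, List, Tuple

LOW = "00000101000000"   # earliest digits for each missing yyyymmddhhmmss position
HIGH = "99991231235959"  # latest digits for each missing position


def calendar_range(date_field: str) -> Tuple[str, str]:
    """Turn '*', '2000*' or '2000-2012*' into inclusive 14-digit bounds."""
    if not date_field.endswith("*"):
        raise ValueError("capture date list requests end with '*'")
    first, _, last = date_field[:-1].partition("-")
    last = last or first  # a single date covers its own implied range
    for part in (first, last):
        if len(part) > 14 or (part and not part.isdigit()):
            raise ValueError(f"bad date in {date_field!r}")
    return first + LOW[len(first):], last + HIGH[len(last):]


def captures_in_range(captures: Iterable[str], bounds: Tuple[str, str]) -> List[str]:
    """Keep the 14-digit capture timestamps that fall inside the bounds."""
    low, high = bounds
    return sorted(ts for ts in captures if low <= ts <= high)
```

Padding every month's end with day 31 (so 200002 becomes 20000231235959) produces a string that is not a real date, but it is still a correct upper bound for string comparison.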

Captured Page List Requests

If an asterisk wildcard is appended to the end of the access URL, OpenWayback lists every captured URL whose original URL begins with the string before the asterisk, together with the number of capture dates for each URL, the total count of captured pages, and the date range of the captures.

  • http://webarchive.archivedomain.tld/all/*/subjectdomain.tld*

    • List page displayed: Returns a list of all captures of pages with the prefix subjectdomain.tld for all dates.
    Showing 1 to 6,609 of 6,609 results for subjectdomain.tld
    subjectdomain.tld/ 475 versions 
    2,961 pages between Jun 16, 1997 and Jun 5, 2013 
    
    subjectdomain.tld/%22 3 versions 
    7 pages between Mar 23, 2003 and Jan 22, 2009 
    
    subjectdomain.tld/2009/12/07/today_in_history 1 version 
    3 pages between Aug 27, 2010 and Nov 27, 2010 
    
    • URL in location bar: http://webarchive.archivedomain.tld/all/*/subjectdomain.tld*
    • HTTP response: 200
  • As with capture date lists, page lists can be limited to a date range:

    • http://webarchive.archivedomain.tld/all/2008*/subjectdomain.tld*

      • List page displayed: Returns a list of all captures of pages with the prefix subjectdomain.tld captured in 2008.
      Showing 1 to 2,012 of 2,012 results for subjectdomain.tld
      subjectdomain.tld/ 80 versions 
      500 pages between Jan 1, 2008 and Dec 31, 2008
      
      • URL in location bar: http://webarchive.archivedomain.tld/all/2008*/subjectdomain.tld*
      • HTTP response: 200
  • A wildcarded access URL request with a specific date will fail and return an error page with an HTTP response code of 400:

    • http://webarchive.archivedomain.tld/all/200804051200/subjectdomain.tld*
      • Error page displayed: The request is missing information, or is not understood by this server. Bad URL(subjectdomain.tld*)
      • URL in location bar: http://webarchive.archivedomain.tld/all/200804051200/subjectdomain.tld*
      • HTTP response: 400
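The request rules on this page can be summarized as a small classifier, followed by a sketch of how a page list might group captures by URL. Both functions are hypothetical models of the documented behavior, not OpenWayback code; page_list assumes the captures are available as (original URL, 14-digit timestamp) pairs and simply counts captures per URL, so its counts are not guaranteed to match the "versions" and "pages" figures OpenWayback displays.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def request_kind(date_field: str, access_url: str) -> str:
    """Classify a request using the rules described on this page."""
    wildcard_date = date_field.endswith("*")
    if access_url.endswith("*"):
        # Prefix (page list) queries need a wildcard date such as '*' or '2008*';
        # a specific date with a wildcarded access URL is a 400 Bad Request.
        return "page-list" if wildcard_date else "error-400"
    return "capture-dates" if wildcard_date else "replay"


def page_list(captures: Iterable[Tuple[str, str]],
              access_url: str) -> Dict[str, Tuple[int, str, str]]:
    """Group (original URL, 14-digit timestamp) pairs by URL for a prefix query.

    Returns {url: (capture count, earliest timestamp, latest timestamp)} for
    every URL that starts with the access URL minus its trailing '*'.
    """
    prefix = access_url.rstrip("*")
    by_url: Dict[str, List[str]] = defaultdict(list)
    for url, ts in captures:
        if url.startswith(prefix):
            by_url[url].append(ts)
    return {url: (len(stamps), min(stamps), max(stamps))
            for url, stamps in sorted(by_url.items())}
```

For instance, request_kind("200804051200", "subjectdomain.tld*") returns "error-400", matching the last example above.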