Web Packager Server

Web Packager HTTP Server is an HTTP server built on top of Web Packager. It works like a reverse proxy: it fetches documents from a backend server, then optimizes and signs them before returning them to requestors. It is similar to AMP Packager, except that Web Packager targets all HTML documents other than AMP documents, which AMP Packager handles. It aims to meet the requirements set by the Google SXG Cache (but see Limitations).

Currently, if you need to package AMP documents into signed exchanges, we recommend using AMP Packager for them and Web Packager for everything else. This may change in the future if a single packager takes over both jobs, but for now you have to set up both packagers if you need to process both AMP and non-AMP content.

For general information about Web Packager, see README.md in the repository root.

Prerequisites

  1. Web Packager and its HTTP server are written in Go, so you need a Go development environment to build and run them. See Getting Started on golang.org for how to install Go on your computer.

  2. You will also need a certificate and private key pair to use for signing the exchanges. The certificate must:

    • use an ECDSA P-256 (prime256v1) private key, which you can generate with:

      $ openssl ecparam -out $PRIV_KEY_FILE -name prime256v1 -genkey
    • have the CanSignHttpExchanges extension.

    • have a validity period of no more than 90 days.

    Currently, only DigiCert and Google issue these certificates. Follow the instructions on their respective pages to order one. In particular, take note of:

Testing with self-signed / invalid certificates

It is possible to test an otherwise fully production configuration without obtaining a certificate with the CanSignHttpExchanges extension. If you set AllowTestCert to true in the TOML config file (explained later), webpkgserver accepts whichever certificate you provide (including self-signed certificates) and uses it as if it were a valid signed exchange certificate. You can also use a certificate from Let's Encrypt this way.

You can run Chrome with these command line flags to ignore certificate errors:

--user-data-dir=/tmp/udd
--ignore-certificate-errors-spki-list=$(openssl x509 -pubkey -noout -in path/to/YOUR_CERT_HERE.pem | openssl pkey -pubin -outform der | openssl dgst -sha256 -binary | base64)
'data:text/html,<a href="https://localhost:8080/priv/doc/https://YOUR_TEST_URL_HERE/">click me</a>'

When you start Chrome with these flags, it shows a butter bar about an "unsupported command-line flag". It looks like an error but is only a warning; this is expected and working as intended.

Configuration

To build Web Packager Server, assuming you have the source in the webpackager directory:

$ cd wpkserver/cmd/webpkgserver
$ go build .

To bring up your instance, copy webpkgserver.example.toml to a file named webpkgserver.toml in the current directory (the binary looks for its TOML config there):

$ cp /path/to/webpkgserver.example.toml ./webpkgserver.toml

Below are the parts of webpkgserver.example.toml that contain the information needed for creating signed exchanges:

[Listen]
    # The port number to listen on. If it is unspecified, webpkgserver will use
    # an arbitrary port number.
    Port = 8080

[SXG.Cert]
    # The path to the PEM file containing the full certificate chain, ordered
    # from the leaf to the root.
    PEMFile = 'path/to/your.pem'

    # The path to the PEM file containing the private key that corresponds to
    # the leaf certificate in PEMFile.
    KeyFile = 'path/to/your.key'

    # Use any certificate for signing exchanges. If this parameter is set to true,
    # webpkgserver will not verify that the certificate meets the requirements
    # set by the Signed HTTP Exchanges specification, so you can use ordinary
    # TLS certificates or self-signed certificates. Note those certificates only
    # work for testing: the produced signed exchanges will be deemed invalid due
    # to the certificate.
    #
    # If the certificate is missing an OCSP URL, webpkgserver substitutes dummy
    # bytes for the OCSP response.
    AllowTestCert = true

[[Sign]]
    # The domain to limit signed URLs to, case-insensitive. The certificate is
    # supposed to cover this domain.
    Domain = 'example.com'

If you want to use ACME to renew certificates automatically, also fill in the following section of the TOML file. Note that the EABKid and EABHmac values are issued by certificate authorities such as Google and DigiCert. When requesting SXG certificates from Google's CA, use the Google SXG ACME directory given in the DiscoveryURL comment.

[SXG.ACME]
  # Enable webpkgserver to attempt to auto renew certificates using ACME.
  #Enable = false 

  # The path to the Certificate Signing Request PEM file.
  # Required when ACME is enabled.
  CSRFile = "path/to/csr.pem"

  # The ACME discovery URL. It is specified by the Certificate Authority that
  # issues your certificates. As of April 2022, DigiCert and Google support
  # automatic renewal of signed exchange certificates:
  # https://cloud.devsite.corp.google.com/public-certificate-authority/docs
  # (Use this production ACME directory: https://dv-sxg.acme-v02.api.pki.goog/directory)
  #
  # https://docs.digicert.com/certificate-tools/acme-user-guide/acme-directory-urls-signed-http-exchange-certificates/
  # Required when ACME is enabled.
  DiscoveryURL = '<Your Discovery URL>'

  # The email address registered to the Certificate Authority for your signed
  # exchange certificates.
  # Required when ACME is enabled.
  Email = '[email protected]'

  # EABKid and EABHmac must be set together: either both are empty (in which case EAB is not used)
  # or both have valid values. If only one of them is set, Web Packager reports an error.
  # This is the Key Identifier from ACME CA. Used for External Account Binding.
  #EABKid = "eab.kid"

  # This is the MAC Key from ACME CA. Used for External Account Binding. Should be in
  # Base64 URL Encoding without padding format.
  #EABHmac = "eab.hmac"

  # For the remaining configuration items, it is important to understand the
  # different challenges employed as part of the ACME protocol:
  # https://ietf-wg-acme.github.io/acme/draft-ietf-acme-acme.html#identifier-validation-challenges
  # https://letsencrypt.org/docs/challenge-types/
  # https://certbot.eff.org/docs/challenges.html/
  #
  # Note that you do not need to set the fields for all of these challenges. It
  # is typically sufficient to have a setting for just one of the challenges. If
  # more than one method is configured, the go-acme/lego library will pick
  # one of them for primary use and use the other settings as backup. For
  # wildcard certificates, however, DNSProvider is the only supported
  # method of validation, and others cannot be used. See DNSProvider for more
  # detail.

  # The http server root directory where the ACME http challenge token should
  # be deposited. 
  HTTPWebRootDir = '/path/to/www_root_dir'

  # The port used by webpkgserver to respond to the HTTP challenge
  # issued as part of the ACME protocol. You will need to configure your
  # reverse proxy to route the challenge requests to this port, using
  # proxy_pass on NGINX or a similar mechanism on the server of your
  # choice. An example specific to NGINX:
  # https://medium.com/@dipeshwagle/add-https-using-lets-encrypt-to-nginx-configured-as-a-reverse-proxy-on-ubuntu-b4455a729176
  HTTPChallengePort = 5002

  # The port used by webpkgserver to respond to the TLS challenge issued as part
  # of the ACME protocol.
  TLSChallengePort = 5003

  # The DNS Provider to be used for fulfilling the ACME DNS challenge.
  # For the DNS challenge, you need to set certain environment variables
  # which depend on the DNS provider that you use to fulfill the DNS challenge:
  # https://go-acme.github.io/lego/dns/
  # To use DNSProvider, you need to build webpkgserver with
  # `go build -tags dns01`; it is disabled by default because it bloats the
  # binary.
  #
  # Note that you only need the DNS challenge setup if you have wildcard
  # certificates: https://en.wikipedia.org/wiki/Wildcard_certificate
  #DNSProvider = '' 
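The EABKid/EABHmac rule in the comments above (both empty, or both set with EABHmac in unpadded base64url) can be expressed as a small Go check. This is a hypothetical sketch for illustration, not webpkgserver's actual validation code; acmeEAB and validate are made-up names, and the sample values in main are placeholders.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// acmeEAB mirrors the EABKid and EABHmac fields of [SXG.ACME].
type acmeEAB struct {
	Kid  string
	Hmac string
}

// validate reports whether External Account Binding should be used and
// whether the two fields are consistent with each other.
func (e acmeEAB) validate() (useEAB bool, err error) {
	switch {
	case e.Kid == "" && e.Hmac == "":
		return false, nil // EAB not used
	case e.Kid == "" || e.Hmac == "":
		return false, errors.New("EABKid and EABHmac must both be set or both be empty")
	}
	// The MAC key must be base64url without padding.
	if _, err := base64.RawURLEncoding.DecodeString(e.Hmac); err != nil {
		return false, fmt.Errorf("EABHmac is not unpadded base64url: %w", err)
	}
	return true, nil
}

func main() {
	for _, c := range []acmeEAB{
		{},
		{Kid: "kid-123"},
		{Kid: "kid-123", Hmac: "c2VjcmV0LWtleQ"},
		{Kid: "kid-123", Hmac: "c2VjcmV0LWtleQ=="},
	} {
		use, err := c.validate()
		fmt.Printf("%+v -> useEAB=%v err=%v\n", c, use, err)
	}
}
```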

Then run:

$ webpkgserver

NOTE: If you created webpkgserver.toml elsewhere or with a different name, pass the --config option to webpkgserver so that it can locate your config file. For example:

$ webpkgserver --config /path/to/webpkgserver.toml

To quickly check your instance is running:

$ curl -o out.sxg http://localhost:8080/priv/doc/https://example.com/

Adjust localhost:8080 and https://example.com/ to match the settings in your .toml file. Note that /priv/doc is a special URL prefix that webpkgserver uses to recognize requests for signed exchanges; it is not a real directory that has to exist on your machine.

You can check if the signed exchange is valid by using dump-signedexchange:

$ go install github.com/WICG/webpackage/go/signedexchange/cmd/dump-signedexchange@latest
$ dump-signedexchange -i out.sxg -verify

Please check that the content-body is not empty when you are doing your tests.
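For scripted smoke tests, a Go sketch along these lines fetches the same /priv/doc URL as the curl command and checks two things: that the Content-Type starts with application/signed-exchange, and that the body begins with the 8-byte magic string "sxg1-b3" plus a NUL byte, which the b3 signed exchange format places at the start of every file. It assumes webpkgserver labels its responses with that media type, and it does not parse the exchange, so dump-signedexchange -verify remains the way to check the signature and confirm the content body is not empty. looksLikeSXG is a hypothetical name.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// sxgMagic is the 8-byte magic string that starts every b3 signed exchange.
var sxgMagic = []byte("sxg1-b3\x00")

// looksLikeSXG is a quick sanity check on a response; it does not verify
// signatures or inspect the payload.
func looksLikeSXG(contentType string, body []byte) error {
	if !strings.HasPrefix(contentType, "application/signed-exchange") {
		return fmt.Errorf("unexpected Content-Type %q", contentType)
	}
	if !bytes.HasPrefix(body, sxgMagic) {
		return errors.New("body does not start with the sxg1-b3 magic string")
	}
	return nil
}

func main() {
	url := "http://localhost:8080/priv/doc/https://example.com/"
	if len(os.Args) > 1 {
		url = os.Args[1]
	}
	resp, err := http.Get(url)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if resp.StatusCode != http.StatusOK {
		fmt.Fprintf(os.Stderr, "HTTP %d: %s\n", resp.StatusCode, body)
		os.Exit(1)
	}
	if err := looksLikeSXG(resp.Header.Get("Content-Type"), body); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("received %d bytes that look like a signed exchange\n", len(body))
}
```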

Running behind a Front-end Edge Server

The setup is similar to AMP Packager:

  • If the URL starts with /webpkg/, forward the request unmodified. In NGINX the directive would look like:

    location /webpkg/ {
        proxy_pass http://127.0.0.1:8080;
    }
    
  • Determine if the request is for a signed exchange, based on the Accept header. See the Content Negotiation section for further details on how to do this. If the request is for a signed exchange, rewrite the URL by prepending /priv/doc/ and forward the request. In NGINX the directive would look like:

    proxy_pass http://127.0.0.1:8080/priv/doc/https://example.com$request_uri;
    

    where $request_uri is the original request path (including any query string), not the full URL. For example, this would expand to something like:

    http://127.0.0.1:8080/priv/doc/https://example.com/foo.html
    
  • Do not forward any other requests without adding the /priv/doc prefix (in particular, external requestors should not be able to formulate custom /priv/doc requests).

  • Do not forward any requests that have user-personalized content in them. Consult the spec about the dangers of indiscriminately signing content. If a publisher indiscriminately signs all responses as their origin, they can cause at least two kinds of problems described in the spec: session fixation and misleading content.

  • Every 90 days or sooner, renew your SXG cert and restart webpkgserver.
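The forwarding rules above can be summarized as a single decision function. The Go sketch below only models that decision and is not part of webpkgserver; packagerBase, origin, route, and the personalized flag are hypothetical stand-ins for your own deployment (how you detect personalized content is up to you), and wantsSXG stands for the Accept check described under Content Negotiation.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed deployment values; replace with your own.
const (
	packagerBase = "http://127.0.0.1:8080"
	origin       = "https://example.com"
)

// route decides where an edge server should send a request. requestURI is
// the path plus query string (like NGINX's $request_uri). It returns the
// webpkgserver URL to proxy to and true, or "" and false if the request
// should be served normally instead.
func route(requestURI string, wantsSXG, personalized bool) (string, bool) {
	// Requests under /webpkg/ are forwarded unmodified.
	if strings.HasPrefix(requestURI, "/webpkg/") {
		return packagerBase + requestURI, true
	}
	// Never sign personalized responses; non-SXG requests skip the packager.
	if personalized || !wantsSXG {
		return "", false
	}
	// The prefix is always added here, so a client-supplied /priv/doc/...
	// path can only ever be signed under our own origin.
	return packagerBase + "/priv/doc/" + origin + requestURI, true
}

func main() {
	for _, r := range []struct {
		uri                    string
		wantsSXG, personalized bool
	}{
		{"/webpkg/example", false, false},
		{"/foo.html", true, false},
		{"/foo.html", false, false},
		{"/account", true, true},
	} {
		upstream, ok := route(r.uri, r.wantsSXG, r.personalized)
		fmt.Printf("%-16s sxg=%-5v personalized=%-5v -> %v %q\n",
			r.uri, r.wantsSXG, r.personalized, ok, upstream)
	}
}
```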

Content Negotiation

Content negotiation (conneg) setup should be based on the Accept header. Content negotiation is a mechanism defined in the HTTP specification that makes it possible to serve different versions of a document (or more generally, a resource representation) at the same URI, so that user agents can specify which version fits their capabilities the best.

For a given URL, Googlebot requests application/signed-exchange;v=b3 with a q-value equal to 1, while Chromium browsers request it with a q-value less than 1, and other browsers don't specify it at all (but may include */*). Publishers should serve SXG only to crawlers, based on that Accept header. In the team's experience, it is difficult to set up an edge server that accurately handles headers with q-values. We therefore recommend matching the Accept header against the following regular expression:

```
Accept: /(^|,)\s*application\/signed-exchange\s*;\s*v=[[:alnum:]_-]+\s*(,|$)/
```

The (,|$) in that regex is important, as it indicates lack of a q parameter, which differentiates Googlebot from Chromium.
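To try the regular expression outside an edge server, here is a Go sketch using the standard regexp package, whose RE2 syntax accepts the POSIX class [[:alnum:]] used above. The Accept values in main are illustrative strings written for this sketch, not captured crawler or browser headers, and sxgAccept and wantsSXG are hypothetical names. Like the original expression, it is case-sensitive and treats any explicit q parameter, even q=1, as a non-match.

```go
package main

import (
	"fmt"
	"regexp"
)

// sxgAccept is the recommended Accept pattern without the surrounding
// slashes; Go's RE2 syntax needs no escaping for "/".
var sxgAccept = regexp.MustCompile(`(^|,)\s*application/signed-exchange\s*;\s*v=[[:alnum:]_-]+\s*(,|$)`)

// wantsSXG reports whether an Accept header lists SXG without a q parameter.
func wantsSXG(accept string) bool {
	return sxgAccept.MatchString(accept)
}

func main() {
	for _, accept := range []string{
		// SXG listed without a q parameter: matches.
		"text/html,application/xhtml+xml,application/signed-exchange;v=b3,*/*;q=0.8",
		// SXG listed with q<1: does not match.
		"text/html,application/xhtml+xml,application/signed-exchange;v=b3;q=0.9,*/*;q=0.8",
		// SXG not listed at all: does not match.
		"text/html,*/*;q=0.8",
	} {
		fmt.Printf("%-5v %s\n", wantsSXG(accept), accept)
	}
}
```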

Here are details on how different web server setups handle accept headers:

  • Listed below is a sample VirtualHost configuration for Apache:

    <VirtualHost *:443>
        Protocols http/1.1
        ServerName www.example.com
        ServerAdmin webadmin@localhost
        DocumentRoot /usr/local/apache2/htdocs/

        <Directory "/usr/local/apache2/htdocs/sxg_test/">
            RewriteEngine On
            RewriteCond %{HTTP:Accept} (^|,)\s*application/signed-exchange\s*;\s*v=[[:alnum:]_-]+\s*(,|$)
            RewriteRule .+ http://localhost:8080/priv/doc/https://www.example.com%{REQUEST_URI} [P]

            Header set X-Content-Type-Options "nosniff"
        </Directory>

        # Reverse proxying only; "On" would turn Apache into an open forward proxy.
        ProxyRequests Off
        ProxyPass /webpkg/ http://localhost:8080/webpkg/

        SSLCertificateFile /usr/local/apache2/conf/fullchain.pem
        SSLCertificateKeyFile /usr/local/apache2/conf/privkey.pem
        Include /usr/local/apache2/conf/options-ssl-apache.conf
    </VirtualHost>
    
  • NGINX doesn't support conneg natively. Supporting it requires regexes or scripts that parse Accept only approximately.

    if ($http_accept ~* "(^|,)\s*application/signed-exchange\s*;\s*v=[[:alnum:]_-]+\s*(,|$)") {
       # do processing here, e.g. the proxy_pass to /priv/doc/ described above
    }
    
  • IIS has a CLR API that supports q-values. We haven't researched how to configure that for a reverse proxy setup.

Limitations

In this early phase, we may make backward-incompatible changes to the config syntax.

Web Packager aims to automatically meet most but not all Google SXG Cache requirements. In particular, pages that do not use responsive design should specify a supported-media annotation.

Web Packager does not handle request matching correctly. This should not matter unless your web server implements content negotiation using the Variants and Variant-Key headers (not the Vary header). We plan to support request matching in the future, but there is no ETA (estimated time of availability) at the moment.

Note: The above limitation is not expected to be a big deal even if your server serves signed exchanges conditionally using content negotiation: if you already have signed exchanges, you should not need Web Packager.