This repository has been archived by the owner on Jan 4, 2023. It is now read-only.

PWA custom metric #198

Merged: 6 commits, merged May 13, 2021

rviscomi
Member

rviscomi commented May 11, 2021

This is a custom metric that we can use as a workaround for the HTTP Archive response bodies being unavailable. It may also be useful for the PWA chapter of the 2021 Web Almanac.

The process to get this data in and out of BigQuery is:

  • merge this PR
  • sync the WPT test server with this repo
  • wait for the crawl to be completed
  • query the pages dataset for the _pwa field in the HAR payload
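The last step can be sketched in JavaScript, for example when post-processing rows exported from the pages dataset. This sketch makes two assumptions: each row's payload is the HAR page JSON as a string, and the `_pwa` value may itself be stored as a JSON-encoded string. For that reason the helper accepts either a string or an object. `extractPwa` is a hypothetical name, not part of the PR.

```javascript
// Hypothetical helper: pull the `_pwa` custom metric out of a HAR page payload.
// Assumes `payload` is a JSON string (or an already-parsed object) and that
// `_pwa` may be stored either as an object or as a JSON-encoded string.
function extractPwa(payload) {
  const page = typeof payload === "string" ? JSON.parse(payload) : payload;
  const raw = page && page._pwa;
  if (raw === undefined || raw === null) {
    return null; // the custom metric did not run or returned nothing
  }
  if (typeof raw === "string") {
    try {
      return JSON.parse(raw);
    } catch (err) {
      return null; // malformed value: skip it rather than fail the whole batch
    }
  }
  return raw;
}

// Usage with a made-up payload:
const samplePayload = JSON.stringify({
  _pwa: JSON.stringify({ serviceWorkers: { "https://example.com/sw.js": "..." } }),
});
console.log(Object.keys(extractPwa(samplePayload).serviceWorkers));
```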

Known issue: a page like https://developers.google.com/web/tools/chrome-user-experience-report/ installs a SW but its JS is so minified that there's no way to reliably parse the registration with a regex. I'd love to hear if there are better ways to detect which JS resource is a SW. (maybe we can borrow a technique from Lighthouse? cc @brendankenny)
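The sketch below illustrates the known issue with regex-based detection. It assumes registrations look like `navigator.serviceWorker.register('<literal url>')`. The pattern `registerPattern` and the helper `findSwRegistrations` are hypothetical and do not reproduce the PR's code. The approach only recovers the script URL when it appears as a string literal. Once a minifier passes the URL through a variable, the regex has nothing to capture.

```javascript
// Hypothetical pattern: serviceWorker.register( followed by a quoted string
// literal, capturing the script URL. It does not try to resolve variables,
// which is exactly where minified bundles defeat it.
const registerPattern = /serviceWorker\s*\.\s*register\s*\(\s*(['"`])([^'"`]+)\1/g;

function findSwRegistrations(source) {
  const urls = [];
  for (const match of source.matchAll(registerPattern)) {
    urls.push(match[2]); // group 2 is the URL between the matching quotes
  }
  return urls;
}

// Readable source: the URL is a literal, so it is found.
const readable = "if ('serviceWorker' in navigator) { navigator.serviceWorker.register('/sw.js'); }";
// Minified-style source: the URL is held in a variable, so nothing is found.
const minified = "var e='/sw.js';n.serviceWorker.register(e,{scope:'/'})";

console.log(findSwRegistrations(readable)); // → [ '/sw.js' ]
console.log(findSwRegistrations(minified)); // → []
```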

Example WPT: https://www.webpagetest.org/result/210511_BiDc1V_ae6dbe0c19cd64c52b477fa46d3ab42a/?f=json

Output (addressable as $.data.runs[1].firstView.pwa):

{
  "serviceWorkers": {
    "https://www.publicstorage.com/assets2019/sw.js": "importScripts(\"/static/precache-manifest.d35d3ddf878bb3d67c18641684395c87.js\", \"https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-sw.js\");\n\nif (workbox) {\r\n  workbox.skipWaiting();\r\n  workbox.clientsClaim();\r\n  \r\n  workbox.setConfig({ debug: true });\r\n  workbox.core.setLogLevel(workbox.core.LOG_LEVELS.debug);\r\n \r\n  workbox.routing.registerRoute('/help-center', workbox.strategies.networkFirst());\r\n\r\n  // Precache all files related to our webpack entries.\r\n  workbox.precaching.precacheAndRoute(self.__precacheManifest || [], { cleanUrls: false });\r\n  \r\n  //workbox.precaching.precacheAndRoute(['/'], { cleanUrls: false }); \r\n   \r\n  workbox.routing.registerRoute(/.*\\.(?:ashx|png|jpg|jpeg|svg|gif)/g, workbox.strategies.cacheFirst({ \r\n      cacheName: 'image-cache',\r\n  }));\r\n  \r\n  workbox.routing.registerRoute(new RegExp('^https://fonts.(?:googleapis|gstatic).com/(.*)'), workbox.strategies.cacheFirst());  \r\n  workbox.routing.registerRoute(new RegExp('/Fonts/.*\\.ttf'), workbox.strategies.cacheFirst());  \r\n}\n"
  },
  "manifests": {
    "https://www.publicstorage.com/static/manifest.json": {
      "icons": [
        {
          "src": "/static/icon_512x512.png",
          "sizes": "512x512",
          "type": "image/png"
        },
        {
          "src": "/static/icon_192x192.png",
          "sizes": "192x192",
          "type": "image/png"
        }
      ],
      "name": "Public Storage",
      "short_name": "ps.com",
      "orientation": "portrait",
      "display": "standalone",
      "start_url": "/?src=pwa",
      "description": "PublicStorage.com",
      "background_color": "#ffffff",
      "theme_color": "#ff6200"
    }
  },
  "serviceWorkerInitiated": [
    "https://www.publicstorage.com/assets2019/css/client.theme.9e6815b6a481b58a2af0.css",
    "https://www.publicstorage.com/Fonts/ITCAvantGardePro-Md.ttf",
    "https://www.publicstorage.com/Fonts/ps-icons-v2.woff",
    "https://www.publicstorage.com/assets2019/sw.js",
    "https://www.publicstorage.com/static/precache-manifest.d35d3ddf878bb3d67c18641684395c87.js",
    "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-sw.js",
    "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-core.dev.js",
    "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-routing.dev.js",
    "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-strategies.dev.js",
    "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-precaching.dev.js",
    "https://www.publicstorage.com/assets2019/js/client.resaccess.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/js/client.usloc.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.rescreate.baedf7260b9b00c6147b.css",
    "https://www.publicstorage.com/assets2019/css/client.changepass.9b9bf8b059a126059b93.css",
    "https://www.publicstorage.com/assets2019/css/client.city.59e0f17ff453d5785b55.css",
    "https://www.publicstorage.com/assets2019/js/client.city.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.content.7c79871b0ee87dee8256.css",
    "https://www.publicstorage.com/assets2019/js/client.content.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.hc.1f191347da9e520967da.css",
    "https://www.publicstorage.com/assets2019/js/client.hc.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.home.af63b9dbacf3a86ab69a.css",
    "https://www.publicstorage.com/assets2019/js/client.home.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.properties.6ee67c15ce26f4cdc2f1.css",
    "https://www.publicstorage.com/assets2019/js/client.properties.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.registration.867dbc00c685f24a2eb5.css",
    "https://www.publicstorage.com/assets2019/js/client.registration.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.resaccess.ffbec641b4682d6ed85c.css",
    "https://www.publicstorage.com/assets2019/js/client.blog.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/js/client.changepass.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/js/client.rescreate.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.resdashboard.5f3aac2fbb509bc3a7a5.css",
    "https://www.publicstorage.com/assets2019/js/client.resdashboard.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.reseci.9d0e32303dbcaae49d2a.css",
    "https://www.publicstorage.com/assets2019/js/client.reseci.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.resretrieve.f3c2dddb035a855c4ee5.css",
    "https://www.publicstorage.com/assets2019/js/client.resretrieve.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.resstatus.ac1d59029b0b406ebac5.css",
    "https://www.publicstorage.com/assets2019/js/client.resstatus.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.search.67820d0192868975ef5f.css",
    "https://www.publicstorage.com/assets2019/js/client.search.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/js/client.theme.0df8157e10d679ddcb30.min.js",
    "https://www.publicstorage.com/assets2019/css/client.usloc.017dd1cdde68f266ca96.css",
    "https://www.publicstorage.com/assets2019/css/client.blog.1953d854b7543b90087d.css"
  ],
  "workboxInfo": {
    "https://www.publicstorage.com/assets2019/sw.js": [
      "workbox.skipWaiting",
      "workbox.clientsClaim",
      "workbox.setConfig",
      "workbox.core.setLogLevel",
      "workbox.core.LOG",
      "workbox.routing.registerRoute",
      "workbox.strategies.networkFirst",
      "workbox.precaching.precacheAndRoute",
      "workbox.precaching.precacheAndRoute",
      "workbox.routing.registerRoute",
      "workbox.strategies.cacheFirst",
      "workbox.routing.registerRoute",
      "workbox.strategies.cacheFirst",
      "workbox.routing.registerRoute",
      "workbox.strategies.cacheFirst"
    ],
    "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-sw.js": [
      "workbox.v"
    ],
    "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-core.dev.js": [
      "workbox.core",
      "workbox.v",
      "workbox.core.LOG",
      "workbox.core",
      "workbox.core",
      "workbox.core",
      "workbox.v",
      "workbox.v",
      "workbox.setConfig",
      "workbox.core.cacheNames",
      "workbox.core.setCacheNameDetails",
      "workbox.core.logLevel",
      "workbox.core.setLogLevel",
      "workbox.core.LOG"
    ],
    "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-routing.dev.js": [
      "workbox.routing",
      "workbox.v",
      "workbox.routing",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing.Router",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing.Route",
      "workbox.routing.registerRoute",
      "workbox.routing.NavigationRoute",
      "workbox.routing.Router",
      "workbox.core.cacheNames",
      "workbox.routing.NavigationRoute",
      "workbox.routing.registerNavigationRoute",
      "workbox.core.",
      "workbox.core.",
      "workbox.core.",
      "workbox.core.",
      "workbox.core."
    ],
    "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-strategies.dev.js": [
      "workbox.strategies",
      "workbox.v",
      "workbox.strategies",
      "workbox.core.cacheNames",
      "workbox.routing.Router",
      "workbox.routing.Router",
      "workbox.strategies",
      "workbox.core.cacheNames",
      "workbox.routing.Router",
      "workbox.routing.Router",
      "workbox.strategies",
      "workbox.core.cacheNames",
      "workbox.routing.Router",
      "workbox.routing.Router",
      "workbox.strategies",
      "workbox.core.cacheNames",
      "workbox.routing.Router",
      "workbox.routing.Router",
      "workbox.strategies",
      "workbox.core.cacheNames",
      "workbox.routing.Router",
      "workbox.routing.Router",
      "workbox.strategies.cacheFirst",
      "workbox.strategies.CacheFirst",
      "workbox.strategies.cacheOnly",
      "workbox.strategies.CacheOnly",
      "workbox.strategies.networkFirst",
      "workbox.strategies.NetworkFirst",
      "workbox.strategies.networkOnly",
      "workbox.strategies.NetworkOnly",
      "workbox.strategies.staleWhileRevalidate",
      "workbox.strategies.StaleWhileRevalidate",
      "workbox.core.",
      "workbox.core.",
      "workbox.core.",
      "workbox.core.",
      "workbox.core.",
      "workbox.core."
    ],
    "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-precaching.dev.js": [
      "workbox.precaching",
      "workbox.v",
      "workbox.precaching",
      "workbox.precaching.InstallResult",
      "workbox.precaching.CleanupResult",
      "workbox.precaching.CleanupResult",
      "workbox.precaching.precache",
      "workbox.precaching",
      "workbox.precaching.addRoute",
      "workbox.precaching.precacheAndRoute",
      "workbox.precaching.suppressWarnings",
      "workbox.precaching.addPlugins",
      "workbox.core.",
      "workbox.core.",
      "workbox.core.",
      "workbox.core.",
      "workbox.core.",
      "workbox.core.",
      "workbox.core.",
      "workbox.core."
    ]
  }
}
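Given the JSON returned by the example WPT link, the custom metric can be read at the path quoted above. The following is a minimal sketch that assumes the result has already been fetched and parsed into an object. `summarizePwa` is a hypothetical helper, and the summary fields are only illustrative.

```javascript
// Hypothetical helper: read the metric at $.data.runs[1].firstView.pwa from a
// parsed WPT result and summarize it. Optional chaining returns undefined
// (and the helper returns null) when any level of the path is missing.
function summarizePwa(result) {
  const pwa = result?.data?.runs?.[1]?.firstView?.pwa;
  if (!pwa) {
    return null;
  }
  return {
    serviceWorkers: Object.keys(pwa.serviceWorkers || {}).length,
    manifests: Object.keys(pwa.manifests || {}).length,
    serviceWorkerInitiated: (pwa.serviceWorkerInitiated || []).length,
    workboxScripts: Object.keys(pwa.workboxInfo || {}),
  };
}

// A heavily trimmed stand-in for the output above.
const trimmed = { data: { runs: { 1: { firstView: { pwa: {
  serviceWorkers: { "https://www.publicstorage.com/assets2019/sw.js": "..." },
  manifests: { "https://www.publicstorage.com/static/manifest.json": {} },
  serviceWorkerInitiated: ["a.css", "b.js"],
  workboxInfo: { "https://www.publicstorage.com/assets2019/sw.js": [] },
} } } } } };
console.log(summarizePwa(trimmed));
```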

@rviscomi rviscomi requested a review from tomayac May 11, 2021 21:07
@rviscomi
Member Author

cc @jeffposnick @demianrenzulli

@rviscomi
Member Author

rviscomi commented May 11, 2021

I've also tested this on vice.com, which does use Workbox; however, the custom metric doesn't seem to find any useful info in the SW: https://www.vice.com/service-worker.js

const workboxPattern = /workbox\.([a-zA-Z]+\.?[a-zA-Z]*)/g;

This doesn't seem to match any of the lines in the SW. I based it on this query, which @tunetheweb may have written. Let me know if anyone has suggestions for a better pattern detector.
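This small sketch shows why the pattern above misses newer builds. It also explains the odd entries in the earlier output, such as `workbox.core.LOG` and `workbox.core.`: `[a-zA-Z]*` stops at an underscore, and `\.?` can consume a trailing dot with nothing after it. Both sample strings are made up for illustration.

```javascript
// The pattern quoted above (older, namespace-style Workbox usage).
const legacyPattern = /workbox\.([a-zA-Z]+\.?[a-zA-Z]*)/g;

function legacyMatches(source) {
  return source.match(legacyPattern) || []; // match() returns null on no match
}

// Workbox v3-style calls, similar to the publicstorage.com SW:
const v3Style = "workbox.core.setLogLevel(workbox.core.LOG_LEVELS.debug); workbox.core._private.logger";
// Made-up bundled code carrying only "workbox:<module>:<version>" markers:
const bundled = 'try{self["workbox:core:5.1.4"]&&_()}catch(e){}';

console.log(legacyMatches(v3Style));
// → [ 'workbox.core.setLogLevel', 'workbox.core.LOG', 'workbox.core.' ]
console.log(legacyMatches(bundled)); // → []
```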

@tunetheweb
Member

Looks to me like only workbox-sw uses the workbox.blah naming convention that I assumed.

I don't know enough about Workbox, but is workbox-sw the simpler version of this library, and does www.vice.com use the full version?

I checked last year's query, and vice.com does appear as a service worker site (with a SW script similar to the current one), but it isn't picked up by my query. So it, and other sites not using workbox-sw, would not have been included in my analysis last year.

@tomayac
Member

tomayac commented May 12, 2021

@rviscomi, please assign the review to @jeffposnick. He's the authoritative source on all current and historic ways to identify the library. What you have (with the workbox.* addition) LGTM, but I'm not sure whether there's more. Jeffy will know for sure.

@jeffposnick

/workbox:[a-z\-]+:[\d\.]+/ would be a good way to match "modern" Workbox usage, with something like /workbox\.([a-zA-Z]+\.?[a-zA-Z]*)/ matching older Workbox usage.

Modern Workbox usage can be identified via strings like workbox:core:5.1.4, which you can find in https://www.vice.com/service-worker.js and other examples.

So if both could be accounted for, that would be great.
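Combining the two suggested patterns into one scan could look like the sketch below. `detectWorkbox` is a hypothetical name, and the sample string is illustrative rather than copied from a real SW.

```javascript
// Modern builds embed markers like "workbox:core:5.1.4"; older builds call
// into the global `workbox.*` namespace. Both patterns are the ones suggested above.
const modernPattern = /workbox:[a-z\-]+:[\d\.]+/g;
const legacyPattern = /workbox\.([a-zA-Z]+\.?[a-zA-Z]*)/g;

function detectWorkbox(source) {
  const modern = source.match(modernPattern) || [];
  const legacy = source.match(legacyPattern) || [];
  return { modern, legacy, usesWorkbox: modern.length > 0 || legacy.length > 0 };
}

const sample = 'self["workbox:core:5.1.4"]&&_();self["workbox:routing:5.1.4"]&&_();';
console.log(detectWorkbox(sample).modern); // → [ 'workbox:core:5.1.4', 'workbox:routing:5.1.4' ]
```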

@rviscomi
Member Author

Updated the Workbox pattern to /(?:workbox:[a-z\-]+:\d|workbox\.[a-zA-Z]+\.?[a-zA-Z]*)/g to account for all versions. @jeffposnick, do you want to match only the major version (\d) or the full version number ([\d.]+)?

Workbox is now detected on Vice:

{
    "https://www.vice.com/service-worker.js": [
        "workbox:core:5",
        "workbox:routing:5",
        "workbox:expiration:5",
        "workbox:strategies:5",
        "workbox:precaching:5"
    ]
}

It also still works on publicstorage.com.
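This sketch shows the difference between the two version captures on a made-up marker string. `\d` keeps a single digit, so it returns only the major version (and would cut a two-digit major such as 10 down to "1"). `[\d.]+` keeps the full dotted version.

```javascript
// Pattern from the comment above: one digit of the version.
const majorOnly = /(?:workbox:[a-z\-]+:\d|workbox\.[a-zA-Z]+\.?[a-zA-Z]*)/g;
// Same alternation, capturing the full dotted version number instead.
const fullVersion = /(?:workbox:[a-z\-]+:[\d.]+|workbox\.[a-zA-Z]+\.?[a-zA-Z]*)/g;

const marker = 'self["workbox:precaching:5.1.4"]&&_()';

console.log(marker.match(majorOnly));   // → [ 'workbox:precaching:5' ]
console.log(marker.match(fullVersion)); // → [ 'workbox:precaching:5.1.4' ]
```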

@jeffposnick

Full version number ([\d.]+) would probably be the most useful, as folks using mismatched minor/patch versions of the different Workbox libraries could be a signal that something about their setup is amiss.

...unless there are negative repercussions to breaking the matches out into individual entries instead of aggregating them further? Does it use up a lot more storage, for instance? If so, we can opt for aggregation: on the major version number for modern usage, and, for the older-style Workbox usage, on only one level of detail below workbox, i.e. just workbox.precaching instead of both workbox.precaching.addPlugins and workbox.precaching.precacheAndRoute.

Whatever works best for you all to ensure that this stays scalable.
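If aggregation were needed, the two roll-ups described above could be sketched as follows. Modern markers are reduced to their major version, and older usage is reduced to one level below `workbox`. `aggregateWorkbox` is a hypothetical helper that takes a list of raw matches and returns counts per rolled-up key.

```javascript
// Hypothetical roll-up of raw matches into coarser keys with counts:
//   "workbox:core:5.1.4"            -> "workbox:core:5"
//   "workbox.precaching.addPlugins" -> "workbox.precaching"
function aggregateWorkbox(matches) {
  const counts = {};
  for (const m of matches) {
    let key = m;
    const modern = m.match(/^(workbox:[a-z\-]+:)(\d+)/);
    if (modern) {
      key = modern[1] + modern[2]; // keep module name plus major version only
    } else {
      const legacy = m.match(/^workbox\.([a-zA-Z]+)/);
      if (legacy) {
        key = "workbox." + legacy[1]; // keep the first namespace level only
      }
    }
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

console.log(aggregateWorkbox([
  "workbox.precaching.addPlugins",
  "workbox.precaching.precacheAndRoute",
  "workbox:core:5.1.4",
  "workbox:core:5.0.0",
]));
// → { 'workbox.precaching': 2, 'workbox:core:5': 2 }
```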

@rviscomi
Member Author

Updated to extract the full version number. Storage shouldn't be an issue and any redundancy could be accounted for in the BigQuery analysis.
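With full version numbers recorded, the mismatch signal mentioned earlier (different minor/patch versions across Workbox modules) can be checked for each service worker. The following is a minimal sketch. `findVersionMismatch` is a hypothetical helper that takes the list of matches recorded for one SW script and ignores older-style `workbox.*` entries, because those carry no version.

```javascript
// Hypothetical check: collect distinct versions from "workbox:<module>:<version>"
// markers in one SW. More than one distinct version suggests that modules from
// different Workbox releases are mixed together.
function findVersionMismatch(matches) {
  const versions = new Set();
  for (const m of matches) {
    const parts = m.match(/^workbox:[a-z\-]+:([\d.]+)$/);
    if (parts) {
      versions.add(parts[1]);
    }
  }
  return { versions: [...versions].sort(), mismatched: versions.size > 1 };
}

console.log(findVersionMismatch(["workbox:core:5.1.4", "workbox:routing:5.1.3"]));
// → { versions: [ '5.1.3', '5.1.4' ], mismatched: true }
```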

@rviscomi rviscomi merged commit 0a07b61 into master May 13, 2021
@rviscomi rviscomi deleted the pwa branch May 13, 2021 21:51
@jeffposnick

Thanks so much, Rick!

Now that this is merged, can you provide a bit more context about the "how" and "when" of querying the new data?

@rviscomi
Member Author

The new custom metrics will be picked up in the June crawl, the results of which should be queryable by the end of June. At that time I could provide a sample query to extract the PWA data. Do you have any specific metrics you want to query? For example, % of sites that use Workbox at all, version distribution, package/method popularity, etc. Now is also a good time to double-check that the custom metric supports your use cases.

@jeffposnick

We'd be interested in questions like number of unique origins that use any version of Workbox, relative Workbox version popularity (Workbox v4 vs. v5 vs. v6), what percentage of sites that register a service worker use Workbox, and maybe a few other questions in that general vein.

CC: @tropicadri
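These three questions could be computed from per-page results along the lines of the sketch below. It runs over an in-memory array rather than the BigQuery query offered above. Two assumptions apply: each record is hypothetically shaped as `{ page, pwa }`, with `pwa` following the custom metric output shown earlier, and `workboxStats` is a hypothetical name. Older-style `workbox.*` matches carry no version, so they count toward Workbox usage but not toward the version tally.

```javascript
// Sketch over records shaped (hypothetically) as { page: "<url>", pwa: <metric object> }.
function workboxStats(records) {
  const workboxOrigins = new Set();
  const majorVersions = {}; // e.g. { "5": 3, "6": 1 }, counted once per page
  let pagesWithSw = 0;
  let pagesWithSwAndWorkbox = 0;

  for (const { page, pwa } of records) {
    const hasSw = Object.keys(pwa?.serviceWorkers || {}).length > 0;
    const matches = Object.values(pwa?.workboxInfo || {}).flat();
    const usesWorkbox = matches.length > 0;

    if (usesWorkbox) {
      workboxOrigins.add(new URL(page).origin);
    }
    if (hasSw) {
      pagesWithSw += 1;
      if (usesWorkbox) pagesWithSwAndWorkbox += 1;
    }

    // Only modern markers ("workbox:core:5.1.4") carry a version.
    const majors = new Set();
    for (const m of matches) {
      const v = m.match(/^workbox:[a-z\-]+:(\d+)/);
      if (v) majors.add(v[1]);
    }
    for (const major of majors) {
      majorVersions[major] = (majorVersions[major] || 0) + 1;
    }
  }

  return {
    uniqueWorkboxOrigins: workboxOrigins.size,
    majorVersions,
    pctSwPagesUsingWorkbox: pagesWithSw ? (100 * pagesWithSwAndWorkbox) / pagesWithSw : 0,
  };
}

const pages = [
  { page: "https://a.example/", pwa: { serviceWorkers: { "https://a.example/sw.js": "" }, workboxInfo: { "https://a.example/sw.js": ["workbox:core:5.1.4"] } } },
  { page: "https://b.example/", pwa: { serviceWorkers: { "https://b.example/sw.js": "" }, workboxInfo: {} } },
  { page: "https://c.example/", pwa: {} },
];
console.log(workboxStats(pages));
// → { uniqueWorkboxOrigins: 1, majorVersions: { '5': 1 }, pctSwPagesUsingWorkbox: 50 }
```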

@rviscomi
Member Author

OK, sounds like we're in good shape. The custom metric supports those use cases. 👍

@demianrenzulli
Contributor

Hi folks,

As discussed in #2153, we are looking to expand this script with (at a minimum) two more fields for the PWA chapter of the Web Almanac: one to detect the usage of service worker events, and another to collect the calls to importScripts(), in order to rank the most popular libraries used inside service workers.
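The sketch below shows what the two proposed fields could extract from a SW script body, using regexes in the same style as the existing Workbox detection. The names `eventPattern`, `importScriptsPattern`, and `inspectServiceWorker` are hypothetical. Like the registration regex discussed earlier, these patterns will miss minified or dynamically built calls.

```javascript
// Hypothetical extractors for a service worker's source text.
// Events: addEventListener('<name>', ...) or self.on<name> = ...
const eventPattern = /(?:addEventListener\(\s*['"]|self\.on)([a-z]+)/g;
// importScripts(...) calls; quoted arguments are pulled out separately.
const importScriptsPattern = /importScripts\(([^)]*)\)/g;
const quotedPattern = /['"]([^'"]+)['"]/g;

function inspectServiceWorker(source) {
  const events = new Set();
  for (const m of source.matchAll(eventPattern)) {
    events.add(m[1]);
  }
  const imported = [];
  for (const call of source.matchAll(importScriptsPattern)) {
    for (const arg of call[1].matchAll(quotedPattern)) {
      imported.push(arg[1]); // only string-literal arguments are recovered
    }
  }
  return { events: [...events].sort(), importScripts: imported };
}

const swSource = `
importScripts("/static/precache-manifest.js", "https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-sw.js");
self.addEventListener('install', () => self.skipWaiting());
self.addEventListener("fetch", (event) => {});
self.onpush = (event) => {};
`;
console.log(inspectServiceWorker(swSource));
```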

I wanted to ask whether you could share how you usually test this code during development, so we can iterate on it and make sure it works.
Do you run the test directly in WPT after making changes, as explained here, or do you use another way to debug and inspect the code before running the test in WPT?

I initially thought I could obtain the values of $WPT_BODIES and $WPT_REQUESTS for an example PWA site, serve the resulting JSON from a Node server, and then fetch it from a page to debug and inspect this code. But judging by the size of those variables for a typical site (example), I suspect there must be a better way.

I hope this request is clear. I'm just trying to find a more efficient way to iterate quickly when maintaining these scripts. But maybe the approach you're using is simply to run the test in WPT and check the responses there.

Thanks!
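One possible shortcut is to keep the detection logic as a plain function and run it in Node against a few hand-picked response bodies before trying it on WPT. This is an assumption, not an established workflow for these scripts. The `{ url, response_body }` entry shape is also an assumption about what `$WPT_BODIES` contains and should be checked against a real run. `runPwaMetric` is a hypothetical name.

```javascript
// Hypothetical harness: the metric logic is a pure function of the bodies array,
// so it can be exercised in Node with small fixtures instead of a full payload.
// ASSUMPTION: entries look like { url, response_body }.
function runPwaMetric(bodies) {
  const workboxPattern = /(?:workbox:[a-z\-]+:[\d.]+|workbox\.[a-zA-Z]+\.?[a-zA-Z]*)/g;
  const workboxInfo = {};
  for (const { url, response_body: body } of bodies) {
    const matches = typeof body === "string" ? body.match(workboxPattern) : null;
    if (matches) {
      workboxInfo[url] = matches;
    }
  }
  return workboxInfo;
}

// Two tiny fixtures: one Workbox SW and one unrelated script.
const fixtures = [
  { url: "https://example.com/sw.js", response_body: 'self["workbox:core:6.1.5"]&&_();' },
  { url: "https://example.com/app.js", response_body: "console.log('hi');" },
];
console.log(runPwaMetric(fixtures));
// → { 'https://example.com/sw.js': [ 'workbox:core:6.1.5' ] }
```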

@tunetheweb
Member

Yup, as I understand it, we just do some example runs against WPT for testing. It's best to test at least one PWA page and one non-PWA page, to confirm both that the metric works when expected and that it doesn't throw an error when it can't find what it's looking for. Also, include links to the WPT runs in any pull request to make it easier to review.
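To make the non-PWA case safe, the metric can guard its work and fall back to a result object instead of throwing. Here is a minimal sketch. `safeMetric`, `countServiceWorkers`, and the `error` field are hypothetical. Returning a JSON string is also an assumption about how the value is handed back to WPT.

```javascript
// Hypothetical wrapper: run a metric function and never let an exception escape.
// A page without a service worker should yield a result, not a crashed metric.
function safeMetric(compute, input) {
  try {
    return JSON.stringify(compute(input) ?? {});
  } catch (err) {
    return JSON.stringify({ error: String(err && err.message) });
  }
}

// Example metric that assumes `input.serviceWorkers` exists; on a non-PWA page it may not.
function countServiceWorkers(input) {
  return { count: Object.keys(input.serviceWorkers).length };
}

console.log(safeMetric(countServiceWorkers, { serviceWorkers: { "/sw.js": "" } })); // → {"count":1}
console.log(safeMetric(countServiceWorkers, {})); // logs an {"error": ...} object instead of throwing
```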


6 participants