Media 2021 queries (#2144) #2308

Closed
wants to merge 40 commits into from


eeeps
Contributor

@eeeps commented Aug 13, 2021

Progress towards #2144

(Initially empty) PR to track progress on Media 2021 queries.

Queries we need (based on the outline)

♻️ = try to reuse the same query from 2019/2020 to facilitate comparison over time

Images

Impact

  • distribution of image bytes / page ♻️
  • distribution of image pixels / page pixels (viewport pixels?) ♻️
  • Images and LCP (link to the Performance chapter's awesome analysis)
  • prevalence of 1x1 tracking pixels
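
A minimal sketch of the bytes-per-page query, assuming the per-element `byteSize` field that responsive_images.js reports and the 10k mobile sample table. It only counts `<img>` elements (not CSS background images), so it is not necessarily comparable with the ♻️ 2019/2020 numbers.

```sql
-- Sketch: percentiles of total <img> bytes per page.
-- Assumption: each "responsive-images" entry carries a numeric byteSize.
WITH pages AS (
  SELECT
    url AS pageURL,
    JSON_QUERY_ARRAY(
      JSON_VALUE(payload, '$._responsive_images'),
      '$."responsive-images"'
    ) AS imgElements
  FROM `httparchive.sample_data.pages_mobile_10k`
),
bytesPerPage AS (
  SELECT
    pageURL,
    -- Sum byteSize over this page's <img> elements; a page with an empty
    -- array gets 0 instead of dropping out of the distribution.
    (
      SELECT IFNULL(SUM(CAST(JSON_VALUE(img, '$.byteSize') AS FLOAT64)), 0)
      FROM UNNEST(imgElements) AS img
    ) AS imgBytes
  FROM pages
  -- Skip pages where the custom metric is missing entirely.
  WHERE imgElements IS NOT NULL
)
SELECT
  percentile,
  APPROX_QUANTILES(imgBytes, 100)[OFFSET(percentile)] AS imgBytes
FROM bytesPerPage, UNNEST([10, 25, 50, 75, 90]) AS percentile
GROUP BY percentile
ORDER BY percentile
```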

Content

  • Dimensions
  • Aspect ratios
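
A possible starting point for Dimensions and Aspect ratios, assuming `naturalWidth`/`naturalHeight` from responsive_images.js describe the decoded image resource. Images that have not loaded report 0 for both, so this sketch filters them out; whether that is the right treatment for lazy-loaded images is an open question.

```sql
-- Sketch: percentiles of natural width, natural height, and aspect ratio.
WITH pages AS (
  SELECT
    JSON_QUERY_ARRAY(
      JSON_VALUE(payload, '$._responsive_images'),
      '$."responsive-images"'
    ) AS imgElements
  FROM `httparchive.sample_data.pages_mobile_10k`
),
dims AS (
  SELECT
    CAST(JSON_VALUE(imgElement, '$.naturalWidth') AS INT64) AS naturalWidth,
    CAST(JSON_VALUE(imgElement, '$.naturalHeight') AS INT64) AS naturalHeight
  FROM pages CROSS JOIN UNNEST(imgElements) AS imgElement
)
SELECT
  percentile,
  APPROX_QUANTILES(naturalWidth, 100)[OFFSET(percentile)] AS naturalWidth,
  APPROX_QUANTILES(naturalHeight, 100)[OFFSET(percentile)] AS naturalHeight,
  -- Width / height, so values above 1 are landscape.
  APPROX_QUANTILES(SAFE_DIVIDE(naturalWidth, naturalHeight), 100)[OFFSET(percentile)] AS aspectRatio
FROM dims, UNNEST([10, 25, 50, 75, 90]) AS percentile
-- Unloaded images report 0 x 0; leave them out.
WHERE naturalWidth > 0 AND naturalHeight > 0
GROUP BY percentile
ORDER BY percentile
```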

Encoding

  • format use
  • bits / pixel by format
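
Both Encoding items can probably come out of one query grouped by `mimeType`. A sketch, assuming the `mimeType` and `bitsPerPixel` fields as emitted by responsive_images.js; elements without a `mimeType` show up as a NULL group.

```sql
-- Sketch: share of <img> elements by MIME type, plus bits/pixel percentiles.
WITH pages AS (
  SELECT
    JSON_QUERY_ARRAY(
      JSON_VALUE(payload, '$._responsive_images'),
      '$."responsive-images"'
    ) AS imgElements
  FROM `httparchive.sample_data.pages_mobile_10k`
),
formats AS (
  SELECT
    JSON_VALUE(imgElement, '$.mimeType') AS mimeType,
    CAST(JSON_VALUE(imgElement, '$.bitsPerPixel') AS FLOAT64) AS bitsPerPixel
  FROM pages CROSS JOIN UNNEST(imgElements) AS imgElement
)
SELECT
  mimeType,
  COUNT(0) AS images,
  -- Share of all <img> elements in the sample.
  COUNT(0) / SUM(COUNT(0)) OVER () AS pctImages,
  -- SAFE_OFFSET yields NULL for groups where bitsPerPixel is always NULL.
  APPROX_QUANTILES(bitsPerPixel, 100)[SAFE_OFFSET(25)] AS p25BitsPerPixel,
  APPROX_QUANTILES(bitsPerPixel, 100)[SAFE_OFFSET(50)] AS p50BitsPerPixel,
  APPROX_QUANTILES(bitsPerPixel, 100)[SAFE_OFFSET(75)] AS p75BitsPerPixel
FROM formats
GROUP BY mimeType
ORDER BY images DESC
```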

Embedding

  • lazy loading ♻️
  • srcset
    • adoption ♻️
    • w vs x descriptors ♻️
    • number of candidates ♻️
    • densities (distribution of currentSrcDensity values, distribution of srcsetCandidateDensities values) @rviscomi
  • sizes
    • implicit vs explicit (of images with srcsetHasWDescriptors, what percent have sizesWasImplicit) @eeeps
    • valid vs invalid (of images with srcsetHasWDescriptors, what percent have sizesParseError) @eeeps
    • most-common explicit values ♻️ @eeeps
    • (in)accuracy: compare the width given by sizes to the layout width ([10, 25, 50, 75, 90] distribution: sizesAbsoluteError and sizesRelativeError, % of images with sizesRelativeError < 0.05) @rviscomi
  • picture
    • adoption
    • type-switching vs art-direction ♻️ (for images with isInPicture, what percent have pictureMediaSwitching vs pictureTypeSwitching) @rviscomi
      • most interesting exotic media queries (not min-width or max-width)
  • width and height attributes
    • how many images have both?
    • how often are they used to reserve space for flexible images?
  • accessibility
    • alt ♻️
    • figcaption ♻️
  • decoding="async" attribute adoption
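
Most of the srcset, sizes, and width/height questions are ratios of boolean fields that responsive_images.js already emits. A sketch of several of them, under two assumptions: the sizes denominators are images with `srcsetHasWDescriptors` (as the checklist describes), and a missing field is simply not counted.

```sql
-- Sketch: srcset / sizes / width+height adoption as image-level ratios.
WITH pages AS (
  SELECT
    JSON_QUERY_ARRAY(
      JSON_VALUE(payload, '$._responsive_images'),
      '$."responsive-images"'
    ) AS imgElements
  FROM `httparchive.sample_data.pages_mobile_10k`
),
imgs AS (
  SELECT
    CAST(JSON_VALUE(imgElement, '$.hasSrcset') AS BOOL) AS hasSrcset,
    CAST(JSON_VALUE(imgElement, '$.srcsetHasWDescriptors') AS BOOL) AS srcsetHasWDescriptors,
    CAST(JSON_VALUE(imgElement, '$.srcsetHasXDescriptors') AS BOOL) AS srcsetHasXDescriptors,
    CAST(JSON_VALUE(imgElement, '$.sizesWasImplicit') AS BOOL) AS sizesWasImplicit,
    CAST(JSON_VALUE(imgElement, '$.sizesParseError') AS BOOL) AS sizesParseError,
    CAST(JSON_VALUE(imgElement, '$.sizesRelativeError') AS FLOAT64) AS sizesRelativeError,
    CAST(JSON_VALUE(imgElement, '$.hasWidth') AS BOOL) AS hasWidth,
    CAST(JSON_VALUE(imgElement, '$.hasHeight') AS BOOL) AS hasHeight
  FROM pages CROSS JOIN UNNEST(imgElements) AS imgElement
)
SELECT
  COUNT(0) AS images,
  -- srcset adoption and descriptor types, as shares of all <img> elements.
  SAFE_DIVIDE(COUNTIF(hasSrcset), COUNT(0)) AS pctSrcset,
  SAFE_DIVIDE(COUNTIF(srcsetHasWDescriptors), COUNT(0)) AS pctWDescriptors,
  SAFE_DIVIDE(COUNTIF(srcsetHasXDescriptors), COUNT(0)) AS pctXDescriptors,
  -- sizes questions, as shares of images whose srcset uses w descriptors.
  SAFE_DIVIDE(COUNTIF(srcsetHasWDescriptors AND sizesWasImplicit),
              COUNTIF(srcsetHasWDescriptors)) AS pctSizesImplicit,
  SAFE_DIVIDE(COUNTIF(srcsetHasWDescriptors AND sizesParseError),
              COUNTIF(srcsetHasWDescriptors)) AS pctSizesInvalid,
  SAFE_DIVIDE(COUNTIF(srcsetHasWDescriptors AND sizesRelativeError < 0.05),
              COUNTIF(srcsetHasWDescriptors)) AS pctSizesWithin5Pct,
  -- Images with both width and height attributes.
  SAFE_DIVIDE(COUNTIF(hasWidth AND hasHeight), COUNT(0)) AS pctWidthAndHeight
FROM imgs
```

COUNTIF skips rows where its condition is NULL, so elements missing a field fall out of the numerators but stay in the denominators.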

Layout

  • intrinsic vs extrinsic sizing (percent of images broken down by intrinsicOrExtrinsicSizing for height and width) @rviscomi
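
A sketch of the intrinsic vs extrinsic breakdown. It assumes `intrinsicOrExtrinsicSizing` is an object with separate `width` and `height` values, which is how the checklist item reads; if it turns out to be a single string, query `'$.intrinsicOrExtrinsicSizing'` directly instead.

```sql
-- Sketch: share of <img> elements by width/height sizing type.
WITH pages AS (
  SELECT
    JSON_QUERY_ARRAY(
      JSON_VALUE(payload, '$._responsive_images'),
      '$."responsive-images"'
    ) AS imgElements
  FROM `httparchive.sample_data.pages_mobile_10k`
)
SELECT
  -- Assumed shape: {"width": "...", "height": "..."}
  JSON_VALUE(imgElement, '$.intrinsicOrExtrinsicSizing.width') AS widthSizing,
  JSON_VALUE(imgElement, '$.intrinsicOrExtrinsicSizing.height') AS heightSizing,
  COUNT(0) AS images,
  COUNT(0) / SUM(COUNT(0)) OVER () AS pctImages
FROM pages CROSS JOIN UNNEST(imgElements) AS imgElement
GROUP BY widthSizing, heightSizing
ORDER BY images DESC
```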

Delivery

  • Cross-origin vs same-origin
    • link to Resource Hints re: preconnect for cross-origin resources
  • URL-based resizing
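
For cross-origin vs same-origin, comparing hosts is a cheap approximation of comparing origins (it ignores scheme and port). A sketch, assuming the `url` field in responsive_images.js holds the image's absolute URL; inline `data:` URIs are excluded because they have no meaningful host.

```sql
-- Sketch: share of <img> elements served from the page's own host.
WITH pages AS (
  SELECT
    url AS pageURL,
    JSON_QUERY_ARRAY(
      JSON_VALUE(payload, '$._responsive_images'),
      '$."responsive-images"'
    ) AS imgElements
  FROM `httparchive.sample_data.pages_mobile_10k`
),
imgURLs AS (
  SELECT
    pageURL,
    JSON_VALUE(imgElement, '$.url') AS imgURL
  FROM pages CROSS JOIN UNNEST(imgElements) AS imgElement
)
SELECT
  -- Host equality approximates same-origin.
  NET.HOST(imgURL) = NET.HOST(pageURL) AS sameHost,
  COUNT(0) AS images,
  COUNT(0) / SUM(COUNT(0)) OVER () AS pctImages
FROM imgURLs
WHERE imgURL IS NOT NULL AND NOT STARTS_WITH(imgURL, 'data:')
GROUP BY sameHost
```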

Video

  • Doug did it!

@eeeps added the analysis (Querying the dataset) label on Aug 13, 2021
@eeeps added this to the 2021 Analysis milestone on Aug 13, 2021
@eeeps
Contributor Author

eeeps commented Aug 16, 2021

@dougsillars could you enumerate the queries we'll need to complete the video section?

@rviscomi marked this pull request as draft on September 13, 2021 15:22
@rviscomi
Member

Converting the PR to draft for now. @eeeps please mark "Ready for review" when ready.

@rviscomi
Member

@eeeps @dougsillars how's the analysis going? Is there anything I can do to help?

@dougsillars
Contributor

When are they due? I haven't gotten to the queries yet.

@rviscomi
Member

Milestones and due dates are documented at the top of the chapter issue. We allocated 2 months for analysis, so it's due September 30 to stay on schedule. Since you're both authors and analysts, it's ok if you need to spend more time on analysis, as long as you know that you'll have less time for writing.

@rviscomi
Member

Let us know if there's anything we can do to help keep the analysis on track.

@rviscomi
Member

rviscomi commented Oct 2, 2021

Hi @dougsillars @eeeps is there anything I can do to help unblock this PR?

@dougsillars
Contributor

I'll work on this this week.

@rviscomi
Member

rviscomi commented Oct 4, 2021

Thanks @dougsillars. Let me know if I can help at all! Just 4 more weeks left for authoring so I want to make sure you have enough time.

@rviscomi added the ASAP (This issue is blocking progress) label on Oct 12, 2021
@rviscomi
Member

@eeeps @dougsillars any update on the analysis? Getting worried that there aren't any queries committed to the PR yet and the draft is due in a couple of weeks.

@eeeps
Contributor Author

eeeps commented Oct 12, 2021

@rviscomi I, too, have been a bit worried! I've managed to clear some time off of my schedule last/this week and have enlisted the help of a colleague, @akshay-ranganath. I've been slowly finding my way through how to query the JSON output from https://github.com/HTTPArchive/legacy.httparchive.org/blob/master/custom_metrics/responsive_images.js. Current status: I have that JSON pulled into a temporary imgElements table with nicely CAST columns, which should allow for easier querying. Like this:


```sql
-- Flatten the responsive_images.js custom metric into one row per <img>,
-- with each field CAST to a usable type.
WITH pages AS (
  SELECT
    url AS pageURL,
    -- _responsive_images is stored as a JSON-encoded string inside payload,
    -- so extract it as a string, then pull out the "responsive-images" array.
    JSON_QUERY_ARRAY(
      JSON_VALUE(payload, '$._responsive_images'),
      '$."responsive-images"'
    ) AS imgElements
  FROM `httparchive.sample_data.pages_mobile_10k`
),
imgElements AS (
  SELECT
    pageURL,
    imgElement,
    JSON_VALUE(imgElement, '$.url') AS imgURL,
    -- Markup
    CAST(JSON_VALUE(imgElement, '$.hasSrc') AS BOOL) AS hasSrc,
    CAST(JSON_VALUE(imgElement, '$.hasAlt') AS BOOL) AS hasAlt,
    CAST(JSON_VALUE(imgElement, '$.isInPicture') AS BOOL) AS isInPicture,
    CAST(JSON_VALUE(imgElement, '$.hasCustomDataAttributes') AS BOOL) AS hasCustomDataAttributes,
    CAST(JSON_VALUE(imgElement, '$.hasWidth') AS BOOL) AS hasWidth,
    CAST(JSON_VALUE(imgElement, '$.hasHeight') AS BOOL) AS hasHeight,
    CAST(JSON_VALUE(imgElement, '$.totalCandidates') AS INT64) AS totalCandidates,
    JSON_VALUE(imgElement, '$.heightAttribute') AS heightAttribute,
    JSON_VALUE(imgElement, '$.widthAttribute') AS widthAttribute,
    JSON_VALUE(imgElement, '$.altAttribute') AS altAttribute,
    JSON_VALUE_ARRAY(imgElement, '$.customDataAttributes') AS customDataAttributes,
    -- Rendered and natural dimensions
    CAST(JSON_VALUE(imgElement, '$.clientWidth') AS INT64) AS clientWidth,
    CAST(JSON_VALUE(imgElement, '$.clientHeight') AS INT64) AS clientHeight,
    CAST(JSON_VALUE(imgElement, '$.naturalWidth') AS INT64) AS naturalWidth,
    CAST(JSON_VALUE(imgElement, '$.naturalHeight') AS INT64) AS naturalHeight,
    -- picture, srcset, and sizes
    CAST(JSON_VALUE(imgElement, '$.pictureMediaSwitching') AS BOOL) AS pictureMediaSwitching,
    CAST(JSON_VALUE(imgElement, '$.pictureTypeSwitching') AS BOOL) AS pictureTypeSwitching,
    CAST(JSON_VALUE(imgElement, '$.hasSrcset') AS BOOL) AS hasSrcset,
    CAST(JSON_VALUE(imgElement, '$.hasSizes') AS BOOL) AS hasSizes,
    CAST(JSON_VALUE(imgElement, '$.srcsetParseError') AS BOOL) AS srcsetParseError,
    CAST(JSON_VALUE(imgElement, '$.srcsetHasXDescriptors') AS BOOL) AS srcsetHasXDescriptors,
    CAST(JSON_VALUE(imgElement, '$.srcsetHasWDescriptors') AS BOOL) AS srcsetHasWDescriptors,
    JSON_VALUE(imgElement, '$.sizesCSSLength') AS sizesCSSLength,
    CAST(JSON_VALUE(imgElement, '$.sizesWidth') AS NUMERIC) AS sizesWidth,
    CAST(JSON_VALUE(imgElement, '$.sizesParseError') AS BOOL) AS sizesParseError,
    CAST(JSON_VALUE(imgElement, '$.sizesWasImplicit') AS BOOL) AS sizesWasImplicit,
    CAST(JSON_VALUE(imgElement, '$.sizesAbsoluteError') AS NUMERIC) AS sizesAbsoluteError,
    CAST(JSON_VALUE(imgElement, '$.sizesRelativeError') AS NUMERIC) AS sizesRelativeError,
    JSON_QUERY(imgElement, '$.srcsetCandidateDensities') AS srcsetCandidateDensities,
    JSON_QUERY(imgElement, '$.srcsetWDescriptorValues') AS srcsetWDescriptorValues,
    CAST(JSON_VALUE(imgElement, '$.currentSrcDensity') AS NUMERIC) AS currentSrcDensity,
    -- Resource
    CAST(JSON_VALUE(imgElement, '$.approximateResourceWidth') AS INT64) AS approximateResourceWidth,
    CAST(JSON_VALUE(imgElement, '$.approximateResourceHeight') AS INT64) AS approximateResourceHeight,
    CAST(JSON_VALUE(imgElement, '$.byteSize') AS INT64) AS byteSize,
    CAST(JSON_VALUE(imgElement, '$.bitsPerPixel') AS NUMERIC) AS bitsPerPixel,
    JSON_VALUE(imgElement, '$.mimeType') AS mimeType,
    -- Layout. JSON_VALUE returns NULL for objects, so non-scalar fields use
    -- JSON_QUERY (intrinsicOrExtrinsicSizing appears to be per width/height).
    JSON_QUERY(imgElement, '$.computedSizingStyles') AS computedSizingStyles,
    JSON_QUERY(imgElement, '$.intrinsicOrExtrinsicSizing') AS intrinsicOrExtrinsicSizing,
    CAST(JSON_VALUE(imgElement, '$.reservedLayoutDimensions') AS BOOL) AS reservedLayoutDimensions
  FROM pages CROSS JOIN UNNEST(imgElements) AS imgElement
)

-- Example exploration: srcset images that use w descriptors.
SELECT imgElement, srcsetCandidateDensities, srcsetWDescriptorValues
FROM imgElements
WHERE hasSrcset AND srcsetHasWDescriptors
```

My current plan is to use this mega-intermediate-table to explore the sample data and build a suite of queries that work, and then whittle them back, deleting unnecessary intermediate columns from each query, before switching them over to the full dataset.

I'll try to keep this thread and/or the #web-almanac-media Slack updated with progress.

@dougsillars How are the video queries going?

@dougsillars
Contributor

Sorry Rick and Eric: some work fire drills this week, but I plan to spend all day tomorrow and time next week on this.

@rviscomi
Member

rviscomi commented Nov 5, 2021

@akshay-ranganath @eeeps could you update the checklist above to reflect what (if any) remaining queries there are? This needs to be reviewed and merged ASAP.

@rviscomi
Member

See #2583

@rviscomi closed this on Nov 24, 2021
4 participants