Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish Navigation URLs and Frame URLs #13706

Closed
7 tasks done
adamraine opened this issue Feb 25, 2022 · 3 comments · Fixed by #14149
Closed
7 tasks done

Distinguish Navigation URLs and Frame URLs #13706

adamraine opened this issue Feb 25, 2022 · 3 comments · Fixed by #14149

Comments

@adamraine
Copy link
Member

adamraine commented Feb 25, 2022

Terminology

This doesn't have to stick, I just need it to make writing the issue easier.

  • Navigated (Navigation?, Document?) URL: The the last URL the browser performed a hard navigation to.
    • Lighthouse resolves this URL tracking CDP Page.frameNavigated events.
    • Can only be determined in Lighthouse if there was a navigation
  • Frame URL: The URL that appears in the search bar. Can be changed with history.pushState or anchor links without making an additional network request or performing a hard navigation.
    • Lighthouse legacy navigation runner does not use this URL anywhere.
    • URL can queried using Page.getFrameTree.
    • Lighthouse FR runners can resolve this URL using the Puppeteer page.url() function (wrapped by driver.url()).

Problem

For navigations, gatherers need to know the navigated URL in order to find the main document, and there can be issues if the frame URL is provided instead. #13699 will ensure consistent use of the navigation URL for navigation mode. However, timespan and snapshot mode cannot resolve the navigated URL without Page.frameNavigated events, so they must use the frame url with page.url() instead.

In the LHR requestedUrl/finalUrl, we use the nav URL which can be confusing to the end user who probably expects the frame URL #13697. Again, this does not apply timespan/snapshot which have to use the frame URL everywhere.

Once #13699 is merged the following shows when each "type" of URL will be returned from different sources:

Gather context.url artifacts.URL lhr.requestedUrl / lhr.finalUrl driver.url() / page.url()
Legacy Navigation URL Navigation URL Navigation URL N/A
Navigation Navigation URL Navigation URL Navigation URL Frame URL
Timespan Frame URL Frame URL Frame URL Frame URL
Snapshot Frame URL Frame URL Frame URL Frame URL

Solution

To ensure the "type" of URL is consistant in all three modes, I propose the following setup:

Gather context.url artifacts.URL.* lhr.requestedUrl lhr.finalUrl driver.url() / page.url()
Legacy Deprecated See below Navigation URL Frame URL N/A
Navigation Deprecated See below Navigation URL Frame URL Frame URL
Timespan Deprecated See below N/A Frame URL Frame URL
Snapshot Deprecated See below N/A Frame URL Frame URL

New URL artifact:

interface URL {
	/** URL of the main frame before Lighthouse starts */
	initialUrl: string;
	/** URL of the first document request during a Lighthouse navigation. `undefined` in timespan/snapshot modes. */
	requestedUrl?: string;
	/** URL of the last document request during a Lighthouse navigation. `undefined` in timespan/snapshot modes. */
	mainDocumentUrl?: string;
	/** URL of the main frame after Lighthouse finishes */
	finalUrl: string;
}

Some notes on the above proposal:

  • Gather context.url is deprecate because we can get the Nav URL from artifacts.URL and the frame URL from driver.url()
  • lhr.requestedUrl will be an optional property that only appears on navigation LHRs.
  • Add new initialUrl to be a staple of every artifacts.URL
    • Frame URL
    • Would be about:blank on most navigations.

Implementation Plan

  • Add initialUrl and mainDocumentUrl to the artifacts.URL.
  • Switch audit/computed artifact usages of artifacts.URL.finalUrl to artifacts.URL.mainDocumentUrl
  • [Possibly breaking?] Deprecate context.url
  • [Breaking] Make requestedUrl undefined in timespan/snapshot on artifacts.URL and the LHR
  • [Breaking] Add finalDisplayedUrl and deprecate finalUrl
  • Add mainDocumentUrl to the LHR
  • Remove initialUrl and hold until we actually need it

Related

#8984

@adamraine
Copy link
Member Author

adamraine commented Mar 16, 2022

Interesting development, I think we can use Page.getNavigationHistory to resolve the mainDocumentUrl in timespan and snapshot mode. This would mainDocumentUrl doesn't ever need to be undefined. I'll do some more investigation to see if this will always give the correct value.

Edit: Perhaps not, I was hoping the userTypedUrl field would give us the document URL, but it isn't the document url if there was a 300 redirect.

@adamraine
Copy link
Member Author

adamraine commented Jun 14, 2022

From our discussions in #13819, it seems like we have 3 bikeshedding decisions to make:

  1. Do we keep finalUrl on the LHR as the final navigation URL? This will be purely for legacy support, our own clients should never use this.
  2. What do we call the final navigation URL? This issue originally proposed mainDocumentUrl <<<<<<<<, other ideas proposed:
    • mainResourceUrl
    • finalNavigationUrl
    • finalNavigatedUrl
    • finalNetworkUrl
    • htmlDocumentUrl
  3. What do we call the final frame URL? This issue originally proposed using finalUrl, I don't think anyone wants this anymore. Other ideas proposed:
    • finalFrameUrl
    • displayedUrl
    • finalDisplayUrl
    • finalDisplayedUrl <<<<<<<<<<<
    • finalPageUrl
    • endUrl
    • lastUrl
    • endingUrl

I'm not including finalDocumentUrl because @connorjclark proposed it for the final frame URL and @brendankenny proposed it for the final document URL (lol). Clearly that name is confusing.

My preferences:
1: Keep finalUrl on the LHR only, for legacy support
2: mainDocumentUrl because that's what we have right now
3: finalPageUrl, maybe swap initialUrl -> initialPageUrl

@paulirish
Copy link
Member

I'm not including finalDocumentUrl because @connorjclark proposed it for the final frame URL and @brendankenny proposed it for the final document URL (lol). Clearly that name is confusing.

brendan also pitched "navigated"/"navigation" for the finalUrl/finalNetworkUrl concept. IMO this doesnt work since the navigation API changes the URL but not this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants