Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve support for self-hosted runtime #25026

Closed
wants to merge 7 commits into from
Closed

Improve support for self-hosted runtime #25026

wants to merge 7 commits into from

Conversation

mattwomple
Copy link
Member

@mattwomple mattwomple commented Oct 13, 2019

Bring self-hosted runtime support in-line with amp-toolbox optimizer:

  • Do not auto-append /rtv/{rtv} to self-hosted runtime URLs
  • Support using the default CDN for cache-modified extensions when otherwise using a self-hosted runtime

Additional members in AMP_CONFIG:

  • defaultCdn: Set to https://cdn.ampproject.org so that cache-modified components can still optionally be retrieved from a supported source
  • cdnSupportsCacheModifiedExtensions: Set to false to use defaultCdn when retrieving cache-modified components, instead of the self-hosted component

Recommended reading:

Note: This partially addresses the original issue reported in #24927 but does not handle the case of components loaded by the runtime in singleDoc mode. For example, amp-loader and amp-auto-lightbox do not need to appear in the <head> of an AMP document to load. They are added during runtime. Since AMP pages have no means of manipulating AMP.config, these extensions use the default CDN. Improved self-hosting support for this scenario is intentionally being handled in a separate pull request since it involves more controversial changes and requires updates to the AMP validator.

Bring self-hosted runtime support in-line with amp-toolbox optimizer:
- Do not auto-append /rtv/{rtv} to self-hosted runtime URLs
- Support using the default CDN for dynamic components
@mattwomple
Copy link
Member Author

mattwomple commented Oct 13, 2019

There is an example included in the PR that can be removed later. It is meant to demonstrate a PWA retrieving self-hosted static components and non-self-hosted dynamic components. The demo is live here:

Notice that in both cases, amp-form comes from the self-hosted runtime. The following configuration runs just before shadow-v0.js loads:

<script>
  window.AMP_CONFIG = {
    cdnUrl: 'https://android.cmphys.com/temp/amprt',
    cdnSupportsCacheModifiedExtensions: false
  };
</script>

@mattwomple
Copy link
Member Author

/CC @sebastianbenz @choumx

@mattwomple mattwomple changed the title 🐛Improve support for self-hosted runtime Improve support for self-hosted runtime Oct 13, 2019
Copy link
Contributor

@kristoferbaxter kristoferbaxter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outstanding questions, can you take a look so we can move forward?

const rtv = getMode().rtvVersion;
if (opt_extensionVersion == null) {
opt_extensionVersion = '0.1';
}
const extensionVersion = opt_extensionVersion
? '-' + opt_extensionVersion
: '';
if (urls.cdn !== urls.defaultCdn && !useDefaultCdn) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If single pass is enabled does this logic still function as expected? For both conditions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I mistakenly assumed single pass mode was related to localdev, not experiments. I will rewrite this to include spPath.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resolved in commit d76da0c. Let me know how the changes look to you.

Copy link
Member

@erwinmombay erwinmombay Oct 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i'm understanding it correctly d76da0c could create a path that looks like cdn.ampproject.org/rtv/${rtvVersion}/sp/v0/${extensionId}${extensionVersion}.js? if so that might break.

We can actually remove that getSinglePassExperimentPath() logic now since we assign unique versions to the resources we deploy as a single pass experiment. (meaning they also have theyre own rtv version deployed that looks like the normal rtv url scheme)

feel free to ping me if you need further help on this part! (or ping me on slack too if you want a quicker response time erwinm)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to make my explanation simpler, the whole sp infix logic is moot if its served on google amp cache (even with single pass experiment ON). it was needed in an old version of our experiment deployment but no longer needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erwinmombay Since removal of single pass experiment paths is unrelated to this PR topic, would you mind handling that change in a separate commit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattwomple will do. I'll take it on today. 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes merged here #25073

/**
* Dynamic AMP components
*/
const AMP_DYNAMIC_COMPONENTS = ['amp-geo'];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hardcoded list (now in the base runtime) is disconnected from the source of truth that makes a component "dynamic."

Are there many others that are expected to fall into this category? What happens when they do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the future of how many components will become "dynamic components". I'm hesitant to assume it will always be just amp-geo. The rationale for identifying these components is discussed in ampproject/amp-toolbox#515 and ampproject/amp-toolbox#518. Or, tldr;

Not everyone who self-hosts the AMP runtime has the infrastructure to
support dynamic components like amp-geo, ref: AMP cache guidelines.

I agree with your 'source of truth' comment. This probably isn't the best place for this. Would it make more sense to introduce src/dynamic-components.js?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resolved in commit d76da0c. Let me know how the changes look to you.

location,
opt_isLocalDev,
opt_useDefaultCdn
) {
if (opt_isLocalDev) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a future note @erwinmombay -- this is the kind of logic we should remove (local dev) when we support a development build.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to clarify, the development will essentially be running in its own pipeline process to separate this out correct? (which will of course clean up these logical branchings)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

src/config.js Outdated
defaultCdn: DEFAULT_CDN,
useDefaultCdnForDynamicComponents: Boolean(
env['useDefaultCdnForDynamicComponents']
),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simplify this API? I think just adding one new field should suffice:

cdnForAmpGeo: env['cdnForAmpGeo'] || 'https://cdn.ampproject.org',

This would allow self-hosters to use caches other than Google's for amp-geo.

It also avoids introducing "default" and "dynamic component" both of which need explaining. AFAIK there's no plan for other dynamic components so we can cross that bridge when we get there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course I'll be happy to make this change if it really is preferred, but if I were in your shoes and saw someone submit a PR to introduce cdnForAmpGeo, I'd think "gosh, that's a little short sighted".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or cdnForCacheModifiedExtensions since this is public. :)

"Dynamic" is a bit overloaded in AMP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. And then this documentation should be updated as well to adopt the new name: https://github.com/ampproject/amphtml/blob/master/spec/amp-cache-guidelines.md#guidelines-adding-a-new-cache-to-the-amp-ecosystem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit cc4f0ed makes this change, but I didn't want to introduce a 3rd CDN URL, so I opted for cdnSupportsCacheModifiedExtensions boolean. Let me know what you think.

*/
function isDynamicComponent(extensionId) {
return (
Boolean(extensionId) && AMP_DYNAMIC_COMPONENTS.indexOf(extensionId) >= 0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is case sensitivity important? E.g. Element.tagName is always uppercase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue this is a correct implementation. Fuzzy matching on an ID feels sloppy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit cc4f0ed allows fuzzy matching.

const rtv = getMode().rtvVersion;
if (opt_extensionVersion == null) {
opt_extensionVersion = '0.1';
}
const extensionVersion = opt_extensionVersion
? '-' + opt_extensionVersion
: '';
if (urls.cdn !== urls.defaultCdn && !useDefaultCdn) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is safe to do. It might break RTV pathing for Bing's cache for example.

Instead of inferring when to disable RTV pathing, we should probably make it an explicit option in config.js.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind following-up with @sebastianbenz about this? Here's our discussion: ampproject/amp-toolbox#517 (it's worth viewing the edit history on the original post there, too). If we did not make the right call in dropping '/rtv/{rtv}' for alternate CDNs, I'd like to make sure optimizer in amp-toolbox is updated again.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what's the use case for ampUrlPrefix? It doesn't seem to have an analog in the context of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ampUrlPrefix in the optimizer tool is akin to env['cdnUrl'] being defined before the runtime initializes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks, I misread what it does.

I think the optimizer case it would be confusing to have both ampRuntimeVersion and ampUrlIncludeRtv options, which doesn't apply here. Though I may be still missing some context: ampproject/amp-toolbox#517 (comment)

@kristoferbaxter
Copy link
Contributor

kristoferbaxter commented Oct 15, 2019

I'd like to propose we do not make the following change: Do not auto-append /rtv/{rtv} to self-hosted runtime URLs

Instead we ask hosting environments to emulate the conditions AMP Caches can deliver these resources from.

Adding additional logic to servers is almost always preferable than encouraging the work to be done on every users machine.

Edit: As part of this change we should also document the location of resources relative to a root host serving AMP binaries so that its easy for many to create servers to host the resources with tests to ensure validity.

@mattwomple
Copy link
Member Author

I'd like to propose we do not make the following change: Do not auto-append /rtv/{rtv} to self-hosted runtime URLs

Please reconsider. This requires anyone hosting the runtime (which should not be restricted to large corporations with significant CDN resources) to store 52 copies of the runtime per year. It should be possible to drop /rtv/{rtv} from the pathname when optimizing AMP pages, and the cache maintainer maintains the correspondence between the AMP page optimization version and self-hosted runtime version.

@kristoferbaxter
Copy link
Contributor

I'm not sure I am following. Where does the requirement for 52 copies come from?

@dreamofabear
Copy link

I think he means one release/RTV per week. Though I suppose you could have some custom server-side URL rewriting logic to avoid that.

@kristoferbaxter
Copy link
Contributor

Though I suppose you could have some custom server-side URL rewriting logic to avoid that.

Or the optimizer can only output urls that are supported by the serving environment.

Having a rtv value in the URL is valuable from a caching perspective, these resources should be max-age a long time, and immutable.

@mattwomple
Copy link
Member Author

Though I suppose you could have some custom server-side URL rewriting logic to avoid that.

Having a rtv value in the URL is valuable from a caching perspective, these resources should be max-age a long time, and immutable.

That's not really a necessity. Most CDNs support distinct internal and external max-ages. This allows the client to have a browser cache time of ~1 day for example, and gives the host the flexibility of updating the CDN resources as needed.

@sebastianbenz
Copy link
Contributor

I tend to agree with Matt here. What do we gain form enforcing RTVs for self-hosting? I'm sure that every serious publisher will self-host the runtime with RTV for the reasons @kristoferbaxter mentioned. However, it increases hosting complexity and I think it's a valid use case for devs to simply download the latest AMP runtime and serve it from their static site.

@@ -0,0 +1,32 @@
/**
* Copyright 2016 The AMP HTML Authors. All Rights Reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 2019

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated in commit cc4f0ed.

@kristoferbaxter
Copy link
Contributor

kristoferbaxter commented Oct 15, 2019

What do we gain from enforcing RTVs for self-hosting?

Simplification and reduction of the code in the runtime. It's a tradeoff, less complexity in the build/release/authorship/execution processes in exchange for more requirements in the hosting environment.

What I'm suggesting is the hosting environment should support the versions it hosts using the /rtv/${rtv} pathing. This reduces another variable of differences between the growing number of ways to leverage the AMP runtime.

I'd like to flip the question around too though, why should the runtime increase in size and testing complexity for work that can reasonably be done in a hosting system by those who choose to self host? Why not add a test suite that gives those who choose to self host more confidence that their setup works for their AMP documents?

This testing suite can then also be modified in the future to coincide with changes in hosting strategy.

Overall, this is not a blocker from me, but I'd really like for the participants of this discussion to agree to the additional risks and complexity we're adding here.

@dreamofabear
Copy link

A worthwhile discussion. From another light, this change allows AMP publishers to be more easily decoupled from the dominant AMP cache that happens to be run by Google. Also IMO the cost borne by runtime is small-ish (a few LOC).

@sebastianbenz
Copy link
Contributor

+1 to @choumx making it easy to decouple AMP pages from the cache sends an important message.

I also really like the test suite idea. Once self hosting the runtime is a thing, I plan to publish a runtime hosting module which emulates the AMP Cache behavior as part of AMP Toolbox. A test suite would be a natural by-product.

*/
export function isCacheModifiedExtension(extensionId) {
return (
Boolean(extensionId) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean check shouldn't be necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary, but prevents an unnecessary loop if an empty string is passed for extensionId.

const cdnSupportsCacheModifiedExtensions =
typeof env['cdnUrlSupportsCacheModifiedExtensions'] == 'string'
? env['cdnUrlSupportsCacheModifiedExtensions'] !== 'false'
: env['cdnUrlSupportsCacheModifiedExtensions'] !== false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this is necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a means of supporting cdnUrlSupportsCacheModifiedExtensions defined as either string or boolean.

return `${base}/rtv/${rtv}/${spPath}v0/${extensionId}${extensionVersion}.js`;
const extensionPath = `${spPath}v0/${extensionId}${extensionVersion}.js`;
if (urls.cdn !== urls.defaultCdn && !useDefaultCdn) {
return `${base}/${extensionPath}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we allowing this without RTV fingerprint? The fingerprint allows for easy longterm caching (important for the browser, not the cache server), where a shared URL forces frequent rechecks.

I think prioritizing the user's experience means that, if you want to self-host, you need to follow the RTV contract.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, too, that this can cause an infinite reload loop if the loaded extension does not match the version loaded by v0.js. We automatically reload requesting /rtv/${V0's RTV}/v0/${ext}.js. If we can't specifically request an RTV, then we'll keep requesting /v0/${ext}.js, which has the incorrect version (because we started the reload loop), and continue looping.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we allowing this without RTV fingerprint?

Please read conversation thread here and at ampproject/amp-toolbox#517.

Note, too, that this can cause an infinite reload loop if the loaded extension does not match the version loaded by v0.js.

I'll review the reload conditions this evening to see if changes are warranted here or there.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @jridgewell agrees with @kristoferbaxter. :)

Note, though, that the "user benefit" assumes self-hosters want to match AMP cache's release patterns. In practice, they can set their own release cadence, browser cache hints, etc. Also, being same-origin allows use of a service worker which can ensure matching RTVs between v0.js and extensions.

Point taken though that this is "wild west" territory, dragons, etc. So some documentation on best practices and edge cases (e.g. reload loop) is a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, too, that this can cause an infinite reload loop if the loaded extension does not match the version loaded by v0.js. We automatically reload requesting /rtv/${V0's RTV}/v0/${ext}.js.

Commit 6bbbd5d disables version locking when self-hosting. It's up to the cache maintainer to ensure version alignment between runtime and components.

Copy link
Contributor

@jridgewell jridgewell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Do not auto-append /rtv/{rtv} to self-hosted runtime URLs

I do not want to see this merged. It sacrifices user experience to make it easier to self host. If you want to self host, you must agree to use the RTV directory pattern.

  • Support using the default CDN for cache-modified extensions when otherwise using a self-hosted runtime

I'm fine with this part.

@mattwomple
Copy link
Member Author

  • Do not auto-append /rtv/{rtv} to self-hosted runtime URLs

I do not want to see this merged. It sacrifices user experience to make it easier to self host. If you want to self host, you must agree to use the RTV directory pattern.

Your demand is arbitrary. A cache can still opt to follow the /rtv/{rtv} path structure and there is not demonstrated harm in giving a cache operator the option to choose their preferred versioning structure (/myversion/a.{rtv}), just unfounded fear. I'm unwilling to agree to these terms. If this is the end of discussion, I'll close this PR.

@dreamofabear
Copy link

I also don't believe there's One Right Way for self-hosting. Also, isn't the spirit of the Bento project to be more flexible with the way people can use AMP?

/cc @ampproject/wg-approvers since this is more contentious than I'd anticipated.

@cramforce
Copy link
Member

Could we discuss this in a design review? It is definitely a substantial enough change to warrant that.

I'm pretty sure we cannot make this change, because we rely on being able to load AMP of a particular version and that all AMP binaries in a page have the exact same version. I think (but am not sure) that this change can break that.

Maybe we can find other ways to achieve the DX improvements without breaking that.

@sebastianbenz
Copy link
Contributor

@honeybadgerdontcare and I are currently working on a design for supporting runtime self-hosting in AMP caches and AMP validator. This is closely related to the change proposed in this PR. Ideally we should discuss these changes together in the design review.

@mattwomple I'll share a first draft of our design doc for you to comment on as soon as we've got one.

Update calculateExtensionScriptUrl() and calculateEntryPointScriptUrl()
URL generation now that spPath is deprecated.
@mattwomple
Copy link
Member Author

@erwinmombay Thanks for cleaning up the deprecated experiment paths. This significantly simplifies the logic in calculateExtensionScriptUrl() and calculateEntryPointScriptUrl().

@jridgewell I'm sorry I responded to your comment with such an antagonistic tone. I've spent quite a bit of time (well before this PR) considering how to achieve a good balance of optimization and simplicity for small publishers. I'm very excited about the idea of allowing publishers to both optimize their AMP pages and maintain a simple, predictable URL to the AMP runtime. I look forward to continuing the conversation and will keep a more measured tone.

@sebastianbenz Thank you for taking the initiative on creating a design doc. I'm glad you and @honeybadgerdontcare are working on it because my first draft of "design" was most certainly a position paper. Let me know how and when I can contribute.

@honeybadgerdontcare
Copy link
Contributor

@mattwomple thank you for your perseverance on this. It's a challenging issue and we want to get it right. As usual cramforce is right in that we should bring this up in a design review. There are a lot of nuances here; preserving user first, AMP ideals, and giving publishers more options in how they serve their content. Would the October 23rd design review work for those interested?

@sebastianbenz
Copy link
Contributor

Could we do Nov 6th instead for the design review? The one on Oct 23th would be quite late for me due to timezones.

if (urls.cdn === urls.defaultCdn || useDefaultCdn) {
extensionPath = `rtv/${getMode().rtvVersion}/${extensionPath}`;
}
return `${base}/${extensionPath}`;
Copy link
Contributor

@cathyxz cathyxz Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For #25214 (comment), I was actually hoping to return just ${base}/${extensionPath} here if opt_isLocalDev or getMode().test is true. Is that a change we can make?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/CC @honeybadgerdontcare @sebastianbenz Would you two mind taking @cathyxz's request into consideration in your design doc?

@mattwomple
Copy link
Member Author

My intention is to drop this PR since the two changes made here are impeding progress on introducing self-hosted runtime support and a better alternative is available for amp-geo:

  • Require self-hosted runtimes to follow the same URL path structure in-use by cdn.ampproject.org:
    • Non versioned URLs: https://example.com/runtime/
    • Versioned URLs: https://example.com/runtime/rtv/{rtv}/
    • Runtime metadata: https://example.com/rtv/metadata
  • No special treatment for amp-geo - PR amp-geo: Fall back to API for country #26407 is a better solution to the problem of self-hosting without the ability to modify amp-geo-0.1.js on-the-fly based on country of client request.

Are there any objections or comments before this is closed?
/cc @sebastianbenz

@mattwomple
Copy link
Member Author

Dropping PR for reasons listed in previous comment, as well as because GitHub account @mattwomple will not be used after Jan 2020 (retiring work account but will continue AMP contributions under personal account @mdmower).

@mattwomple mattwomple closed this Jan 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.