Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pages with querystrings are not automatically purged when their caching enabled #34

Closed
archon810 opened this issue Sep 5, 2019 · 14 comments

Comments

@archon810
Copy link

archon810 commented Sep 5, 2019

Hi,

We recently rolled out AMP to our site but are now running into issues with AMP pages not getting their W3TC cache cleared.

The regular non-AMP pages get their cached cleared properly upon pressing Update in a post. However, AMP pages (with ?amp appended), do not, whether the W3TC AMP extension is activated or not.

I've looked at all the settings I can find and even tried 0.10 released today, but still AMP does not get updated. This is bad for obvious reasons.

Here's an example of a post I updated many minutes ago.

Regular page:
image

AMP page (same url with ?amp appended):
image

@maxicus Any ideas?

Thanks!

This was also cross-posted to https://wordpress.org/support/topic/amp-page-cache-does-not-get-cleared/.

@archon810
Copy link
Author

In fact, now that I think about it, it's not just ?amp, it's also the same page with any GET parameter appended that doesn't get its cache cleared. For example, when updating https://example.com/post, https://example.com/post?amp doesn't get cleared, but also https://example.com/post?1 or https://example.com/post?a=b, etc.

What should ideally happen is anything matching https://example.com/post* would get its cache cleared, which would include any variations of the url - otherwise they'll all get stale cache.

@maxicus
Copy link
Contributor

maxicus commented Sep 7, 2019

Purging arbitrary query strings when "Cache with querystrings" enabled can't be supported by plugin.

Small amount of alternatives may be handled by hooks, but for general coverage you may use services like Varnish for that functionality.

@maxicus maxicus closed this as completed Sep 7, 2019
@archon810
Copy link
Author

So what's the solution to at least support ?amp purging @maxicus?

AMP is getting more and more popular, and right now W3TC straight up breaks cache busting for AMP sites.

Please re-open the ticket, or if you have a solution, please provide it here. I'm sure I won't be the last person to run into this issue.

@maxicus maxicus changed the title AMP page cache does not get cleared Pages with querystrings are not automatically purged when their caching enabled Sep 12, 2019
@maxicus
Copy link
Contributor

maxicus commented Sep 12, 2019

Technically ideal case for amp pages is to use /my-page/amp instead of /my-page?amp URL structure. In that case you may use "Disk: Enhanced" cache engine which is often faster and W3TC AMP Extension will handle purges.

If you require to have /my-page?amp by some reasons - you can't cache it via "Disk: Enhanced".
In that case you need to use pgcache_flush_post_queued_urls hook similar like it's used in Extension_Amp_Plugin.php and add '?amp' URLs for purging.

For more complex set ups when unknown and potentially large number of page variations with query strings should be purged you should add Varnish server to your stack and W3TC will send mask-based purge requests there.

@archon810
Copy link
Author

archon810 commented Sep 12, 2019

Actually, /amp has been deprecated in favor of ?amp, and in the future in favor of something like ?amp=1 or ?amp=⚡. See:

ampproject/amp-wp#1148

ampproject/amp-wp#2204

ampproject/amp-wp#1383

Also, Disk: Enhanced is not an option as we have multiple web servers, which is why we're using memcached for sharing the cache.

For now, to work with purging just the ?amp requests when we press Update, I'm thinking we could issue manual purges with w3tc_flush_url() for ?amp urls from the same place in the logic where we send update-cache pings to AMP CDN caches. Is there any benefit to using the hook as opposed to that function?

In the future, if we ever make it there and transition the whole site to AMP-first, we won't have ?amp urls anymore, but I'd still like to figure out the right way to be able to cache url params and have all the right caches flushed upon updates.

@maxicus
Copy link
Contributor

maxicus commented Sep 12, 2019

w3tc_flush_url() will work too of course, but pgcache_flush_post_queued_urls hook is more correct in that situation, since you want to complement the whole list of pages to be purged with their ?amp variation.

For example when post is updated - category pages, homepage, author pages are purged.

@archon810
Copy link
Author

archon810 commented Sep 13, 2019

@maxicus Related to the above, I noticed in the W3TC AMP plugin, you actually do seem to attempt to flush AMP urls, except they're hardcoded to /amp, which may or may not be the scheme used at any given site. As mentioned above, /amp is deprecated, and ?amp should be flushed for sites that use ?amp. Perhaps the plugin should add the ability to specify the AMP url structure to fix it?

https://github.com/W3EDGE/w3-total-cache/blob/master/Extension_Amp_Plugin.php#L67

@bwmarkle
Copy link
Contributor

bwmarkle commented Oct 3, 2019

Hi @archon810,

We've been chatted about this issue internally, and I think we have a solution. It should be in one of our next releases.

I wrote a mock blog post to kind of summarize the idea we're going with. I'll paste the image below so you can see the idea we're going with:

image

Again, hopefully this will be out soon. Let us know what you think!

Thanks,

  • Brad

@bwmarkle
Copy link
Contributor

Hi @archon810,

We're getting close to releasing a minor version that resolves this issue:

The regular non-AMP pages get their cached cleared properly upon pressing Update in a post. However, AMP pages (with ?amp appended), do not, whether the W3TC AMP extension is activated or not.

Can you share with us what WordPress plugin you're using that is using ?amp so we can do some final testing?

Thanks
- Brad

@archon810
Copy link
Author

@bwmarkle As I mentioned above here #34 (comment), we're using the official AMP WP plugin.

https://wordpress.org/plugins/amp/
https://github.com/ampproject/amp-wp

@bwmarkle
Copy link
Contributor

Hi @archon810,

W3 Total Cache 0.12.0 has been released, which includes amp purge settings. If you have have some time to test, this should resolve the issue at hand.

image

Thanks
- Brad

@archon810
Copy link
Author

@bwmarkle Thanks for this. There's one small issue that is described here: ampproject/amphtml#24326 (comment).

Basically, sometimes Google makes urls with ?amp and sometimes with ?amp= and treats those as different cache keys, so in our solution, we end up invalidating both with something like this:

foreach (array($amp_permalink, $amp_permalink . '=') as $link) {
            // flush w3tc cached amp pages
            if(function_exists('w3tc_flush_url')){
                w3tc_flush_url($link);
             }

Can W3 perhaps account for this unfortunate issue?

@archon810
Copy link
Author

@maxicus In addition to that, we will stick with our solution, which only dumps 2 AMP urls per story, since AMP is only enabled on individual posts and not categories/tags/etc, because W3TC's AMP plugin ends up flushing cache for potentially additional 50+ pages that don't even have AMP versions, and this flushing process takes quite a bit of time and slows down post updates.

The issue is described here: #44.

@maxicus
Copy link
Contributor

maxicus commented Jan 13, 2020

@archon810

df81570 commit normalizes all ?amp=, ?amp=1 pages to a single cache key so purging of only one entry is enough.

But based on your description you have a problem at CDN level where you will have to purge all possible variants unless CDN itself supports masked purge. Not something what plugin may support unless CDN does.

You may also note that the root of the problem is querystring-based url structure which allows much more url variants to be made...

As a workaround you may consider to canonicalize urls with 301 redirects, like wp core does for some irregular urls. That may help to solve upcoming issues you will potentially have later with ?amp=2, ?amp=3 and ?amp=archon810 urls in your CDN cache :) Drawback is the redirect loop possible if CDN really creates own redirects like you describe in a ticket, but it's not how CDN's usually work (bug in cdn? bug in origin responding redirect sometimes which is later cached?).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants