Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File Permalinks throw 404 if UUID cache is empty #5181

Open
tobimori opened this issue May 23, 2023 · 19 comments · May be fixed by #6568
Open

File Permalinks throw 404 if UUID cache is empty #5181

tobimori opened this issue May 23, 2023 · 19 comments · May be fixed by #6568
Assignees

Comments

@tobimori
Copy link
Contributor

tobimori commented May 23, 2023

Description

A client of mine uses the newly added UUID Permalinks for sending file links to customers. When a new deployment of the site happens, we flush the complete cache, including the UUID cache. (essentially rm -rf the folder)

Now, the non-presence of the UUID folder will make all Permalinks fail as 404, until they are created again. As the media folder URLs are prone to change, we need to use the permalinks.

Expected behavior
Permalinks need to also work when the UUID cache hasn't been generated yet, which means that any access of a permalink route should also generate the UUID cache.

To reproduce

  1. Delete the UUID cache.
  2. Try to open a file via its Permalink.
  3. 404
  4. Trigger anything that creates the UUID cache (e.g. store and save a file)
  5. Permalink works again

I have not tested this with Pages, but I suspect the behavior to be the same.

Your setup

Kirby Version 3.9.4

@distantnative
Copy link
Member

So far this is intentional, we can debate though if it's the right call we made:

Since looking up UUIDs without the cache means traversing through the whole site index, especially for large sites allowing to trigger this action via a permalink request seemed to be an easy vector for DDOS attacks. Which is why we made the lookup for permalink lazy, only from the cache.

@tobimori
Copy link
Contributor Author

tobimori commented May 23, 2023

I understand the decision behind that. Given that, it should probably still be resolved in some way when e.g. the entire cache is empty. Right now the behavior doesn't work for any site where the cache folder is non-persistent, which will probably be the case for most "modern" deploy setups. If it's actually a problem, make it a config option so large sites can turn it off if needed.

@bastianallgeier
Copy link
Member

As a quick fix, I would suggest that you repopulate the UUID cache after deployment.

@bnomei
Copy link

bnomei commented May 23, 2023

just my 2c. for autoid i was only running the indexing if the current index count was 0.

@lukasbestle
Copy link
Member

What Bruno wrote could be an option. Then the DoS/DDoS potential is very low. But we might need a mechanism to ensure that only one PHP process can start the time-intensive indexing process at a time.

@distantnative
Copy link
Member

@lukasbestle is there a good way to check how many entries one of our caches has?

@lukasbestle
Copy link
Member

Not a way that works with all cache drivers. But we could do it something like this:

  • When storing any value in the UUID cache, call $cache->getOrSet('status', fn () => false). This will ensure that a cache key status exists (but without writing the same value all the time, reducing wear impact on flash storage).
  • When a permalink doesn't exist, call $cache->exists('status') to check if the cache is non-empty.
  • We could also use this status key as a kind-of lock that prevents parallel indexing. I can design pseudocode for this if you want.

@distantnative
Copy link
Member

@lukasbestle to be honest this sounds like quite the hack/workaround though, trying to clean up behind developer behavior. doesn't feel too good to me.

I see it two ways:

  • Permalinks require the cache. So any developer that actively deletes the UUID cache should either not use permalinks or include in their script that deletes the cache to repopulate it again.
  • It could be a config option whether permalinks only do the lazy cache lookup or a full index lookup.

@lukasbestle
Copy link
Member

I get your points. It's not a fully clean solution.

However I think we cannot fully put the blame on developers. A cache should always be ephemeral (= clearing the cache shouldn't be destructive). This is already mostly the case as the UUID cache can be rebuilt from the content files, but with the permalink feature Kirby suddenly does depend on the cache and the feature doesn't work without it.

A cache can not only be missing if the developer clears it, but also just if a site gets deployed. Some hosting setups (especially cloud-based ones) don't have permanent file storage. In these cases, it's not even possible to keep the cache across deployments, the site is always deployed fresh. So the dev would need to actively build the cache on deployment. Which is of course possible, but an extra step for deploying Kirby and at least something we need to document more clearly.

It could be a config option whether permalinks only do the lazy cache lookup or a full index lookup.

I wouldn't do that TBH. If devs enabled that option, they would automatically introduce an application layer DDoS vulnerability into their site with no way to protect against.

So if we cannot solve this in the core by rebuilding the index on an empty cache, I think the only solution is to document the behavior.

@distantnative
Copy link
Member

But then it sounds like our permalink implementation is basically wrong. If a cache cannot be relied upon we can't have a feature that relies on the cache. To prevent attacks, can we instead throttle/limit the lookup from index?

@lukasbestle
Copy link
Member

I don't think the implementation is wrong. It's just a clash of two different requirements that cannot be solved.

To throttle the lookup, we again need to store a timestamp or something in the cache. At which point we could also go the route I mentioned above, which I think is more reliable.

@Moucky
Copy link

Moucky commented Dec 13, 2023

I am experiencing this problem, is there a way to repopulate the UUID cache?

@bnomei
Copy link

bnomei commented Dec 13, 2023

I am experiencing this problem, is there a way to repopulate the UUID cache?

the official cli can do it https://github.com/getkirby/cli

kirby uuid:populate

which is the same as calling this line of PHP code

\Kirby\Uuid\Uuids::populate();

@Moucky
Copy link

Moucky commented Dec 13, 2023

Thanks @bnomei, very much appreciated!

@mynameisfreedom
Copy link

Hi there, I've been struggling with this pretty hard. A client has a website with many file downloads, and we used the Writer field link feature to link the files. Which works as a permalink.

Because it is a very large website with many pages, the client demands to cache the website. But links don't work at all if you turn on the cache, because the 404 page becomes cache for that link, I suppose.

  • When content is changed, all cache is cleared, along with permalink/UUID
  • If someone click on a permalink while permalink/UUID cache is not yet generated for that link, that link becomes the 404 error. And because it is cached it will always stay 404 error, until the content is changed and cache is cleared
  • If permalink/UUID cache is generated somehow before the permalink is requested the link will work. But this happens rarely! Mostly it doesn't work

What can I do about it? Is there any suggested solution? Going through 500 hundred website pages and changing links to absolute links would take a lot of time and it will cost of course.

Unfortunately, I didn't find any warnings about permalinks and UUIDS, so I thought this was a well-tested feature.

Thank you in advance.

Kirby version 4.2
Server: Apache
PHP: 8.3

@tobimori
Copy link
Contributor Author

tobimori commented Jul 9, 2024

UUID/permalink cache should not be cleared when the content changes

@mynameisfreedom
Copy link

Hi @tobimori, thank you, that's good to know. But why file permalinks don't work?

Ok, I tried now with page resave, and they started working. Hmm, strange...

@tobimori
Copy link
Contributor Author

tobimori commented Jul 9, 2024

iirc saving a page should trigger population of the UUID cache, and then all files permalinks should work. the issue should just happen when manually deleting the folder/cache.

@mynameisfreedom
Copy link

mynameisfreedom commented Jul 9, 2024

Sorry, ignore my comments. All is good with permalinks on my end.
Although I learned a bit about UUIDs now. I didn't know they are cached.

Without telling me a client created a script that deletes cache once a day, but he set up a full cache deletion. Now we fixed that.

@distantnative distantnative self-assigned this Jul 24, 2024
@distantnative distantnative linked a pull request Jul 24, 2024 that will close this issue
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants