
Caches refreshing too many things on publish/updates? #5362

Closed
johancruyff2019 opened this issue Apr 29, 2019 · 7 comments

Comments

@johancruyff2019

johancruyff2019 commented Apr 29, 2019

Umbraco v8: after publishing or updating pages, articles, or any other document type, Umbraco rebuilds its cache (database and files). With around 120,000 records this takes about an hour, so we have to wait roughly an hour just to publish or update a single document.

We only need the cache for the updated document to be refreshed, not all of the cache data. At the moment all cache data is removed and rebuilt from the beginning, which is not workable for websites with large amounts of data.


This item has been added to our backlog AB#2860

@johancruyff2019
Author

Hi, I created a brand new site with Umbraco v8 and needed to transfer the old site's items to the new one. I wrote a simple program to do that and it worked perfectly, but afterwards I noticed that the transferred content was available in the backoffice but not on the site.

After a quick look I found that every node should have a record in the cmsContentNu table; I'm wondering how to create those records. I also tried rebuilding the NuCache database, but that caused a lot more problems, such as unpublishing some previously published content.

Thanks

@zpqrtbnk
Contributor

There are several, distinct questions here.

About the second one, importing content... the easiest way would be, once your content has been updated, to go to the Settings / Published Status dashboard and hit the Rebuild button. That re-populates the cmsContentNu table entirely from what is in the database.

About the first one, I hope we don't really rebuild the entire content cache anytime something is published. That would be horribly inefficient. Can you provide more details? For example, how do you know we rebuild everything, and how do you reproduce the situation?

@dave0504

dave0504 commented May 9, 2019

We are also having the same issue as Johan. Having looked at the ContentService, it appears that the TreeChangeType for existing published nodes is always set to refresh the whole branch rather than the individual node, causing the node and all of its descendants to be processed during the cache refresh.
PR submitted
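The difference described above can be sketched in Python. This is a hypothetical illustration, not Umbraco's actual C# code: the `Node` class and the two refresh functions are invented names standing in for the `RefreshNode` vs. `RefreshBranch` behaviour of the change type.

```python
# Hypothetical sketch (not Umbraco's code): contrasting a single-node
# refresh with a whole-branch refresh on publish.
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    children: list = field(default_factory=list)

def descendants(node):
    """Yield every descendant of a node, depth-first."""
    for child in node.children:
        yield child
        yield from descendants(child)

def refresh_node(node):
    """RefreshNode semantics: only the published node is rebuilt."""
    return [node.id]

def refresh_branch(node):
    """RefreshBranch semantics: the node and every descendant is rebuilt."""
    return [node.id] + [d.id for d in descendants(node)]

# A small tree: root with two children, each with two grandchildren.
root = Node(1, [Node(2, [Node(4), Node(5)]), Node(3, [Node(6), Node(7)])])

print(len(refresh_node(root)))    # 1 node touched
print(len(refresh_branch(root)))  # 7 nodes touched
```

With 120,000 nodes under the published node, the branch refresh touches all of them on every publish, which matches the hour-long publish times reported above.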

@zpqrtbnk
Contributor

zpqrtbnk commented May 9, 2019

I see what you are doing in the PR. Unfortunately... there is a reason why we refresh the whole branch: re-publishing a document may change its URL segment, hence its URL, and the URLs of all its children. That being said, it should not impact the content of the cmsContentNu table - maybe we are too aggressive.

So I cannot merge the PR "as is", but I do understand the intent and the possible issue (refreshing way too much). I need to look into it in detail.

@zpqrtbnk changed the title from "NuCache Problem" to "Caches refreshing too many things on publish/updates?" on May 16, 2019
@stevemegson
Contributor

I don't think that changing the URL segment actually causes a problem. ContentData and PublishedContent only store their URL segment, not the complete URL, so the children don't need to be refreshed when their parent's URL has changed.

Unless I'm missing something, the only cache that needs to be refreshed/invalidated for the whole branch is the one in ContentCache.GetRouteById, and that will happen whatever we do.
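The point about URL segments can be sketched as follows. This is a hypothetical Python illustration (`CachedNode` and `route_for` are invented names, not Umbraco's API): because each cached node stores only its own segment and routes are assembled by walking ancestors at lookup time, renaming a parent changes a child's route without touching the child's cached data.

```python
# Hypothetical sketch: routes are derived from per-node segments at
# lookup time, so children need no refresh when a parent's segment changes.
class CachedNode:
    def __init__(self, segment, parent=None):
        self.segment = segment  # only the node's OWN segment is cached
        self.parent = parent

def route_for(node):
    """Assemble the full route by walking up the ancestor chain."""
    parts = []
    while node is not None:
        parts.append(node.segment)
        node = node.parent
    return "/" + "/".join(reversed(parts))

home = CachedNode("home")
blog = CachedNode("blog", parent=home)
post = CachedNode("my-post", parent=blog)

print(route_for(post))  # /home/blog/my-post

# Renaming the parent's segment changes the child's route without
# touching the child's cached data at all:
blog.segment = "news"
print(route_for(post))  # /home/news/my-post
```

Only a route-to-id lookup cache (the equivalent of `GetRouteById`) would need invalidating for the branch, not the per-node content data.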

@Shazwazza
Contributor

This is what happens when you publish a node - for this example, let's assume it's the root node, which has 1000 descendants (i.e. the whole site).

  • It updates a single row in the nucache table = 😎
  • It then queries the nucache table to return ALL descendant nodes = 😥
  • It iterates over each row (it uses a cursor, and doesn't load all rows into memory at least) and then rebuilds each 'Kit' and updates the in-memory nucache content collection for each item = 😥

I don't see why this is necessary at all, since there isn't anything to update. We don't change any data in the NuCache DB table for descendant rows, so there is no reason to update what is in memory - it will be the same. Even if the content name changes, that doesn't affect any data for any other node, either in the DB or in NuCache.

We do need to update all descendants when nodes are moved or when the path changes, but that is not the case when saving/publishing a given node.

I think the PR is actually OK. I will run some tests first, but I wanted to get any feedback here as well.

Let me know what you think.
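The publish flow described in the bullets above can be sketched as follows. This is a hypothetical Python illustration with invented names (`FakeDb`, `publish_old`, `publish_new`) and the 'Kit' rebuild elided; it only counts the work done and is not Umbraco's code.

```python
# Hypothetical sketch: old vs. new publish path, counting DB work.
class FakeDb:
    def __init__(self, descendant_count):
        self.descendant_count = descendant_count
        self.rows_read = 0
        self.rows_written = 0

    def update_row(self, node_id):
        self.rows_written += 1

    def load_descendant_rows(self, node_id):
        # Cursor over descendants: rows are streamed, not held in memory,
        # but every one is still read and processed.
        for i in range(self.descendant_count):
            self.rows_read += 1
            yield i

def publish_old(node_id, db):
    db.update_row(node_id)                      # 1 row written
    for row in db.load_descendant_rows(node_id):
        pass  # rebuild the in-memory 'Kit' for data that never changed

def publish_new(node_id, db):
    db.update_row(node_id)  # only the published node's kit is rebuilt

old_db = FakeDb(1000)
publish_old(1, old_db)
print(old_db.rows_read)  # 1000 descendant rows re-read per publish

new_db = FakeDb(1000)
publish_new(1, new_db)
print(new_db.rows_read)  # 0
```

In both paths exactly one row is written; the difference is the O(descendants) read-and-rebuild loop that the old path runs on every publish.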

@Shazwazza
Contributor

I've merged in #5439

From my tests this all works well and there is certainly a ton less overhead. I believe the primary reason for the slowdown was the SQL call to load all descendants. Of course, updating NuCache in memory is overhead too... though that has far less overhead in 8.2 as well.


7 participants