Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match generate and purge actions to CMS behaviours #168

Merged

Conversation

chrispenny
Copy link
Contributor

@chrispenny chrispenny commented Apr 3, 2023

Match generate and purge actions to CMS behaviours

Closes #165, #166, #167

One global change here is that previously we were running collectChanges() without forcing any specific VersionedMode. This causes issues because we are specifically expecting to work with LIVE records (especially as it relates to finding the URL for pages).

  • When we are unpublishing something, we want to work in a LIVE mode, but fetch changes before the unpublish is actioned.
  • When we are publishing something, we also want to work in a LIVE mode, but we now want to fetch changes after the publish is actioned.

Parent page is unpublished

Parent and child page created:
Screen Shot 2023-04-03 at 10 42 41 AM

Parent page is unpublished, and so is the child page (automatically by the CMS):
Screen Shot 2023-04-03 at 10 43 33 AM

A DeleteStaticCacheJob is queued with both the parent and child URLs represented:
Screen Shot 2023-04-03 at 10 44 00 AM

Page URL has changed

When a page URL changes, that affects the page itself and all of its children. All affected pages must purge the cache records for their old URLs and generate new cache records for their new URLs.

Parent page with original URL Segment:
Screen Shot 2023-04-03 at 12 50 27 PM

Updated parent page with new URL Segment:
Screen Shot 2023-04-03 at 12 50 49 PM

DeleteStaticCacheJob created with the old URLs of the parent and child page:
Screen Shot 2023-04-03 at 2 07 44 PM

GenerateStaticCacheJob created with the new URLs of the parent and child page:
Screen Shot 2023-04-03 at 2 07 51 PM

New configuration to control re-cache of parents and children

  • regenerate_parents
  • regenerate_children

Both are disabled by default. This is a new major which hasn't had a stable tag yet, so hopefully that's fine.

regenerate_parents

Closes #167

This configuration is valid for both publish and unpublish actions.

regenerate_children

Closes #166

Context is import, because sometimes we have to regenerate children regardless of what this configuration says.

  • If you unpublish a parent page, then all children must also be purged, regardless of config.
  • If you are publishing a parent page and the URL has changed, then all children must also be updated, regardless of config.

Unit tests

A fair few unit tests were using deprecated methods from getMockBuilder(). Practically speaking, I think these tests were not actually working.

I have revisited a bunch of the tests, and re-written them in a way that I believe is now testing functionality correcty.

Migrating from v4

SiteTreePublishingEngine changes

The following methods have been removed:

  • getToUpdate()
  • setToUpdate()
  • getToDelete()
  • setToDelete()

These methods used to deal with DataObjects. We are now collecting URLs instead of DataObjects, and these URLs are
being stored and manipulated by the following methods.

  • getUrlsToUpdate()
  • addUrlsToUpdate()
  • setUrlsToUpdate()
  • getUrlsToDelete()
  • addUrlsToDelete()
  • setUrlsToDelete()

Why did this change happen? TL;DR: We realised that purging URLs for pages that change their URL (EG: their parent page
changes, or their URLSegment changes) was not functioning as expected. This was really the only way to enable this
functionality to work correctly while maintaining cache integrety.

@chrispenny chrispenny force-pushed the pulls/cms-expected-behaviour branch 7 times, most recently from 56b3883 to 1601021 Compare April 3, 2023 21:41
@chrispenny chrispenny marked this pull request as ready for review April 3, 2023 21:44
Copy link
Collaborator

@mfendeksilverstripe mfendeksilverstripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 👍 Love the cleanup. Left some comments and some nitpicks as well. Noting that some of these are related to existing code which ended up in the change-set due to renaming of files.

src/Extension/Engine/SiteTreePublishingEngine.php Outdated Show resolved Hide resolved
src/Extension/Engine/SiteTreePublishingEngine.php Outdated Show resolved Hide resolved
src/Extension/Engine/SiteTreePublishingEngine.php Outdated Show resolved Hide resolved
src/Extension/Engine/SiteTreePublishingEngine.php Outdated Show resolved Hide resolved
@chrispenny chrispenny force-pushed the pulls/cms-expected-behaviour branch from e0a589f to 9dd92ab Compare April 4, 2023 04:23
@chrispenny
Copy link
Contributor Author

chrispenny commented Apr 4, 2023

Hi @mfendeksilverstripe hopefully all of your feedback has been actioned. I've also added some docs about migrating from 4.

Edited: As per Slack convo, I have undone the removal of the priority feature as part of my changes.

Changes are here:
0b66f71

@chrispenny chrispenny force-pushed the pulls/cms-expected-behaviour branch from 9dd92ab to 0b66f71 Compare April 4, 2023 19:29
Copy link
Contributor

@maxime-rainville maxime-rainville left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a lot of substantial breaking changes to make the day we are schedule to tag a release candidate.

Could we find a way to do them in a backwards compatible way? e.g.: By keeping these methods backwards compatible but marking them as @deprecated:

  • getToUpdate()
  • setToUpdate()
  • getToDelete()
  • setToDelete()

Otherwise, could we provide alternative implementations of SiteTreePublishEngine/PublishableSiteTree and let people opt-in to using the newer/better behaviour?

README.md Outdated Show resolved Hide resolved
src/Contract/StaticPublishingTrigger.php Outdated Show resolved Hide resolved
src/Contract/StaticallyPublishable.php Outdated Show resolved Hide resolved
src/Extension/Engine/SiteTreePublishingEngine.php Outdated Show resolved Hide resolved
Copy link
Contributor

@maxime-rainville maxime-rainville left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a lot of substantial breaking changes to make the day we are schedule to tag a release candidate.

Could we find a way to do them in a backwards compatible way? e.g.: By keeping these methods backwards compatible but marking them as @deprecated:

  • getToUpdate()
  • setToUpdate()
  • getToDelete()
  • setToDelete()

Otherwise, could we provide alternative implementations of SiteTreePublishEngine/PublishableSiteTree and let people opt-in to using the newer/better behaviour?

@mfendeksilverstripe
Copy link
Collaborator

@chrispenny I think that this PR should be split into smaller chunks. I see several areas that can be actioned separately:

  • handling of stage param when formatting URLs
  • linting / strict types
  • improvements to the QueuedJobsTestService - getJobByClassName()
  • the rest of the change-set that is trying to solve the reported issue

@mfendeksilverstripe
Copy link
Collaborator

@chrispenny Is this something we should also consider for this change?
Update of sibling pages after publish

@chrispenny
Copy link
Contributor Author

@maxime-rainville I think the issue with keeping getToUpdate() around is that I've fundamentally changed when these methods would be accessed.

Before: They would be accessed as part of flushChanges()
Now: They would have to be accessed as part of collectChanges(), or they would have to be updated to return an array of URLs (string) instead of DataObjects.

@chrispenny chrispenny force-pushed the pulls/cms-expected-behaviour branch 2 times, most recently from 647f49d to 8ba38e3 Compare May 1, 2023 02:20
@chrispenny chrispenny force-pushed the pulls/cms-expected-behaviour branch from 8ba38e3 to 543af36 Compare May 1, 2023 20:30
@chrispenny
Copy link
Contributor Author

@emteknetnz I've resolved the issue you mentioned.

Before:
When we were publishing our pages we were fetching the page URL from the DRAFT stage. This meant that if you published a child page when it's parent was unpublished, you would still get the full draft URL (EG: parent-page/child-page).

After:
I updated our logic to fetch URLs from the LIVE stage to fix the unpublishing actions, but this caused a regression in this particular scenario because the LIVE URL for that child page would now no longer include the parent slug (since the parent is unpublished).

Arguably - this is actually the "correct" URL for the page, according to Silverstripe, but it's inconsistent with the current behaviour of the module.

Solution

  • Unpublish actions all happen onBefore..., so these can stay as is, fetching URLs from the LIVE stage so that we correctly purge/uncache whatever the current LIVE URL is.
  • Publish actions all happen onAfter..., these need to be updated to use the DRAFT stage (if we want to keep the current module behaviour). This should still be safe, since the current DRAFT will match whatever was just published.

Another potential solution

I think this is out of scope for this change set (since it's another change in behaviour), but arguably we probably shouldn't even try to publish a page if enforce_strict_hierarchy is enabled, and a parent is unpublished.

Or, perhaps the FilesystemPublisher shouldn't save cache results for 404.

Copy link
Member

@emteknetnz emteknetnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main thing is changing the word 'include' to 'regenerate' - I found it made the code far more intuitive to read

src/Service/UrlBundleService.php Outdated Show resolved Hide resolved
src/Job/DeleteStaticCacheJob.php Outdated Show resolved Hide resolved
src/Service/UrlBundleService.php Outdated Show resolved Hide resolved
src/Extension/Publishable/PublishableSiteTree.php Outdated Show resolved Hide resolved
src/Extension/Publishable/PublishableSiteTree.php Outdated Show resolved Hide resolved
src/Extension/Publishable/PublishableSiteTree.php Outdated Show resolved Hide resolved
@chrispenny
Copy link
Contributor Author

Thanks for all the feedback, @emteknetnz :)

Actioned here (plus many of your commits):
e5d596b

Copy link
Member

@emteknetnz emteknetnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure what to do here since it's simply not working for me doing manual testing. Possibly and hopefully I'm just doing something very wrong?

My setup is using an install of CMS 5.x-dev silverstripe/recipe-cms + silverstripe/recipe-content-blocks along with this branch of silverstripe/staticpublishqueue

Additional project config is:

---
Name: mystaticpublishqueue
After: '#staticpublishqueue'
---
SilverStripe\StaticPublishQueue\Extension\Publishable\PublishableSiteTree:
  regenerate_children: recursive
  regenerate_parents: recursive

The following things are not working right:

  • vendor/bin/sake dev/tasks/SilverStripe-StaticPublishQueue-Task-StaticCacheFullBuildTask - this create a queuedjob, though when I try and run it via the QueuedJobsAdmin it times out after 30 seconds and logs me out. It did create files in the public/cache directory though. Also experienced this same timeout issue running this on the 6.0.0 branch though? [2023-08-14 13:31:03][ALERT] Fatal Error (E_ERROR): Maximum execution time of 30 seconds exceeded {"code":1,"message":"Maximum execution time of 30 seconds exceeded","file":"/var/www/vendor/silverstripe/framework/src/ORM/FieldType /DBDate.php","line":87,"trace":null} [] Tried reverting framework to 5.0.0, same issue.
  • Made a child page. Modified URLSegment of parent page. Ran vendor/bin/sake dev/tasks/ProcessJobQueueChildTask to process the automatically created jobs. Names of files in public/cache didn't change
  • Creating StaticCacheFullBuildTask as above though instead this time running vendor/bin/sake dev/tasks/ProcessJobQueueChildTask meant it didn't time out, though this time didn't seem to alter anything in public/cache

Like I say it seems like I'm doing something very wrong here since I find it hard to believe that things are actually this broken. Am I missing something obvious here?

docs/en/basic_configuration.md Show resolved Hide resolved
@chrispenny
Copy link
Contributor Author

@emteknetnz

SilverStripe-StaticPublishQueue-Task-StaticCacheFullBuildTask:

I've tested this in 6 and this branch. I think as long as you process your queue through CLI (vendor/bin/sake dev/tasks/ProcessJobQueueTask) then it works without error.

My guess would be that running the job through QueuedJobsAdmin is setting the current user to "run as", and perhaps this is causing issues.

Something to be picked up separately to this PR, I hope.

Modified URLSegment

I've set up my Site Tree as follows:

Screenshot 2023-08-15 at 9 15 28 AM

I've generated my static files using vendor/bin/sake dev/tasks/SilverStripe-StaticPublishQueue-Task-StaticCacheFullBuildTask and vendor/bin/sake dev/tasks/ProcessJobQueueTask:

Screenshot 2023-08-15 at 9 17 20 AM

Change the URL segment for the About Us page and publish:

Screenshot 2023-08-15 at 9 20 57 AM

Two jobs are now present in the queue, one to purge the old URLs, and one to generate the new URLs.

Purge URLs Job:

Screenshot 2023-08-15 at 9 18 01 AM

I note that there is ?stage=Live appended here, because our default is now to not remove it. I'm going to leave this as is for now and see what happens.

Generate URLs Job:

Screenshot 2023-08-15 at 9 18 09 AM

Ran vendor/bin/sake dev/tasks/ProcessJobQueueTask.

The original static files have all been removed (the folders are empty), and new static files have been created under the new name.

Screenshot 2023-08-15 at 9 24 09 AM

Looks like having ?stage=Live didn't negatively impact my process.

Copy link
Member

@emteknetnz emteknetnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK got things working at my end. I did some a little testing and found a few bugs, they are sort of existing bugs so possibly we just raise them as new cards and merge this as is, though maybe it makes sense to fix these now

Fresh install of silverstripe/installer 5 + silverstripe/recipe-content-blocks with this PR branch of staticpublishqueue

Using config

SilverStripe\CMS\Model\SiteTree:
  regenerate_children: recursive
  regenerate_parents: recursive

Scenario 1 (less bad version of existing bug)

  • Create page "My page". Publish it. Run queue.
  • Add child page to it "My child page". Publish it. Run queue.
  • Unpublish "My page". It will also unpublished "My child page". Run queue.
  • Error 1 - there is an empty "my-page" folder there (not the end of the word
  • (existing behaviour is worse as it still has a cached "my-page/my-child-page)
  • Publish "My page". (My child page remains unpublished). Run queue.
  • Error 2 - public/cache has "my-page" folder with child "my-child-page" in it (that's a much bigger issue)

Scenario 2 (same as existing behaviour)

  • Create page "My page". Publish it. Run queue.
  • Add child page to it "My child page". Publish it. Run queue.
  • Unpublish "My page". It will also unpublished "My child page". Run queue.
  • Publish "My child page", Run queue
  • (both /my-page?stage=Live and /my-page?stage=Live and not available on the website)
  • Error 3 - public/cache has both "my-page" and "my-page/my-child-page" cached i.e. cache state doesn't match normal website behaviour

@chrispenny
Copy link
Contributor Author

chrispenny commented Aug 15, 2023

@emteknetnz yup, I agree with those bugs - it sounds like you're experiencing what I described here. The behaviour I saw with child pages getting cached (when the parent is unpublished) was at least a cache of a 404 page, and not a cache of the actual content.

Here are my thoughts on how we can fix this while maintaining BC:
90a22ff

TL;DR: Let's add a config that allows devs to specify response codes that won't have static files generated for them.

Everything works as expected with manual testing, the only issue I'm having is with the unit test. I can't figure out why, but no matter how I set the config value in the unit test here, when it gets to the publishURL() method here it always reverts back to whatever config was set as the default value in the FilesystemPublisher class here.

Could you please lend a hand?

Copy link
Member

@emteknetnz emteknetnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than adding in new config options to exclude 404's in this PR, do this is a separate PR

Revert to 855b70f which was the last green build and we'll just merge this PR with known bugs

Could you please raise new issues cards to fix the observed bugs

@chrispenny
Copy link
Contributor Author

@emteknetnz reverted the commit and raised an improvement Issue here:
#172

I definitely won't argue if we want to call this a bug, but I've initially left that wording out, since caching a 404 response could (maybe?) be a behaviour that someone expects/desires.

@GuySartorelli GuySartorelli dismissed stale reviews from maxime-rainville, mfendeksilverstripe, and themself August 22, 2023 04:35

Steve's review is the up-to-date one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants