-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Versions aren't created when owned relations are updated #195
Comments
It'd be great to get feedback on this problem from the @silverstripe/core-team. Another examples of impact from fixing this "bug": a page owns a has_one featured image. You edit the title of the image in asset-admin, the previously published version of the image and the page that owns it both go into a modified state, instead of just the image. It's important to note that next time you publish the page, you'd get the new version of the image with it anyway, so this is really an intermediary to highlight that there are draft changes in the page that you may otherwise be unaware of. One flow on effect from implementing a fix for this in the context of silverstripe-elemental is that every time you make a change to a nested content block for a page, you'd get a new version of the page. This is probably desired behaviour, but you'll quickly end up with a tonne of page versions. |
I guess it depends how tight the ownership relationship is. You'll have cases where the owned object is non-sensical without its owner (e.g.: elemental area and content blocks). You'll have other cases where the owner object can still be helpful without the owner (e.g.: a hero images that gets attached to a page). Off the top of my head, my preference would be to keep the existing behavior and maybe add an extra super ownership relationship model on top. |
Like some sort of |
Perhaps ... or maybe some options on the private static $owns = [
'ElementalArea' => [
'tightlyCoupled' => true
],
'HeroImage'
]; It might make sense to have that flow to the UI as well. e.g.: Prevent CMS users from directly publishing a content block and force them to always go through the parent page. |
Hmmmn. Interesting point. /cc @newleeland @clarkepaul . It doesn't really make sense to individually publish blocks (etc) if changes to these relations make a draft version of the parent page. Will probably need some UX advice here. Maybe it only makes sense to create a version of the page if:
This would probably have to "flow through" for elemental though. An element that is added to an area should probably create a draft version of the page the elemental area belongs to. And the order of IDs probably should matter as well. When you re-order a HasMany relation I'd expect a new version of the page. |
Hey, I just want to flag that there's a significant risk that we wind up with an inconsistent system if we attempt to fix this without understanding the previous decisions made during the development of this during the original 4.x build. Having @tractorcow weigh in on this will be helpful. For the rest of us, we'll want to dig through previous ticket history. I'll see if I can pick up some useful links. |
Ok so these two tickets go into some discussion about the meaning of owns and its relationship to cascade_deletes |
Notably: owns + cascade_deletes is the pre-existing mechanism developed to indicate "tightly coupled" objects. |
I think that automatically writing the "parent" object whenever we write a child object certainly has some risks: Namely, massive saturation of the versioned table potentially increasing the ability of users to track versions. More data !== better data. The mechanism we use internally to "segment" versions, where no specific version ID may exist for a point in time for a record, is actually "archiveDate", which is where we represent a point in time as the slice rather than the parent version number. The problem is that we only do this for all items in the tree AFTER the top level. I.e. get a version of a record by its ID, but the versions of owned objects by date. My suggestion is to look at allowing users to revert to specific points in time, including the top object version, rather than version IDs. The "select a version" interface becomes "select a date" interface (perhaps in addition to?). Another possible suggestion is that when we save a record in the CMS, we create a new version of that record, and bypass the "only if changes exist" check in DataObject::write(). This check should be retained as core default, but only bypassed if the specific save is triggered by a user interaction. That way there is an immediate user feedback between "I pressed save" and "I can see a new version in the history". What I would highly caution against is "create a new owner version every time an owned version is modified". |
The alternative is to only roll back whole changesets, which was one of the reasons why we opted to package all publication events into changesets. |
Yes, even better. :) However, a change may be in the wrong changeset (i.e. write of a different page to the one you want to roll back). |
I believe that the approach we had anticipated was that showing a "modified on draft" icon might interrogate the recursively owned objects. Do you have any recollection about this @tractorcow I could foresee performance issues with that, but I would be inclined to create separate denormalised datasets to speed this up, rather than dumping responsibility for tracking this on the Versioned field. |
So I would class this ticket not so much as a bug but as an API design decision that @ScopeyNZ believes is causing problems. Could we clarify exactly what user features that this is causing issues with, perhaps using specific examples eg from elemental? I get the feeling that it has something to do with the "modified" status in the tree view, and maybe something about the rollback feature, but that's a bit of a guess tbh I think it will be useful to reframe those things as the acceptance criteria of this card and then work out what (non-breaking) changes to our API we want to make to address this. |
Yes, I just think that would be possibly slower, but not impossible to do. I would try to avoid doing this on many items in a list, which probably is where we would get the benefit of such a function. I have to run off and come back to this ticket later by the way. Just a drive by opinion dump at the moment sorry. :) |
Yeah without getting caught up in the technical I think I can give a good example of the weird UX that you can achieve as it is currently using elemental as an example:
I didn't consider the idea of the "modified" status in the tree view actually. Currently the The problem is not every action is actually captured in a version. Although ordering is still a tough problem in it's own right (refer to the "context" section of this issue and the further down comment with more examples) I feel like adding versions when relations are specifically changed to different objects wouldn't cause a huge amount of versions - right now the lack of versions is a problem. My comment before I think makes sense:
|
OK that's a bit clearer, thanks
So, this seems like the core issues. Let's start with some assertions:
A couple of questions then is:
Merely nudging the version numbers seems like it might be insufficient inasmuch as we need to link the particular version of the owner/parent record to the particular versions to the owned/child records. Does our data model have support for this? Changesets at least have a good model for grouping a block of related changes (like different files in a git commit) together. But, to date, we only use them for publications, and using them for draft changes is likely to have significant performance implications. |
It's not about the number of versions, it's about whether you're breaking an assumption on which the entire rest of the application depends. |
In general, it seems like we have a few broad approaches for addressing this:
The 3rd approach is similar to the 1st option, but introducing a new place to write it rather than overloading the meaning of Version. It's also similar to a pre-emptively written caching layer for the 2nd option. |
I fear that going option 2 will fall into the same trap we've been experiencing very recently on a certain key project - where code like this doesn't perform well on larger sites. I'm not entirely sure on the differences on the first and third approach because I'm admittedly still a bit unfamiliar with the intricacies here. I'll do some delving and form an opinion over this week. |
Probably worth pointing out that Right now, when you publish recursively, a ChangeSet is created with matching ChangeSetItems for each affected object. All of these are saved to the DB and pretty much become useless afterwards. Basically, there's no way for the user to interact with that change set data. Even if there was, there's nothing useful you can do with them right now. |
Yeah, making improvements in keeping with the current api design would be nice if possible ;-) |
If we were to go with a change set approach, how would that affect devs with simple ORM relationships? Very basic example from my comment earlier: a blog post has one featured image. You modify some text field on the image and save, the page doesn't go into draft state even though it will change when you publish it (in so much as its featured image). Note that the dev hasn't even seen the name "ChangeSet" during this process, so it'd need to be wrapped up under the hood. I kind of see what you mean in terms of where you're making the change - you haven't modified the page at all, so why would you bump a version for it, but I think it's quite important to think of this holistically as content changes in general. Elemental is a good example where a page's data is mostly all stored in nested ORM relationships on the page rather than fields attached to it which are covered by the page's versioning by default. The more you do that the more and more useless versioning becomes for pages (the heart of the CMS). When I was thinking about it the other day I had a suspicion that a new API like @maxime-rainville suggested at the top would be a safer option. It'd kind of be like cascade saving but in the opposite direction to cascade publishing at the moment. |
Had a good whiteboard session with @ScopeyNZ. I think we should try to stick with the date-based preview, archive and rollback behaviour, and not try to achieve full relational integrity on versions of an ownership graph. This is assuming that while there might be some bugs in the current date-based rollback behaviour (e.g. around deleted records), they can be fixed and aren't structural flaws. But in order to allow for fast aggregate change tracking, I think we need to record the path from the originating change up the ownership graph, excluding siblings. I can't see a scenario where we can allow for fast aggregation without this level of denormalisation. Hence the need for a I'm suggesting a
Every save and/or publish operation to an object in the ownership graph would follow up the chain of its owners, and add an entry for itself and each of its owners. This operation would be wrapped in a transaction, and receive a UUID. Then you can use the following queries to identify the modification state of the whole object graph:
There's probably a way to do this in less than three queries, but it's a hell of a lot more efficient than traversing the whole object graph and comparing ScenariosIn the following scenarios, we're assuming ownership relationships between records of type Scenario 1: Bump page versionsStep 1: Step 2: Scenario 2: Version SnapshotsStep 1: New
Step 2: New
With the query examples above, we can effectively ask "give me all the snapshots where A1 is involved after it has last been published". Then "give me the latest versions of each object involved in those snapshots, and check if any of them haven't been published". I think that deletions would be treated the same way as modifications (marked as I'm further suggesting that we try to build this The tradeoff here is that concurrent writes involving the same object graph path by different authors could end up causing incorrect modification status information in that chain, e.g. if the same object is saved by one author, and published by another author, in the same second. My feeling is that's rare enough to be acceptable, particularly since it's not exactly data corruption ( |
A few things:
My understanding is that publication shouldn't create a new version, but rather merely copy the existing version to the Live stage?
I've got mixed feelings about this, but I'd be okay if we did that planning from the get-go that this would be a temporary module and, say, in 4.5 we'd expect to refactor the code into core. I think that retaining this as a module in the longer-term would make it harder to use and maintain. But TL;DR – go forth and build! 🎉 |
VersionSnapshot and VersionSnapshotItem table separation is fine, it has the added benefit of tracking separate creation dates for the entire snapshot, rather than multi-second windows for different objects. Regarding UUIDs, I'm starting to wonder if that's overcooking it. You're just as likely to create table locks on concurrent writes as with the existing Thinking about Two nested sub selects aren't ideal, but even if we don't find a way to flatten this into joins, it's a max of three queries per item you want to check (e.g. when listing a page tree), as opposed to ~2x the amount of queries as you have owned object connections in the graph. If I'm right and we can model the whole ownership graph, we could in theory use this for rollbacks. I can't see us using it for previews, since they need to preview un-owned versioned objects as well. Given we have already implemented a date-based rollback on ownership graphs which models what the preview does, I'm not sure how much value there is here. Another comment on Sam's |
I've added some ACs, can you all please check my understanding? @unclecheese has actually been battle testing my approach for a few days, has written out scenarios as "SQL unit tests", and performance testing with 1m+ rows. Looking good so far.
Is this right? In my mind, even with
We could create a version snapshot for the current ownership graph on any changed owned object, but there's no way to recreate this for historical versions. If we don't make a migration task for this, you'll have websites which inconsistently mark records as changed (nor not) based on the last time they've received a draft change in their ownership structure. This seems acceptable as a first cut, but we might be forced to write this task eventually. |
Criterion "Does not significantly slow down save operations"Performance testing was done with the following setup. Page -> Block -> Image One page has many blocks, each one of the blocks owns the image. Here's some testing resultsWithout snapshots:
Snapshots installed:
Notes
This may be indicating the tests aren't only including snapshots delta, but also other factors as well. ResultSave operation (for the image asset). Owners are blocks owning to the image.
|
OK, I'd say that's good enough. The "image has 40 owners" case is a bit of an edge case anyway, so the fact that save operations on this already fairly complex codebase slows down from 13s to 28s isn't great, but acceptable for an alpha-level module. I think @unclecheese took the optimisations as far as he could, the next step would be to calculate snapshots asynchronously, which would then cause all kinds of fun edge cases - but might be a necessary evil for more complex projects. Let's get a bit more feedback in the wild before sinking more time into it.
We still need to add some docs to versioned, right? It should point out the current limitations to developers, and then point to this module as an early stage approach. |
For anyone looking back on this issue - this has been closed with the introduction of new experimental modules that relate versions to one-another. See the docs: https://docs.silverstripe.org/en/4/developer_guides/model/versioning/ |
Description
When writing a relation that is owned by a versioned DataObject, a new version is not created. There's a few ways this can be observed and I've written a test that shows this (#196).
Steps to reproduce
Acceptance Criteria
has_one
,has_many
,many_many,many_many_through
without falsifying past "snapshots"Subtasks
Out of scope subtasks: (separate story)
Notes
_versions
metadata)many_many_extraFields
(should usemany_many_through
instead)owned
+cascade_deletes
) is out of scopeRelated
Related PRs
Adding tests for versioning updates through relation changes that should pass (but don't) #196 (Unit test update only)The text was updated successfully, but these errors were encountered: