Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DependentPages should use a DataList, not an ArrayList #2327

Open
robbieaverill opened this issue Nov 13, 2018 · 14 comments
Open

DependentPages should use a DataList, not an ArrayList #2327

robbieaverill opened this issue Nov 13, 2018 · 14 comments

Comments

@robbieaverill
Copy link
Contributor

Version: SS 4.3.x-dev

The fact that SiteTree::DependentPages() generates an ArrayList means that it's executed whenever it runs. This is a major problem for high volume websites, particularly because this is called every time you call getCMSFields() on a page.

I really think we need to modify this functionality so it can return a lazily executed DataList instead.

@robbieaverill
Copy link
Contributor Author

I'm going to try out a POC

@robbieaverill
Copy link
Contributor Author

Another option is that we could add some config to disable back link tracking and dependent pages entirely. Large data sets make the CMS completely unusable by processing all of this data on every page load (in the CMS).

@robbieaverill
Copy link
Contributor Author

robbieaverill commented Nov 13, 2018

This seems to be working OK: https://gist.github.com/robbieaverill/9a185078f1883daf30ff039e00c504b4 - we lose the identifier string for the type of relationship, but that could be added back in in the GridField that renders this list if we want to (by checking the class type)

@robbieaverill
Copy link
Contributor Author

ping @silverstripe/core-team for feedback on whether you think it'd be acceptable to change this in 4.3 or not

@ScopeyNZ
Copy link
Contributor

Isn't this an issue because back links aren't necessarily pages? I think that's why it's using the DependentLinkType attribute right?

@sminnee
Copy link
Member

sminnee commented Nov 14, 2018

I think that that this would be okay but really, the lesson here is that our public API should have been declared to be SS_List not ArrayList. We need to be much more careful about not being overly restrictive in how we declare our public API.

This also applies to, for example, the choice of private vs protected vs public members.

So the typing hinting should be changed to be SS_List, not DataList.

@robbieaverill
Copy link
Contributor Author

Isn't this an issue because back links aren't necessarily pages? I think that's why it's using the DependentLinkType attribute right?

If that's the case then I guess we need to follow up on the todo in BackLinkTracking to implement a polymorphic many many list

@maxime-rainville
Copy link
Contributor

I've explored re-factoring DependentPage while looking at #2276 (comment)

That was my thought at the time. We decided to put that on the back burner because of time pressure, but I think all of my point are still relevant ... except maybe the "put your head in the sand" bit.

Problems with Dependent Pages logic

I've been digging on how SiteTreeLink and SiteTree::DependentPage have been implemented.

SiteTreeLink has a Polymorphic Parent has-one relation and a Linked has-one relation to SiteTree.SiteTree::DependentPageloops through all theSiteTreeLink.Linked` for the current page, then all the VirtualPage, then all the RedirectorPage ... then it sticks everything in an ArrayList.

There's some somewhat weird logic there that leads to weird behavior.

  • The DependentPage GridField edit links is broken because it assume every entry will be a SiteTree.
  • The Same entity can appear in there multiple times when using content blocks.
  • DependentPage doesn't appear to be confident its SiteTreeLinks exist because it manually loop through the list to make sure.
  • Sorting the GridField is painfully slow because it's using an ArrayList ... I came across an example where this would take 2.8 minutes in our key project. There's no way to effectively sort by URL because that value is computed in the application.

Possible solutions

Disable AbsoluteLink

Whatever we do, there's no way of sorting a list of SiteTreeLink by AbsoluteLink without loading all of the Parent objects and computing their AbsoluteLink in the application layer.

Refactor SiteTreeLink

I would ditch the polymorphic relation on SiteTreeLink. Any DataObjects that wants to say that it dependents on SiteTree for something would need to add its owner page to the SiteTreeLink relation (as opposed to itself). That would include RedirectorPage and VirtualPage. Might need to include some sort of qualifier column here. e.g: Redirect, Content block, Content Link, Virtual Page.

SiteTree would need some sort of hook or method that can call its children to ask them if they have a dependencies on certain pages (there's already something similar there). When saving a content block, it will probably need to trigger that hook on the parent page. We would also need to cascade delete SiteTreeLink when a SiteTree is archived.

Refactor SiteTreeLink and move the polymorphic relation to a many-many relation

That's similar to the previous solution, however we track each individual object attached to our SiteTreeLink via a polymorphic many-many relation.

e.g.:

SiteTreeLink:
  Linked: Page A
  Parent: Page B
  DirectDependents:
    - Page B itself beause it has a link in its WYSIWYG to Page A.
    - ContentBlock B that is own by Page B because it has a WYSIWYG link to Page A.

Each possible DirectDependents is responsible for creating the SiteTreeLink if necessary and adding/removing a reference to itself from DirectDependents. We'll know a SiteTreeLink is no longer needed when all its direct dependents are gone.

This "put your head in the sand" non-solution

The two previous solutions imply some API change that will probably break semver. They also imply having some sort of migration task to update the existing data.

I don't think there's any way to keep the current functionality, make it performant and not break API.

The best thing I can think of is remove AbsoluteLink from the GridField or save it on SiteTreeLink (which would involved keeping it up-to-date somehow, which would not be very consistent).

We could add some sort of AJAX action that only fetch the link on click somehow, if we want to allow CMS authors to access the AbsoluteLink.

@michalkleiner
Copy link
Contributor

michalkleiner commented Jun 27, 2019

Hi @chillu, should @maxime-rainville's points, mainly the

The DependentPage GridField edit links is broken because it assume every entry will be a SiteTree.

be extracted into a separate issue? Or changing the type of this issue to a bug?

That specific thing is a bug causing 500 error anytime a CMS user visits the Dependent pages tab in SS4 where a page is linked from a content block (as it's trying to fetch a subsite details for blocks that are not using the subsite extension — default CWP recipe).

@maxime-rainville
Copy link
Contributor

Not sure I came across the subsite issue before. That's going to be a different problem.

Do you want to raise a separate issue.

@chillu
Copy link
Member

chillu commented Jun 27, 2019

@michalkleiner Yep separate issue for broken edit links please

@intotheweb101
Copy link

Hello @robbieaverill

Getting this issue on CWP now, was there a solution to this?

Thanks

@robbieaverill
Copy link
Contributor Author

Not as of yet, as far as I know

@michalkleiner
Copy link
Contributor

michalkleiner commented Jun 2, 2020

This got me again on a CWP project, grrr.

New issue for the subsites problem:
#2554

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants