[GRIDFIELD] BUG: pagination in GridField does not work if DataObject default_sort field values are not unique #7249
Comments
Have you tried adding |
@flamerohr - I fixed my code as soon as I noticed the bug by adding ID as a sort key, and that should also be the solution for this bug. Data Objects can be sorted in any way, but sort by ID ASC/DESC should always be added as the last sort criterion, no matter what. Check this out: I have now added to
public function PaginatedObjects()
{
    $list = MyDataObject::get();
    return new PaginatedList($list, $this->getRequest());
}
and I have added to
<ul>
    <% loop $PaginatedObjects %>
        <li>$Title</li>
    <% end_loop %>
</ul>
<% if $PaginatedObjects.MoreThanOnePage %>
    <% loop $PaginatedObjects.Pages %>
        <% if $CurrentBool %>
            $PageNum
        <% else %>
            <% if $Link %>
                <a href="$Link">$PageNum</a>
            <% else %>
                ...
            <% end_if %>
        <% end_if %>
    <% end_loop %>
<% end_if %>
PAGE 2: AMAZING! I agree with you that this is a critical bug and I am very surprised to find it. It is so serious because any time you sort by a field that is not enforced to be unique at the database level (i.e. in basically all cases, e.g. sorting by Title, Created, LastEdited, SortNumber (SortableGridField), Sort (SiteTree), Date, and so on), you can end up with missing or duplicated records in a GridField / PaginatedList. |
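The fix described at the start of this comment (ID as the last sort key) could look roughly like this in the controller method. This is only a sketch: it assumes SilverStripe 3, a `Page_Controller` subclass (the class name `MyPage_Controller` is made up), and that `Title` is the non-unique field being sorted on, which the comment does not state.

```php
<?php
// Sketch only: runs inside a SilverStripe 3 project, not standalone.
// MyPage_Controller is a hypothetical name; MyDataObject is the class from this thread.
class MyPage_Controller extends Page_Controller
{
    public function PaginatedObjects()
    {
        // Sort by the non-unique column first (Title is an assumption),
        // then by ID so that tied rows always come back in the same order.
        $list = MyDataObject::get()->sort(array(
            'Title' => 'ASC',
            'ID' => 'ASC',
        ));

        return new PaginatedList($list, $this->getRequest());
    }
}
```

The template stays unchanged; only the order of the underlying list becomes deterministic.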
Unfortunately, this is a potential problem in any relational database: many queries do not explicitly provide a sort (or not one that includes at least one unique column) and rely on the "implied" sorting by ID, which is what happened here. I've marked this as critical because of data disappearing, but I encourage always adding explicit |
I can't replicate this on |
I run this on a plain 3.6.1 |
I wonder if the auto increment on your ID isn't working properly. Can you try deleting the entire DB and starting over? I'm curious if you're getting IDs that are non-sequential with your Item numbers due to all the deletions. Try adding |
No problems with IDs
This error actually goes back to MySQL. We are running SELECT * FROM And MySQL returns the records in a random order - although the first page of pagination seems to refute this theory.
@flamerohr - I have not come across many (if any) modules that explicitly sort by |
What kind of output do you get when you just run |
I run the following code:
public function MyDataObjectRaw()
{
    // Run the same query unpaginated, then as three pages of 10 rows each.
    $limits = ['', ' LIMIT 0, 10', ' LIMIT 10, 10', ' LIMIT 20, 10'];
    foreach ($limits as $limit) {
        $this->html .= '<br />-------------------------------------';
        // SameSame is the non-unique sort column under test.
        $rows = DB::query('SELECT Title, ID FROM MyDataObject ORDER BY SameSame ASC' . $limit . ';');
        foreach ($rows as $row) {
            $this->html .= '<li>' . $row['Title'] . ' = ' . $row['ID'] . '</li>';
        }
    }
}
and the output that I get is as follows: item No.36 = 369 limit stuff 0 -> 9 item No.1 = 334 limit stuff 10 -> 19 item No.12 = 345 limit stuff 19 -> 29 item No.12 = 345 |
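The repeated `item No.12` is what you would expect if the database is free to order tied rows differently for each query. Here is a small framework-free sketch of why a unique tie-breaker helps. The row data and the `paginate()` helper are made up for illustration and are not SilverStripe APIs; `shuffle()` merely stands in for the database returning tied rows in a different order per query.

```php
<?php
// Hypothetical helper, not part of SilverStripe: sort rows by a non-unique
// column, break ties on the unique ID, then cut out one page.
function paginate(array $rows, $sortColumn, $offset, $length)
{
    usort($rows, function ($a, $b) use ($sortColumn) {
        if ($a[$sortColumn] != $b[$sortColumn]) {
            return $a[$sortColumn] < $b[$sortColumn] ? -1 : 1;
        }
        // Tie on the sort column: fall back to the unique ID.
        if ($a['ID'] == $b['ID']) {
            return 0;
        }
        return $a['ID'] < $b['ID'] ? -1 : 1;
    });

    return array_slice($rows, $offset, $length);
}

// 30 made-up rows that all share the same SameSame value.
$rows = array();
for ($i = 1; $i <= 30; $i++) {
    $rows[] = array('ID' => $i, 'Title' => 'item No.' . $i, 'SameSame' => 1);
}

// Simulate the rows arriving in a different order for every page request.
$seenIds = array();
foreach (array(0, 10, 20) as $offset) {
    shuffle($rows);
    foreach (paginate($rows, 'SameSame', $offset, 10) as $row) {
        $seenIds[] = $row['ID'];
    }
}

// Every ID shows up exactly once across the three pages.
echo count($seenIds) . ' rows, ' . count(array_unique($seenIds)) . ' unique' . PHP_EOL;
```

This prints `30 rows, 30 unique`. Without the ID comparison, the content of each page would depend on the incoming row order, which is the situation the raw queries above run into.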
I'm not convinced this is a SilverStripe issue. The way that MySQL resolves an ambiguous sort is highly unpredictable. See (https://dba.stackexchange.com/questions/6051/what-is-the-default-order-of-records-for-a-select-statement-in-mysql).
That I can't replicate it should be an indication that we're dealing with some kind of unknown. It could be as simple as differing storage engines, MySQL versions, or both. In short, don't use ambiguous sorts. |
Hi Aaron,
It is definitely a MySQL issue - but we can't change MySQL, so we will have to deal with the weirdness in SilverStripe.
That is true in theory, but I don't think it is true in practice: most of the time you are sorting by an ambiguous sort field, e.g.
etc... Anytime they are not unique, you could end up with this problem. So, I reckon we should fix this, for the following reasons: (1) (2) There is an easy fix - add an ORDER BY BaseTable.ID to the end of all SQL Sort statements (3) The danger with this bug is that no one will notice until it has real life consequences. There is no error, these lists seem to operate fine, etc... (4) A perfect developer should remember to add the Sort By ID to the end of all sort statements that do not include an unambiguous field - but if |
I could replicate this behaviour on my lappy:
and on our server
and here is my work machine (issue is also happening here):
sunnysideup/ecommerce_merchants | allow many businesses to sell within one shop. | 0.00 per day |
I really think this is dangerous scope creep for the ORM, and it sets a really bad precedent. The ORM is an abstraction layer between you and SQL. Its primary purpose is to make querying more succinct, more secure, and more declarative. It is not the ORM's job to fix your bugs. It should be very possible to write bad queries with the ORM, and sorting by something you know to have a singular value is one of those areas.
Personally, I've never had to do this. Keep in mind, you're testing a very extreme case, where all the values of a column are equal, and in any real-world scenario, it would be apparent to the developer that he was sorting on a column with uniform values. If, by contrast, I sort by |
The best practice, in these scenarios, is to encourage developers to implement composite default_sort values for their models, if they are aware that their sort column is not unique. I concur with @unclecheese on this. E.g.
private static $default_sort = '"BaseTable"."Sort" ASC, "BaseTable"."ID" ASC';
These can also be specified via array syntax.
private static $default_sort = [
    'Sort' => 'ASC',
    'ID' => 'ASC'
];
I believe the array syntax supports ORM to SQL column mapping under the hood, so that might be preferable to raw SQL fragments. :) Shall we make the outcome of this problem a documentation update? |
I agree, documentation to encourage this would be the best outcome for this... I don't believe enforcing an The |
What about Both models are bound to have non-unique sort values. |
We could add ID to those models exclusively, but that's still preferable to baking it into datalist for all classes globally. |
Example 1: On an e-commerce website we offer discount vouchers. We had set the default_sort to Then the client adds a whole bunch of discount vouchers with the same start and end dates. When he is trying to delete them, there are around 8 / 46 that are not showing up in the ModelAdmin (and another 8 that show up twice) ...
Example 2: Member list for a club. Ms Smith cannot find herself in the list and calls the admin. The admin finds her on page three of the list and sends her a screenshot and a link to page three. Ms Smith cannot find herself on that page (there are a few Smith people, and the list is sorted by last name).
What can we do? I am not married to a solution, but it seems to me that this is definitely something we should improve for our users, because they have much to lose and nothing to gain from MySQL's seemingly random sorts. I can see the following options:
|
|
@kinglozzer no, |
Here is my code as a repository: https://github.com/sunnysideup/silverstripe-sort-test I did some further testing. It does not happen with pages, it only happens with DataObjects. |
I think this is really getting into a philosophical discussion on what is the role of framework/ORM. That seems to be where we're divided. My thoughts: If you assign a singular sort clause to a DataObject for a column that is not entirely unique, and it produces unexpected results, that's on you as the developer who wrote that code. To expect the framework to jump in and save you is undue, for the same reason the template parser doesn't close your If we were to implicitly put |
Here are some alternative ideas:
Also - I think firstly that the bug needs more research (exactly when does it happen and how / why). |
That's
... and then clarified that this shouldn't happen at the ORM (or framework) level when sorting in general. When you're in a |
Hi Patrick, Thank you for your reply. I think the first thing is to really understand the issue: when does it happen, where does it happen, etc... ? I also jumped to the "solution", but I think the key is to understand the issue first. It does not seem to happen with all DataObjects, so I want to know, for example, which DataObjects are affected, when, etc... Its random presentation also makes it really easy to miss or not to worry about, and it may raise its ugly head when you least expect it. Basically what I am saying: we are potentially presenting our users with bad data (some data multiple times, other data hidden) - this happens, for example, with GridField - out of the box. I think that is the crux. How we fix it is secondary (we may leave it to developers as suggested by core team). I want to fix it in all my sites, but I want to do it in a smart way. It seems to me that when you add DataObjects programmatically (i.e. many at the same time) it is more likely to happen. Also, I could not replicate it on Pages, why is that? |
We've had a bit of a chat about this internally, and there's some consensus that a halfway measure to mitigate this is appropriate. First, the condition of MySQL returning a non deterministic result set when ambiguous sort columns are used is absolutely a feature, not a bug. From the MySQL docs:
While MySQL is providing the "expected result" (albeit non deterministic) in this case, it's fair to say All that said, to force Thoughts? |
Good thoughts ;-) I agree with all of that. Here are a couple of thoughts I had:
Putting these two together ... whenever you use limit in an ORM call, you will have to So, what I would recommend is: whenever we limit a recordset in an ORM call, we add sort by ID as the last sort phrase (with the ability to turn it off) by default. This is how I would do it: When putting together the final SQL for a database query - IF:
Research questions:
|
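The recommendation above (append ID as the last sort phrase whenever a record set is limited, with a way to turn it off) could be sketched as follows. This is plain PHP, not SilverStripe code: `withTieBreaker()` and `toOrderBySql()` are hypothetical helpers, and the two checks inside are my assumptions, since the exact conditions from the comment are not reproduced here.

```php
<?php
// Hypothetical helper, not SilverStripe code.
// $orderBy maps column => direction; $limit is null for unlimited queries.
function withTieBreaker(array $orderBy, $limit, $enabled = true, $idColumn = 'ID')
{
    // Assumption: only limited (paged) queries need a deterministic order.
    if (!$enabled || $limit === null) {
        return $orderBy;
    }
    // Assumption: leave the sort alone if the unique column is already in it.
    if (array_key_exists($idColumn, $orderBy)) {
        return $orderBy;
    }
    $orderBy[$idColumn] = 'ASC';

    return $orderBy;
}

// Hypothetical helper: render the map as an ORDER BY fragment.
function toOrderBySql(array $orderBy)
{
    $parts = array();
    foreach ($orderBy as $column => $direction) {
        $parts[] = '"' . $column . '" ' . $direction;
    }

    return implode(', ', $parts);
}

echo toOrderBySql(withTieBreaker(array('EndDate' => 'DESC'), 10)) . PHP_EOL;   // "EndDate" DESC, "ID" ASC
echo toOrderBySql(withTieBreaker(array('EndDate' => 'DESC'), null)) . PHP_EOL; // "EndDate" DESC
```

The `$enabled` flag stands in for the "ability to turn it off"; in a real implementation that would more likely be a config setting.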
So the answer to most of your questions/concerns is that it's not a bug or oddity with MySQL, or Postgres, or pages, or DataObjects. This is something that's encoded in the SQL standard. The server is free to return the rows in any order it wants when identical values are encountered. That means, potentially, all database platforms are affected, and the way that this issue may manifest itself in various contexts, be it with the application of certain clauses, implementation of / version of storage engines, or just plain phases of the moon, is irrelevant. The point is that the SQL standard in this case tells us the result is nondeterministic, and we have an API that promises to be deterministic, so we need to reconcile that conflict somehow. I think if we were to add a |
are we able to infer Probably a bit too much magic for my taste, but it's an idea to bounce around. |
Since I think this is just in reference to |
How does it work in PHPMyAdmin? I think we need to do more research. I found that only some tables return with duplicates between limited segments of the table, while other tables do not (e.g. SiteTree). |
A challenge I've considered with a default_sort_column is that there needs to be a way to tell a datalist to reverse it in the case where a non-default sort column (E.g. Title) is reversed, otherwise groups of rows with the duplicate Title column would not reverse. That makes the solution not able to be applied so transparently. |
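To illustrate the reversal problem: if only the visible sort column is flipped, rows sharing the same Title keep their ascending ID order, so the reversed list is not an exact mirror of the original. A minimal sketch that flips every direction, including the tie-breaker (plain PHP; `reverseOrderBy()` is a hypothetical helper, not an existing DataList method):

```php
<?php
// Hypothetical helper, not SilverStripe code: flip every direction in a
// column => direction map, including the trailing ID tie-breaker.
function reverseOrderBy(array $orderBy)
{
    $reversed = array();
    foreach ($orderBy as $column => $direction) {
        $reversed[$column] = strtoupper($direction) === 'DESC' ? 'ASC' : 'DESC';
    }

    return $reversed;
}

$forward = array('Title' => 'ASC', 'ID' => 'ASC');
foreach (reverseOrderBy($forward) as $column => $direction) {
    echo $column . ' ' . $direction . PHP_EOL;
}
```

This prints `Title DESC` and `ID DESC`, so groups of rows with a duplicate Title are reversed along with everything else.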
IMPORTANT
A: SELECT * FROM `SiteTree` LIMIT 20, 10
B: SELECT * FROM `SiteTree` ORDER BY `ID` LIMIT 20, 10
C: SELECT * FROM `SiteTree` ORDER BY `sort` LIMIT 20, 10
Not sorting and sorting by ID are equally fast, but sorting by Sort is significantly slower ... see: #7272
This suggests that sorting by ID has no performance cost compared with using no sort key, and that sorting by ID is, most of the time, faster than sorting by the default_sort. |
Yeah it's not just in sorting in Ask: Yeah @sunnysideup, typically sorting by indexed fields is a bit faster than unindexed. |
OK, there's about twenty screenpages of discussion here ;) If you care strongly about resolution, can somebody please summarise the state of the discussion with options going forward? |
Summary:
|
I think we need to disambiguate the use (or the nomenclature) of So, I'd suggest we do the following:
Beyond the uncertainty of those two notes above on automatically disabling, I think we'd be set. Thoughts? |
I agree with @patricknelson that having another config setting called If the idea can be kept simple such as a single config |
I think some research may be useful:
Also - while no one else seems to have noticed this issue, it is not a benign one. At least two of my clients noticed it (and were unable to edit data because of it) in GridFields (where I had set the sort order to a field that was something like a date field). |
Adding an index to |
Actually, what if we just had a Functionality:
This satisfies:
What do you guys think about that? Am I missing any scenarios or edge cases, possibly? EDIT: Also, I propose that if we do this at all, at minimum it should be performed for |
Any movement ? |
Just skim read through this delightful thread. While the suggested feature would provide some benefit for some very specific use cases, it sounds like it's somewhat low value for most common scenarios. Those people who would benefit from this have a relatively easy alternative (just manually include an ID sort in their queries). My instinct would be to close this issue as a "won't fix" since there's no clear agreed upon action to take. |
Reread this (skimmed). That would be fine with me as it's still possible (as the developer) to manually define the secondary sort column via That is ultimately what I think this issue morphed into, which makes it more broad and less about a specific use case like you'd find just in So to summarize: Is |
FWIW I experienced this bug on SS4 today with 'out of the box' models - in the Security->Users section. A site had many members with no first or last name, so I got repeat/missing members on subsequent pages. Looks like it would remain an issue on SS5. Putting aside the bigger debate for now, quick win would be to fix up all core models to ensure consistent sorting (e.g. add ID and maybe Email to Member default_sort), and maybe update the docs to flag this issue as suggested previously? Edit: I guess that would only help for literally the 'default sort' case. So if I applied a sort on first/last name in my example by clicking a column header, the issue would come back because ID would no longer be included in the sort clause. |
I think the scope is bigger than the title suggests, as it's not just GridField (it's all DataLists) and not just about default_sort. Even if you fix your default_sort to include ID, that won't come along for the ride if you choose a different sort by using the sortable headers that are baked in to Silverstripe CMS. Another pain point is that it doesn't appear that you can sort on multiple fields with the GraphQL module. |
BUMP BUMP BUMP |
SUMMARY:
Pagination does not work well when the data is provided by MySQL 5.6 / 5.7 (PHP 7.1) and the data set is sorted by a field (or fields) not enforced to be unique at the database level. The reason is that MySQL may return rows in an arbitrary order beyond what it has been asked to sort by.
You can possibly replicate this bug by yourself by using the tools listed below:
This means, for example, that:
There are still a lot of questions about this bug, but I have been able to consistently demonstrate this error on several machines with clean installs of 3.6.1
Details
The issue: when the default_sort column of a DataObject holds the same value in more than one row, the GridField ends up NEVER showing some records while showing other records more than once:
How to replicate:
Install SS 3.6.1 using the standard installer
Add the following DataObject (MyDataObject):
Then add a ModelAdmin (MyModelAdmin.php):
You get the following results:
For my client, I was sorting a GridField by EndDate and StartDate. Some of the items had the same EndDate and StartDate and were therefore missing, making it impossible for the client to find them.
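For a case like the EndDate / StartDate one, the workaround discussed in this thread is a composite default_sort that ends in ID. A minimal sketch, assuming SilverStripe 3; the class name `DiscountVoucher`, the field types and the ASC directions are all assumptions, as the actual model is not shown here.

```php
<?php
// Hypothetical model for illustration; class and field names are assumptions.
class DiscountVoucher extends DataObject
{
    private static $db = array(
        'Title' => 'Varchar(255)',
        'StartDate' => 'Date',
        'EndDate' => 'Date',
    );

    // EndDate and StartDate can repeat across rows, so ID is added last
    // as a unique tie-breaker.
    private static $default_sort = '"EndDate" ASC, "StartDate" ASC, "ID" ASC';
}
```

As noted in the comments, this only covers the default sort; a sort applied later (for example through sortable GridField headers) replaces it and would need its own tie-breaker.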