FIX isInDB actually checks if the object is in DB #7799

dhensby · 2018-01-24T17:19:00Z

Both DataObject::exists() and DataObject::isInDB() claim to check if the current object is in the DB yet neither do so and both provide a slightly different way of determining if the object does in fact "exist in the DB".

This change introduces checking of the DB when checking the DB for an object's existence.

kinglozzer · 2018-01-24T17:20:36Z

This seems far less performant than the existing check, is there a bug this fixes?

dhensby · 2018-01-24T17:25:35Z

Clearly there's an impact in checking the data source for whether an object exists - however the assumption that "if a DO has an ID it exists in the DB" is deeply flawed and (unfortunately) one that is prevalent amongst our code which I'd like to slowly work on removing

kinglozzer · 2018-01-24T17:34:43Z

Hypothetical example: dataobjects which may or may not have an image attached.

<% loop $Objects %>
    <% if $Image %>{$Image.Fit(100, 100)}<% end_if %>
    {$Title}
<% end_loop %>

This change is going to result in one extra database query for each dataobject that has an image, right?

I’m a little hesitant about this one, as far as I’m aware we’ve not seen any real-world issues caused by this behaviour. How about meeting halfway - leave exists() to check for ID and update the docs for that method, but update isInDB() to actually check if the object is in the database?

tractorcow · 2018-01-24T20:29:24Z

I'm extremely nervous about this change, as we use isInDB() in many many places, and there is going to be a serious performance cost here.

The "is actually in the DB" Is meant to be maintained by deleting the object having it's ID reset.

The extra API (which ignores things such a stage, extra tables, etc) feels awkward to have to remember to invoke when doing low level DB methods. E.g. using SQLDelete() to directly delete rows won't automatically update.

I would vote to just close this I'm afraid.

If you can show an example of where we absolutely need a real DB check, I would probably recommend a solution based on that specific example instead.

dhensby · 2018-01-25T15:34:18Z

we use isInDB() in many many places and there is going to be a serious performance cost here

It hasn't introduced a significant impact from my limited testing. I've found that a default page load on a vanilla install doesn't invoke the SQL query once.

I'm not doubting that checking the DB for existence of a record will introduce an overhead, but to argue that we should just avoid checking the DB when claiming to check the DB on performance grounds seems misguided to me. What other parts of the framework should we stop actually doing what it says because it's faster to "guess"?

The "is actually in the DB" Is meant to be maintained by deleting the object having it's ID reset.

We do nothing to enforce that the ID can't be set by user code, so it's not in any way reliable way of determining if the object exists.

The extra API (which ignores things such a stage, extra tables, etc) feels awkward to have to remember to invoke when doing low level DB methods. E.g. using SQLDelete() to directly delete rows won't automatically update.

Using low-level APIs will always be awkward in terms of clearing cache stores. At the moment if you do DataObject::get_by_id(Page::class, 1); then run an SQLDelete to delete the page DataObject::get_by_id(Page::class, 1); will still return an object and it will still return true for isInDB (unless you remember to clear the caches) - which would also clear the isInDB cache too...

If you can show an example of where we absolutely need a real DB check, I would probably recommend a solution based on that specific example instead.

TBH, if the answer is "don't use our methods that claim to check the DB because they don't do what they say" then that's an argument for deprecating these methods

dhensby · 2018-01-25T15:35:03Z

I think I'm going to propose an RFC on this because there is clear flaws to assuming that an ID != 0 means presence in the DB (especially as anything can set the value of the ID field) and there are further changes I want to make that all come under a similar umbrella. This may also be too radical for a 4.x release and instead focused on 5.

But there is clearly a mismatch in expectations here - isInDB does nothing to check an object is in the DB, it checks theres a numeric ID > 0 - it should be called hasNumericIDGreaterThanZero(). DataObject::exists() is equally misleading as the documentation states:

The default behaviour for a DataObject is to return true if the object exists in the database, you can override this in subclasses.

The exists() and isInDB() methods also have slightly different implementations that result in different behaviours and (ultimately) misreport presence in the DB.

$page = new Page();
$page->ID = "Oh looks, this is an ID";
$page->isInDB(); // false
$page->exists(); // true
$page->ID = 9999;
$page->isInDB(); // true
$page->exists(); // true

tractorcow · 2018-01-25T20:09:53Z

Using low-level APIs will always be awkward in terms of clearing cache stores. At the moment if you do DataObject::get_by_id(Page::class, 1); then run an SQLDelete to delete the page DataObject::get_by_id(Page::class, 1); will still return an object and it will still return true for isInDB (unless you remember to clear the caches) - which would also clear the isInDB cache too...

We should ensure that any ORM method for selecting / deleting behaved consistently. We can't protect against raw SQL queries, which is why we don't recommend them if possible.

dhensby · 2018-01-26T17:22:27Z

exactly ;)

ScopeyNZ · 2018-09-04T21:27:12Z

src/ORM/DataObject.php

+    /**
+     * @param DataObject $object
+     */
+    public function unregister_object($object)


This can(should?) be marked as static too?

tractorcow · 2018-09-05T21:29:26Z

How about isInDB(true) to force a hard check, but default to false?

sminnee · 2018-10-12T03:56:05Z

There's definitely still a place for a method that checks "has this DataObject been written" without actually querying the database. I agree that checking whether ID is set is weak (your other RFC covers why) but we can still remember that the record has been written and return true for the remainder of the PHP request.

We could change this to hasBeenWritten(), or maybe we just change the API docblock? Or we could have two different methods in one, with a binary flag, as Damian suggests.

tractorcow · 2018-10-15T02:15:19Z

->exists() will return false for any archived items; Which may seem logical at first, but if you are doing a loop over versioned records in a template, for example, that'll suddenly make that loop empty.

<% if $Item %> internally does ->exists() on the underlying object.

sminnee · 2018-10-15T02:45:44Z

src/ORM/DataObject.php

+        }
+        $class = $this->baseClass();
+        if (!isset(self::$_cache_in_db[$class][$this->ID])) {
+            $sqlSelect = new SQLSelect();


The only way that you can get to this code is if you have manually set the ID and not yet written the record. So, this code is picking up the case when you've manually set the ID to an existing record.

We have the option of simply returning false in this case. If we did this, the functionality would match the "has been written" logic that I've recommended. It would remove the cause for performance / logic concerns that others have had.

My view would be that checking the database for the presence of an ID that has been manually applied is a narrow & different use-case from the current uses of isInDB(). In many cases where this happened it's not so much "this record is in the database" as "I am about to corrupt my database". Code responsible for ID collision would need some careful and separate thought.

So I think that should simply return false in this case, and the update the docblock of isInDB() to

Returns true if this record was read from the database or has been written to it. Note that it does not query the database for the presence of a manually-applied ID.

Note also that, after this change, the static registry can probably be replaced with an instance-specific private boolean $isInDb variable.

robbieaverill · 2018-10-15T08:08:53Z

@tractorcow

->exists() will return false for any archived items; Which may seem logical at first, but if you are doing a loop over versioned records in a template, for example, that'll suddenly make that loop empty.

That seems OK, but I think that isInDB() would ideally return true for archived objects. Currently it behaves the same way as exists(). I had an issue recently with unexpected behaviour on archived items because ->isInDB() was false. It is in the database though... Thoughts?

maxime-rainville · 2019-02-01T03:24:53Z

Doesn't look like a consensus has been reached. Shall we close this for now?

patricknelson · 2019-12-14T01:48:43Z

Just wanted to note here that I also just recently created an issue about this here: #9349

Keep in mind that the issue itself calls out the semantic duplication in the API without diving much into the details of how it should be implemented (albeit it's hard to untangle the two). IMHO, we should:

Point ->exists() to ->isInDB (allow them to coexist): This is because not only do they currently function nearly the same way (notwithstanding the differences noted above by @dhensby, which only adds to the confusion on the API) but to simplify the API for consistency (i.e. they do the same thing) and then clearly document why they both exist. Specifically: For DataObject's, what does it really mean to "exist"? To me, that means it is in the database. For a File (also a DataObject), it exists in the database and on the filesystem, at which point File would override ->exists() and add the extra check to validate presence of a file as well. In this way, the API retains semantic meaning and consistency for descendant classes. I do concede, however, that this duality will only have meaning in a minority of cases (largely just File and it's descendants).
Actually hit the DB when checking: Hear me out; the main reason for this is because accessing the database generally these days is an extremely fast operation, particularly if it's the same query done frequently. This is because caching can be done at the database level. However, not doing this will require caching at the framework level using global/static state which can get complicated, particularly if something happens during the current page execution that isn't properly accounted for (e.g. direct DB queries, race conditions or transactions not committed/rolled back). That is: I believe that asking "Is this in the database?" should actually tell me that it's in the database. I think the penalty for getting that wrong (especially in terms of legacy and ongoing maintenance) could be higher with mistakes made or issues in debugging than the time-penalty paid by issuing a simple SELECT ID FROM BaseTable WHERE ID=123.

FIX isInDB actually checks if the object is in DB

cce13e1

NEW Register/unregister objects to the indb cache

8d9c9a1

robbieaverill mentioned this pull request Sep 4, 2018

NEW Show CMS field (top-level) tabs in more actions dropdown silverstripe/silverstripe-elemental#373

Merged

ScopeyNZ reviewed Sep 4, 2018

View reviewed changes

sminnee requested changes Oct 15, 2018

View reviewed changes

patricknelson mentioned this pull request Dec 13, 2019

DataObject API duplication on ->exists() and ->isInDB() #9352

Closed

dhensby closed this May 16, 2020

GuySartorelli mentioned this pull request Jul 15, 2022

RFC Make DataObject ID a readonly property #10408

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX isInDB actually checks if the object is in DB #7799

FIX isInDB actually checks if the object is in DB #7799

dhensby commented Jan 24, 2018

kinglozzer commented Jan 24, 2018

dhensby commented Jan 24, 2018

kinglozzer commented Jan 24, 2018

tractorcow commented Jan 24, 2018

dhensby commented Jan 25, 2018

dhensby commented Jan 25, 2018

tractorcow commented Jan 25, 2018

dhensby commented Jan 26, 2018

ScopeyNZ Sep 4, 2018

tractorcow commented Sep 5, 2018

sminnee commented Oct 12, 2018

tractorcow commented Oct 15, 2018

sminnee Oct 15, 2018

robbieaverill commented Oct 15, 2018

maxime-rainville commented Feb 1, 2019

patricknelson commented Dec 14, 2019 •

edited

Loading

FIX isInDB actually checks if the object is in DB #7799

FIX isInDB actually checks if the object is in DB #7799

Conversation

dhensby commented Jan 24, 2018

kinglozzer commented Jan 24, 2018

dhensby commented Jan 24, 2018

kinglozzer commented Jan 24, 2018

tractorcow commented Jan 24, 2018

dhensby commented Jan 25, 2018

dhensby commented Jan 25, 2018

tractorcow commented Jan 25, 2018

dhensby commented Jan 26, 2018

ScopeyNZ Sep 4, 2018

Choose a reason for hiding this comment

tractorcow commented Sep 5, 2018

sminnee commented Oct 12, 2018

tractorcow commented Oct 15, 2018

sminnee Oct 15, 2018

Choose a reason for hiding this comment

robbieaverill commented Oct 15, 2018

maxime-rainville commented Feb 1, 2019

patricknelson commented Dec 14, 2019 • edited Loading

patricknelson commented Dec 14, 2019 •

edited

Loading