FIX isInDB actually checks if the object is in DB #7799
This seems far less performant than the existing check, is there a bug this fixes? |
Clearly there's an impact in checking the data source for whether an object exists - however the assumption that "if a DO has an ID it exists in the DB" is deeply flawed and (unfortunately) one that is prevalent amongst our code which I'd like to slowly work on removing |
Hypothetical example: dataobjects which may or may not have an image attached.
This change is going to result in one extra database query for each dataobject that has an image, right? I’m a little hesitant about this one, as far as I’m aware we’ve not seen any real-world issues caused by this behaviour. How about meeting halfway - leave |
I'm extremely nervous about this change, as we use isInDB() in many, many places, and there is going to be a serious performance cost here. The "is actually in the DB" state is meant to be maintained by having the object's ID reset when it is deleted. The extra API (which ignores things such as stage, extra tables, etc.) feels awkward to have to remember to invoke when doing low-level DB operations. E.g. using SQLDelete() to directly delete rows won't automatically update it. I would vote to just close this, I'm afraid. If you can show an example of where we absolutely need a real DB check, I would probably recommend a solution based on that specific example instead. |
It hasn't introduced a significant impact in my limited testing. I've found that a default page load on a vanilla install doesn't invoke the SQL query even once. I'm not doubting that checking the DB for the existence of a record will introduce an overhead, but to argue on performance grounds that we should avoid checking the DB when claiming to check the DB seems misguided to me. What other parts of the framework should stop actually doing what they say because it's faster to "guess"?
We do nothing to enforce that the ID can't be set by user code, so it's not in any way a reliable way of determining if the object exists.
Using low-level APIs will always be awkward in terms of clearing cache stores. At the moment if you do
TBH, if the answer is "don't use our methods that claim to check the DB because they don't do what they say" then that's an argument for deprecating these methods |
I think I'm going to propose an RFC on this because there are clear flaws in assuming that an ID != 0 means presence in the DB (especially as anything can set the value of the ID field), and there are further changes I want to make that all come under a similar umbrella. This may also be too radical for a 4.x release and should instead be targeted at 5. But there is clearly a mismatch in expectations here -
The $page = new Page();
$page->ID = "Oh looks, this is an ID";
$page->isInDB(); // false
$page->exists(); // true
$page->ID = 9999;
$page->isInDB(); // true
$page->exists(); // true |
We should ensure that any ORM method for selecting / deleting behaves consistently. We can't protect against raw SQL queries, which is why we recommend avoiding them where possible. |
exactly ;) |
/**
 * @param DataObject $object
 */
public function unregister_object($object)
This can (should?) be marked as static too?
How about |
There's definitely still a place for a method that checks "has this DataObject been written" without actually querying the database. I agree that checking whether ID is set is weak (your other RFC covers why) but we can still remember that the record has been written and return true for the remainder of the PHP request. We could change this to |
->exists() will return false for any archived items, which may seem logical at first, but if you are looping over versioned records in a template, for example, that will suddenly make the loop empty.
}
$class = $this->baseClass();
if (!isset(self::$_cache_in_db[$class][$this->ID])) {
$sqlSelect = new SQLSelect();
The only way that you can get to this code is if you have manually set the ID and not yet written the record. So, this code is picking up the case when you've manually set the ID to an existing record.
We have the option of simply returning false in this case. If we did this, the functionality would match the "has been written" logic that I've recommended. It would remove the cause for performance / logic concerns that others have had.
My view would be that checking the database for the presence of an ID that has been manually applied is a narrow and different use case from the current uses of isInDB(). In many cases where this happens it's not so much "this record is in the database" as "I am about to corrupt my database". Code responsible for handling ID collisions would need some careful and separate thought.
So I think it should simply return false in this case, and the docblock of isInDB() should then be updated to:
Returns true if this record was read from the database or has been written to it.
Note that it does not query the database for the presence of a manually-applied ID.
Note also that, after this change, the static registry can probably be replaced with an instance-specific private boolean $isInDb variable.
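A minimal sketch of the "has been written" approach suggested in this comment, using a standalone class rather than the real DataObject. The class name SketchRecord, the hydrate() factory, the simplified write()/delete() bodies and the ID counter are all assumptions for illustration; only the idea of an instance-level $isInDb flag comes from the comment itself.

```php
<?php

// Hypothetical stand-in for a DataObject; this is not SilverStripe code.
class SketchRecord
{
    /** @var int|string Publicly settable, as in the discussion above. */
    public $ID = 0;

    /** True once this instance has been read from or written to the DB. */
    private bool $isInDb = false;

    /** Fake auto-increment counter (assumption, replaces a real table). */
    private static int $nextId = 1;

    // Simulates the ORM building an object from a fetched row.
    public static function hydrate(int $id): self
    {
        $record = new self();
        $record->ID = $id;
        $record->isInDb = true;
        return $record;
    }

    // Simulates a successful write: assign an ID if none is set, then flag.
    public function write(): void
    {
        if (!$this->ID) {
            $this->ID = self::$nextId++;
        }
        $this->isInDb = true;
    }

    // Simulates a delete: clear both the flag and the ID.
    public function delete(): void
    {
        $this->isInDb = false;
        $this->ID = 0;
    }

    // Returns true only if this instance was read from or written to the DB.
    // A manually applied ID is never looked up.
    public function isInDB(): bool
    {
        return $this->isInDb;
    }
}
```

With this shape, setting ID by hand on a fresh object leaves isInDB() false until write() is called, and no query is needed to answer the question.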
That seems OK, but I think that isInDB() would ideally return true for archived objects. Currently it behaves the same way as exists(). I had an issue recently with unexpected behaviour on archived items because ->isInDB() was false. It is in the database though... Thoughts? |
Doesn't look like a consensus has been reached. Shall we close this for now? |
Just wanted to note here that I also recently created an issue about this: #9349. Keep in mind that the issue itself calls out the semantic duplication in the API without diving much into the details of how it should be implemented (albeit it's hard to untangle the two). IMHO, we should:
|
Both DataObject::exists() and DataObject::isInDB() claim to check if the current object is in the DB, yet neither does so, and each provides a slightly different way of determining if the object does in fact "exist in the DB".
This change introduces an actual DB query when checking for an object's existence in the DB.
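As a rough illustration of the idea in this description, the sketch below caches the result of an existence lookup per base class and ID so that repeated checks cost at most one query each. FakeTable and ExistenceChecker are hypothetical names, and the in-memory lookup stands in for a real database query; this is not the code from the pull request.

```php
<?php

// Hypothetical in-memory "table"; hasRow() stands in for a real query.
class FakeTable
{
    /** Number of simulated queries issued so far. */
    public int $queries = 0;

    /** @var array<int, bool> */
    private array $rows;

    public function __construct(array $ids)
    {
        $this->rows = array_fill_keys($ids, true);
    }

    public function hasRow(int $id): bool
    {
        $this->queries++;
        return isset($this->rows[$id]);
    }
}

// Hypothetical checker: caches lookups keyed by base class, then ID.
class ExistenceChecker
{
    /** @var array<string, array<int, bool>> */
    private array $cache = [];

    private FakeTable $table;

    public function __construct(FakeTable $table)
    {
        $this->table = $table;
    }

    /** @param int|string $id */
    public function isInDB(string $baseClass, $id): bool
    {
        // Non-numeric or non-positive IDs can never match a row.
        if (!is_numeric($id) || $id <= 0) {
            return false;
        }
        $id = (int) $id;
        // isset() is true for a cached false, so misses are cached too.
        if (!isset($this->cache[$baseClass][$id])) {
            $this->cache[$baseClass][$id] = $this->table->hasRow($id);
        }
        return $this->cache[$baseClass][$id];
    }
}
```

A cache like this goes stale if rows are removed outside the code that maintains it, which is the concern raised in the thread about low-level deletes.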