Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unique index / indexes are error prone #6819

Closed
sunnysideup opened this issue Apr 20, 2017 · 13 comments
Closed

unique index / indexes are error prone #6819

sunnysideup opened this issue Apr 20, 2017 · 13 comments

Comments

@sunnysideup
Copy link
Contributor

sunnysideup commented Apr 20, 2017

Summary:

Unique indexes are not such a good idea within the current Silverstripe set up (3.5.3), AFAIK, because they are prone to race conditions and other errors, making all future inserts impossible. The reason for this is that DataObject::write carries out its writing in two steps. The first step writes an empty record to obtain an ID, the second (and further writes), add the data. If the first write is performed and the second write crashes or whatever else can go wrong then the table can not be written to ever again (until the empty record is manually removed).


When creating a unique index, you are likely to encounter problems due to the way SS inserts new records. Please forgive me if I got it all wrong. I have limited time to raise this ticket and I am not coding pro, but it may be worth noting. I have already raised this on the dev group (https://groups.google.com/forum/#!topic/silverstripe-dev/n-FZ8pSsWD8)

AFAIK Records (database rows) are written,

  1. firstly with creation date and ID only.
  2. with second write for any data.

This means that no records can be written to a table while any write process is between 1 and 2. Sometimes this can be because of race-like conditions but it could also be the case that 2 does not happen because of faulty data, server failure, or whatever else you can imagine.

Here is the relevant code (from 3.5.3) in DataObject:

image

image

Line 1313 ... 1316 above executes a query:

image

call to DB::prepared_query:

image

and then to the connector:

image

The way I read this code is that writeBaseRecord actually inserts an empty record, after which further write(s) add the additional data.

Options (some of them a little crazy!):

  • make this limitation clear in documentation OR
  • add roll-back procedure to delete new entry in case of error OR
  • disallow unique indexes as they do not work so well with Silvertripe ORM OR
  • lock database row until full insert has been done (this will prevent race conditions I guess) OR
  • refactor the whole system ;-) so that we insert the data with the original insert OR
  • build a silverstripe specific way to deal with unique indexes that does not involve the DB itself
  • ...
@patricknelson
Copy link
Contributor

Or better... don't write a blank initial record in the first place. Curious: Why is this even being done? If there is no compelling reason, or if there are more issues created than it resolves, why keep it around?

Why not just write the base class along with child classes to their respective tables, along with the appropriate data, all in one insert per table? Conjecture here: If this is in order to secure consistent table ID's across child objects, can this not also be handled in a single DB transaction (not single query) and then only allow this sort of behavior as a fallback for connectors for DBMS's which don't support transactions? Assuming that is why this is done (for ID's and backward/baseline compat).

Without the code in front of me (and an hour of my time, I bet) I can't answer this question for myself, but I suspect it is being done for a reason, but I'm not certain why and it seems too magical to me. As stated above, it should at least be thoroughly documented, particularly the reasoning behind why.

@patricknelson
Copy link
Contributor

patricknelson commented Apr 24, 2017

Also @sunnysideup where you using the ::$indexes private static on your data object when you encountered this issue? https://docs.silverstripe.org/en/3/developer_guides/model/indexes/

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Apr 24, 2017

Hi @patricknelson - yes, I was using the indexes as instructed in docs.

Or better... don't write a blank initial record in the first place - that is probably the most essential few lines of framework so it is not something I would change without some consultation ;-)))))

I hear what you are saying about a single transaction. Maybe the core devs will tell me - dont waste your time with Mysql, because all of this is handled by PostGreSQL perfectly. Then it is simply a matter of updating the docs....

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Apr 24, 2017

I had a look at all the PDOConnector.php and searched for transaction and lock: no mention. I also looked at PDOConnector::preparedQuery ... I could not see any indication of transaction stuff there either.

@chillu
Copy link
Member

chillu commented Apr 27, 2017

For now, I'd start with adding a warning in the docs about wrapping any write() operations on DataObjects with unique constraints in a custom database transaction.

I'd be careful with adding a transaction around the two write calls: Depending on the context, this could lead to lock escalations (from record to page to table), and cause deadlocks. Also, not clear how nested transactions work across the different database drivers.

Ideally we'd get away with a single SQL insert/update statement. It'll be an API change, since $this->record['ID'] isn't available for augmentWrite() hooks in case of new records. Hard to say how much that's used. In 4.x, the prepareManipulationTable() call as part of the second write also relies on $this->record['ID'] being present. This might be the root cause of the double write: Being able to target manipulations by id value?

Anyway, I don't think this is high on the agenda for core devs, but worth solving by the community in a pull request :)

@lekoala
Copy link
Contributor

lekoala commented Apr 27, 2017

regarding the unique index issue, i ran into the problem as well. It doesn't affect any nullable type (you can have as many null unique column as you want) but it becomes really annoying for non nullable types (in my case : integers). Since SilverStripe doesn't provide null ints out of the box (which is really sad, because it makes it impossible to distinguish actual 0 from non initialized values, but that's another topic), I ended up storing numbers in varchars which can be null.

so maybe a way to easily prevent problems with unique constraints is to simply make sure the column can be null?

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Apr 27, 2017

Hi Ingo and Koala for your comments,

That all makes a lot of sense.

I would recommend that someone update the docs as it would save people the hassle and frustration of trying out unique indexes and then having to carefully debug and discover the above. Something like this:

We recommend not to use unique indexes out of the box, as they are likely to give more headaches than solve things due to to the two-tier write process of new data rows. This specifically applies to indexes with unique integers.

Instead, use Silverstripe specific hooks, such as onBeforeWrite, 'validateandonAfterWrite` to ensure data-integrity.

@tractorcow
Copy link
Contributor

refactor the whole system ;-) so that we insert the data with the original insert OR

This is probably the most natural solution to the problem, and the least likely to introduce regressions. This could possibly be done in a semver way, making it safe to do in 3.x. This would need to be tested for regressions against extensions which implement augmentWrite, however, so more investigation is needed. :)

If it turns out to be hard, we can do it in 4.0.

@tractorcow
Copy link
Contributor

I just ran into this in 4.1.2. ChangeSetItem has a unique index which breaks in writeBaseRecord() due to the index being initially broken due to duplicate records. There's a race condition where if you are publishing multiple items, you will temporarily end up with 0 written into the DB which need clearing up.

@dhensby
Copy link
Contributor

dhensby commented Jul 12, 2018

I'd like our IDs to change to be UUIDs so that we don't rely on the DB to assign IDs for us. But this would be a rather large change that would be for 5.x

@tractorcow
Copy link
Contributor

I'm thinking just writing the full base table row up front would probably be a better default than leaving them to db default :P

tractorcow added a commit to tractorcow/silverstripe-framework that referenced this issue Feb 27, 2019
tractorcow added a commit to tractorcow/silverstripe-framework that referenced this issue Feb 27, 2019
@tractorcow
Copy link
Contributor

Finally fixed this with #8831

tractorcow added a commit to tractorcow/silverstripe-framework that referenced this issue Feb 27, 2019
@tractorcow
Copy link
Contributor

Fixed with #8831

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants