-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC - DataObject ID Decoupling #8411
Comments
I like the idea for a few reasons:
One concern would be that SilverStripe URLs will become much longer and less human friendly |
Well, an integer is almost as meaningless to a human as a UUID, but URLs in the CMS will become longer, but that's an aesthetics issue and UUIDs are not there for aesthetics, they are there for uniqueness. I have also realised that we can't actually write an incrementing ID generator in the application layer because we would still need to write a row to the database otherwise we'd always generate the same ID on multiple instantiations. The solution would either be to write the object on construct (a massive overhead) or not support auto-incrementing IDs. This will have to be SS5 change |
Yeah - I guess in the frontend you wouldn't be encouraging the use of IDs in URLs anyway, e.g. blog and CMS pages etc which uses a URL segment |
Your ID generator could simply be a table that stores an auto-incrementing ID, hypothetically, if the user really wanted to force integers. |
In PostgreSQL you don't even need the table, just the sequence 🙂 . |
If I may add 3 little things:
ps: if need be, I've already a working uuid extension for silverstripe 4 here https://github.com/lekoala/silverstripe-uuid |
I can see that UUIDs are useful in some circumstances, and totally support us providing support for them. However, I dispute whether auto-incrementing IDs are legacy. I would still see auto-incrementing IDs as an effective tool in most cases that don't include a multi-master database.
We have to be extremely careful here that we don't introduce race conditions with this. For example, generating an auto-incrementing ID on the PHP side would be extremely bad practise. Generating a UUID on the PHP side is okay because the chance of collision is very small. Other random identifiers might be acceptably generated on the PHP side, although you would probably need to have a special "ID collision" error case. So the whole first-write-gets-ID approach will stay with us regardless, unless we drop numeric IDs altogether. You could arguably have the sequence generated separately from the initial record, although in simpler cases – i.e. where you are simply writing a record with no related data – this will slow things down, as you'll now have a separate sequence write and record write. Perhaps these can be optimised?
I think this is a very small part of such an abstraction and if this is really a goal I would suggest we design that first – a shift from Active Record to more of a Data Mapper. I'm moderately supportive of this: I think it's a great idea I just don't know if it's the most important Big Refactor to be doing. I would support anything that dramatically speeds up GraphQL, tree fetches, or permission checks; maybe it overlaps with that. ;-) |
This is pretty straightforward; it can be a boolean field that is:
The hardest thing is that currently we don't clearly distinguish a construction-from-db-content, although that's only hard because it requires a small API change. |
@sminnee Whilst I appreciate what you're saying about numeric IDs (they are effective for non-distributed DBs) the point is that our Model is still (needlessly) coupled to the DB layer in this instance. It also means we continue to have the other problems of guessable IDs (which could prove to be a security vulnerability for applications because providing sequential numeric IDs means it's very easy to determine which IDs do (or have) existed in the past if you know a current valid ID). I also would not like to support both IDs that can be generated in either the DB side or the application side because this means we'll run a different DB schema depending on the setting of the application (we'd have to switch the column type and the AUTO_INCREMENT setting on the ID column), it would also make it impractical to scale a site (ie: move it from single server to distributed server infrastructure) as it'd require a complete remapping of IDs on every single table. |
OK, so in effect we'd drop simple numeric ID generation altogether, and probably shift ID storage to If the goal is decoupling the model from the database, then I would be inclined to do that in a more complete manner with a datamapper style abstraction, instead of the current active record. This would allow for both UUID-based and AUTOINCREMENT-based repositories, if we wanted, as this responsibility would lie within the mapper. It would also make DataObject more of a convenience for predefining persistence metadata rather than a base class of everything that can be persisted – it would be easier to persist Plain Old PHP Objects if needs be. If the goal is to allow for non-incrementing IDs for better multi-master support and security, then I would be more inclined to go with an implementation that supported both auto-increment (MySQL) / sequence (PGSQL) based IDs and PHP-generated UUIDs, and probably by default generating the UUID on first write as we do for auto-increment IDs, to leave that semantics more closely matching the current implementation. A method like |
Yes, I'd like to know too; though it may be one of those things that only really bothers / affects those that have hit its limitations. I'm not sure how much developers really notice what ID an object has; how often are IDs hard coded into logic and (if they are) is that a good practice and justification for making them "readable". As for moving to a complete abstraction with a data-mapper, yes, that would be good, but what I'm looking to do is, over the course of a decent period of time, lay the groundwork that is required for that rather than trying to do it all in one go. There seems to be a little bit of incremental working going into the framework that is using #6632 as it's motivation. I recently outlined a few "blockers" that would stop us even getting onto that (one of which is a datamapper/proper abstraction layer between our objects and the persistence layer). tl;dr I'm out to make incremental improvements to a higher end goal over a single massive change that will ship all as one. Part of that is because of time constraints but part of it is also because one huge PR like that is very difficult to keep up to date with the velocity that the framework has at the moment. As long as each individual change is an improvement then that should be fine. |
PS: I'm trying to get sign off for the individual areas I'll need to address to make #6632 a reality, a datamapper is a step change ahead of where I'd be starting when laying the groundwork for it |
That being the case I would probably do this, which should be
This will basically create the situation where we can support UUIDs without dropping auto-increments, in SS4, which will be a good way of determining how feasible it is to go UUID-only in the future. |
A conversation around this has just developed in silverstripe/silverstripe-elemental#429 - in the context of writing (assumed) IDs into unit tests, in the manner that we know how the system works and have control over the data per test case. This works well, although does litter the unit test with magic numbers a little. The case was brought up that assuming IDs is bad in that if the ID were to ever change (e.g. this RFC), that would mean there is needless refactoring within the test. My response was as follows:
The crux of it is that changing to UUID (strings) would be far more burdensome than simply causing some fast and fancy-free coded tests to fail. IDs are intended to be an abstraction, and for the most part are throughout the application suite. However there are a number of cases where they're not... strongly typed GraphQL is one that immediately springs to mind... and I'm not sure I'd label the code within as "technical debt" in that it's not poor code as a assumptive test is - although it would immediately become such upon a conversion to UUID. Not to say it can't be done (I support it), but there is a much larger consideration to be had than simply the coupling of the model and storage layers (and the storage layer particularly to MySQL over others). |
Some specific ideas about implementation (for discussion / refutation):
I'm not sure what methods the IdProvider should expose; I would suggest that this is figured out by the people implementing AutoIncrementIdProvider and UuidIdProvider, ensuring that it covers the needs of both these providers. |
@silverstripe/core-team esp @dhensby what do you think of my suggestions in the last couple of comments? |
I like the idea of an IdProvider, this could help abstract ID handling cleanly. GraphQL has a specific |
Good reminder about the front-end impact of any such change, Chris :-) 👍 |
Sounds pretty good to me! I think it’d be worth having a mini vote on the class and namespace naming though, it has the potential so have a lot of Providers in it |
Yeah potentially there's an IdProvider sub-namespace? One thing I wondered is whether the interface should be in the parent namespace, though, mainly for brevity, but also because I wince every time I write |
Thanks for all the thoughts and feedback, especially with regards to frontend components that could be making assumptions about ID types. It's not clear to me how our application could keep track of an auto-incrementing ID counter across process threads unless we had some kind of dedicated persistence storage for the ID increment-er, which is really gross and undermines the objective of being able to run SS projects on distributed hardware. I'm wondering if this is an area where we should just be opinionated and provide UUIDs without an ability to easily override it (just like we are currently opinionated) but being opinionated about UUIDs just means we are able to decouple our ID generation from the persistence layer |
I think there's some confusion here: the AutoIncrementIdProvider should behave as close to the current behaviour as possible, should be the default, and these two facts mean that this can be introduced in 4.x. Specifically, I'd imagine something like:
Conversely:
It's probable that the final code pushes more responsibility to relevant DBField implementations, so we'd have a To respond to your comments:
I disagree with this for a number of reasons:
We've got a tried & true solution for this: autoincrementing columns. No need to change this.
SilverStripe has run for more than a decade without UUIDs, so any argument that continuing to support the current behaviour is going to be impractical is on shaky ground. My expectation would be that those teams running a distributed database would be strong recommended to use the UuidIdProvider, as would anyone who doesn't want to carry the pre-emptive write of the AutoIncrementIdProvider. UuidIdProvider would therefore have advantages, as well as the disadvantages of longer, less meainingful IDs. By the time we're releasing SS5, we can get some real data about how popular the UUID provider has been and what kind of issues people have run into making use of it. Then we can decide whether to:
|
I strongly support the ability to have both if we have to do anything, with UUIDs being opt-in (at least for now). UUIDs definitely have a place, but I don't think the majority of SilverStripe sites would be helped much by them, and they can make things more complicated too (for example it basically becomes impossible to look at the database directly and scan for new records or trace one DB write across multiple tables - IDs are memorable, UUIDs aren't). @sminnee: In your proposal above I'm not sure how you see $has_one relationships being handled - would you expect separate columns e.g. |
I'm with Sam here: I don't think it's a very important refactor, because the group of devs getting excited about this would be very small - teams which run large write heavy multi-master setups who are currently experiencing performance or locking issues. If it's a required step on a DBAL refactor, and on the path to reinventing less wheels in SilverStripe over time, that's more interesting ;) I'm a bit concerned on making this behaviour type specific, and allowing different approaches in the same database. While I agree that the data mapper pattern is a great way to avoid one-size-fits-all databases, in terms of identifiers the types aren't isolated. Types link to other types via foreign keys, and I'd find it weird to have a numeric primary key and a UUID foreign key in the same row? What's the migration path for this? At minimum, it requires updating every row in the database, and relying on a somewhat shaky ORM level referential integrity on identifier rewrites (foreign keys etc). Then there's shortcodes, and a million other cases where identifiers are coded into different stores - for examples things like https://github.com/phptek/silverstripe-jsontext which will need special treatment. It's all solveable, but we should be realistic about work vs. payoff here. |
Maybe there is no need to replace IDs by UUIDs, but simply have both? this way, migration is easy, and you can still have pre determined ID's and all other benefits. Also performance might be a good point : using plain int is always going to be faster than UUIDs. |
Potentially stating the obvious after reading everything above, but:
This may also be a good place to mention that That said, a quick point on that API nomenclature: It's also worth noting that we should probably disambiguate the meaning of Maybe this could mean refactoring |
FYI: Added separate issue (#9352) to address my concern from the previous comment above regarding the redundancy of |
Proposal to remove dependency on auto-incrementing Database driven IDs
Motivations
Inserting and pulling out auto-incrementing IDs can cause race conditions when using distributed architecture, especially with distributed databases
At the moment, to get an ID you have to write the item to the DB first, this is bad practice as all Models should have an ID at all times; this can also allow us to simplify "unsaved relations" because we'll be able to set a relation on an ID that's not been written yet, allowing us to depend on eventual consistency.
Incrementing IDs are guessable and easy to determine if an ID will exist. As IDs are often used in URLs, they should not be easily guessable
We decouple from DB integrations and allow people to define their own ID scheme; our Model shouldn't depend upon the database to work
Impact
is_numeric
logic from ID validation / make ID validation a bit more abstractisInDB()
/exists()
methods to not just assume a model exists if it's got an ID (either we need to ping the DB or we need to set a flag on loading from DB / writing) - Perhaps replaceisInDB()
withisPersisted()
?Approach
This would have the responsibility of generating and validation IDs so that this is abstracted away and we no longer need to do
is_numeric($this->ID)
to validate IDs.By default we'd have a UUID generator but we could also have a "legacy" generator. This could be a hybrid that accepts numeric IDs and UUIDs
or just a numeric auto-incrementing ID generator.The text was updated successfully, but these errors were encountered: