OO conflict resolution
- Need to think about this some more.
-
It would be really great to figure out how to deal with conflicts involving multiple objects, most notably BTree bucket splits. When I started drafting this, my plan was simply to move conflict resolution to the client, but when thinking about the new API, I realized that I needed to give more thought to how this might work at the object level.
- Want to deal with objects not state
- Get rid of stupid persistent references.
- But this opens up the inconsistency at the root of conflict resolution. When you resolve conflicts, you mix data from subsequent transactions. This is more acute when there are multiple objects involved.
This needs more thought.
Conflict resolution in ZODB is rather archaic. It was added a long time ago and hasn't changed. It has a number of problems.
-
It's inflexible. Conflict resolution is defined in classes, which makes it difficult for applications to customize conflict resolution for existing classes.
This is important because:
- Conflict resolution weakens consistency. Some applications may want to trade off performance for greater consistency by making conflict resolution more conservative, or by disabling it altogether.
- Conversely, some applications may want to make conflict resolution less conservative in some cases.
One might even want to apply conflict-resolution rules based on object state.
-
You can only apply conflict resolution to one object at a time. In some cases, such as BTree bucket splits, you may need to adjust multiple objects to resolve a conflict.
Worse, conflict resolution code works on state, not objects.
-
The API is awkward. The conflict-resolution API was designed before there were class methods. Conflict resolution is defined using an instance method, but the instance passed to the method isn't used.
In retrospect, it would have been much saner to use some sort of component registry.
-
Conflict resolution is a storage responsibility, but it doesn't really have anything to do with storage. Innovation in conflict-resolution strategies shouldn't be tied to a particular storage implementation.
-
ZEO applies conflict resolution on the database server. This presents deployment challenges, as the conflict-resolution code must be importable by the server, coupling the server and application deployments.
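To make the awkwardness of the current API concrete, here is a minimal sketch of a counter that resolves its own conflicts through the existing _p_resolveConflict hook. The Counter class is a plain stand-in invented for illustration (a real one would subclass persistent.Persistent), and the merge rule is just one plausible policy for counters.

```python
class Counter:
    """Illustrative stand-in; in ZODB this would subclass persistent.Persistent."""

    def __init__(self):
        self.value = 0

    def __getstate__(self):
        return {"value": self.value}

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # The instance the method is called on is not used at all:
        # only the three state dictionaries carry information.
        resolved = dict(new_state)
        # Apply both increments relative to the common ancestor state.
        resolved["value"] = (
            saved_state["value"] + new_state["value"] - old_state["value"]
        )
        return resolved


# Two transactions both started from value 1; one committed 3, the other wants 2.
resolved = Counter()._p_resolveConflict({"value": 1}, {"value": 3}, {"value": 2})
```

Note that the policy lives inside the class, so an application that wants a different rule for Counter has no clean place to put it, and the code has to be importable wherever resolution runs.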
I propose to support conflict resolution in ZODB itself.
-
When configuring a database, you'll be able to specify a custom conflict-resolution callable. This callable will be responsible for resolving conflicts in individual objects. The default component will implement the existing strategy.
Resolvers will be called with a conflict object. This object will have attributes:
- object
-
The object whose changes conflict. Its current state will be the new conflicting state.
- old_state
-
The state the conflicting state was based on. Conflicts may be resolved multiple times for an object in a transaction. The first time, the old state is the state at the beginning of the transaction. In subsequent calls, the old state is the committed state from the previous call.
- new_state
-
The new conflicting state of the object. This is the state that was sent to the database. It should be equal to the result of calling __getstate__ on the object.
- committed_state
-
The state that was committed/current in the database at the time of the conflict check.
- factory(state)
-
A callable that can be used to convert a state to an object, for cases where it might be easier to work with objects than with data.
The states will be objects, not pickles. These objects may refer to other objects. If they refer to
The job of a resolver is to merge changes made between the old and new states with those between the old and committed states. The merged state should be either returned or applied to the object argument.
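A rough sketch of what a conflict object and a resolver could look like under this proposal. The Conflict dataclass, the Counter class, counter_factory, and resolve_counter are all hypothetical names invented for illustration; only the attribute names come from the list above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Conflict:
    """Hypothetical container carrying the attributes listed above."""
    object: Any                      # the object whose changes conflict
    old_state: Any                   # state the conflicting change was based on
    new_state: Any                   # state that was sent to the database
    committed_state: Any             # state currently committed
    factory: Callable[[Any], Any]    # converts a state into an object


class Counter:
    """Illustrative application class, not part of ZODB."""

    def __init__(self, value=0):
        self.value = value

    def __getstate__(self):
        return {"value": self.value}

    def __setstate__(self, state):
        self.value = state["value"]


def counter_factory(state):
    # Build an object from a state without running __init__.
    obj = Counter.__new__(Counter)
    obj.__setstate__(state)
    return obj


def resolve_counter(conflict):
    # Merge the change old -> new with the change old -> committed.
    old = conflict.old_state["value"]
    ours = conflict.new_state["value"] - old
    theirs = conflict.committed_state["value"] - old
    return {"value": old + ours + theirs}


conflict = Conflict(
    object=Counter(7),
    old_state={"value": 5},
    new_state={"value": 7},
    committed_state={"value": 6},
    factory=counter_factory,
)
merged = resolve_counter(conflict)
```

This sketch returns the merged state; a resolver could instead apply it to conflict.object, or use conflict.factory to work with objects rather than raw state.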
-
old_state, committed_state, and an object factory. The object argument is the object with the conflict. The old_state argument is the data read by the transaction (or by a previous conflict resolution). The committed_state argument is the state currently committed for the object. The state objects will be Python objects, not pickles.
-
There will be a new storage method:
tpc_resolver(transaction, callback)
If a storage defines this method, it will be called with a callback that the storage may invoke during the first phase of two-phase commit. The tpc_resolver method will be called after calling tpc_begin and before any store calls.
The resolver callback may be called any time up to the response from a tpc_vote call. It will be passed: oid, committed_serial, committed_state, old_state, and new_state, where new_state is the state passed to store, committed_state is the current state in the database, committed_serial is the committed serial in the database for the object, and old_state is the previous state passed to the previous store call. If the conflict is resolved, the committed_serial will be passed to store along with the new state.
If the resolver can resolve a conflict, then additional store calls will be made with updated state. These will be for the conflicting objects, but may also be made for additional objects modified during conflict resolution.
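The call sequence might look roughly like the sketch below. ToyStorage is an in-memory stand-in invented for illustration, not a real ZODB storage; states are plain integers, and I'm assuming here that old_state is the revision identified by the serial the client passed to store.

```python
class ToyStorage:
    """Hypothetical in-memory storage that checks for conflicts in store()."""

    def __init__(self, revisions, current):
        self.revisions = dict(revisions)  # (oid, serial) -> state
        self.current = dict(current)      # oid -> committed serial
        self.pending = {}                 # oid -> (base serial, state)
        self.resolver = None

    def tpc_begin(self, transaction):
        self.pending = {}
        self.resolver = None

    def tpc_resolver(self, transaction, callback):
        # Called after tpc_begin and before any store calls.
        self.resolver = callback

    def store(self, oid, serial, state, transaction):
        # Always keep the last state received for an object.
        self.pending[oid] = (serial, state)
        committed_serial = self.current[oid]
        if serial != committed_serial and self.resolver is not None:
            self.resolver(
                oid,
                committed_serial,
                self.revisions[oid, committed_serial],  # committed_state
                self.revisions[oid, serial],            # old_state (assumption)
                state,                                  # new_state
            )


def make_resolver(storage, transaction):
    def resolve(oid, committed_serial, committed_state, old_state, new_state):
        # Counter-style merge: replay our increment on the committed state.
        merged = committed_state + (new_state - old_state)
        # A resolved conflict leads to another store call, made with the
        # committed serial and the merged state.
        storage.store(oid, committed_serial, merged, transaction)
    return resolve


storage = ToyStorage({("c", 1): 5, ("c", 2): 6}, {"c": 2})
txn = object()
storage.tpc_begin(txn)
storage.tpc_resolver(txn, make_resolver(storage, txn))
storage.store("c", 1, 7, txn)  # based on serial 1, but serial 2 is committed
```

After the last call, the storage holds the merged state for "c", based on serial 2.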
-
If a storage implements tpc_resolver, its store method may be called multiple times for the same object as a result of conflict resolution. It may be called for objects for which there were no conflicts, if, in resolving conflicts, it's necessary to modify (or further modify) non-conflicting objects. The storage should use the last state it has received for an object.
-
A storage that implements tpc_resolver should keep track of objects with unresolved conflicts. If tpc_vote is called when there are unresolved conflicts, it will raise an UnresolvedConflicts exception. This exception doesn't end the transaction; tpc_vote may be called again for the transaction after outstanding conflicts have been resolved.
- A storage may check for conflicts multiple times. In particular, when it receives a store call, it may check for a conflict at that time and call the resolver if there is one. It will check again later, in the tpc_vote call. If it detects a conflict during tpc_vote, it should release any commit locks, call the resolver, and raise UnresolvedConflicts.
- As a result of the point above, changes to an object may be resolved multiple times during a single commit. If conflicts are resolved more than once, then after the first, the old state is the committed state from the previous resolution.
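The vote-time behavior could be sketched as follows. VotingStorage, vote_until_resolved, and the integer states are hypothetical; UnresolvedConflicts is the exception named above, defined locally here because it doesn't exist yet. A real storage would also release its commit locks where the comment indicates.

```python
class UnresolvedConflicts(Exception):
    """Proposed exception; raising it does not end the transaction."""


class VotingStorage:
    """Hypothetical in-memory storage that checks for conflicts in tpc_vote()."""

    def __init__(self, revisions, current):
        self.revisions = dict(revisions)  # (oid, serial) -> state
        self.current = dict(current)      # oid -> committed serial
        self.pending = {}                 # oid -> (base serial, state)
        self.resolver = None

    def tpc_resolver(self, transaction, callback):
        self.resolver = callback

    def store(self, oid, serial, state, transaction):
        self.pending[oid] = (serial, state)

    def tpc_vote(self, transaction):
        conflicts = [
            oid for oid, (base, _) in self.pending.items()
            if base != self.current[oid]
        ]
        # (A real storage would release any commit locks here.)
        for oid in conflicts:
            base, new_state = self.pending[oid]
            committed_serial = self.current[oid]
            self.resolver(
                oid,
                committed_serial,
                self.revisions[oid, committed_serial],
                self.revisions[oid, base],
                new_state,
            )
        if conflicts:
            raise UnresolvedConflicts(conflicts)


def vote_until_resolved(storage, transaction, max_attempts=5):
    # Keep voting until no conflicts remain; return the number of votes used.
    for attempt in range(1, max_attempts + 1):
        try:
            storage.tpc_vote(transaction)
            return attempt
        except UnresolvedConflicts:
            continue
    raise UnresolvedConflicts("gave up after %d attempts" % max_attempts)


storage = VotingStorage({("c", 1): 5, ("c", 2): 6}, {"c": 2})
txn = object()


def resolve(oid, committed_serial, committed_state, old_state, new_state):
    merged = committed_state + (new_state - old_state)
    storage.store(oid, committed_serial, merged, txn)


storage.tpc_resolver(txn, resolve)
storage.store("c", 1, 7, txn)
attempts = vote_until_resolved(storage, txn)
```

If another transaction committed between votes, the second vote would find a new conflict, and the resolver would run again with the previously committed state as old_state.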
This is a sketch. I plan to prototype it with an updated (possibly experimental) version of ZEO.
Any comments or suggestions are welcome. Jim