Skip to content
This repository has been archived by the owner on Jun 1, 2021. It is now read-only.

Persist on event may persist events again in case of disaster recovery #385

Closed
volkerstampa opened this issue Mar 20, 2017 · 3 comments
Closed
Assignees
Milestone

Comments

@volkerstampa
Copy link
Contributor

Eventuate support event driven communication through the PersistOnEvent trait. Its implementation is based on the ideas of reliable delivery and essentially it uses the sequence number of the currently handled event as delivery id. However the sequence number of events might not be stable in case of disaster recovery.

Imagine for example two locations A and B replicating all events. A emits E1 (in A seqNo 1), B emits E2 (in B seqNo 1). Once B receives E1 (in B seqNo 2) it emits (persistOnEvent) E3 with persistOnEventSequenceNr = 2. B's log contains now: E2 E1 E3 If B is restarted, replaying E1 leads to another persistOnEvent request with persistOnEventSequenceNr = 2 that is confirmed by the replayed E3 (and thus E3 is not emitted again).

In case of a disaster on B, B recovers events from A. However in A's log events are in a different order and disaster recovery for B ends with the log E1 E2 E3. When E1 is replayed this time a persistOnEvent with persistOnEventSequenceNr = 1 is requested and this request is not confirmed by the replayed E3 (as it just confirms a persistOnEvent request with persistOnEventSequenceNr = 2) and thus E3 is emitted again (as E4). Note that the next replay will not emit E3 again as its persistOnEvent request is confirmed by E4.

@volkerstampa
Copy link
Contributor Author

A possible fix for this bug is to use the combination emitter-id and emitter sequence number as delivery id and store this in a dedicated field of a DurableEvent. So in the example above E1 would be emitted by A with this field set to A-1 and a persistOnEvent request by B would use this as delivery id instead of the local sequence number.

@krasserm
Copy link
Contributor

This fix sounds reasonable to me as it uses a unique event identifier and not the non-unique local sequence number for referencing events that caused a PersistOnEvent request.

Instead of using the emitterId it is sufficient to use the processId of the event which is also required for extracting the initial local sequence number (i.e. emitter sequence number) from the vectorTimestamp. An additional method on DurableEvent should be added to generate this identifier (or even replace the existing id method as it is not used within Eventuate).

We still need to make sure to preserve the proper emission order of PersistOnEvent requests which is not given any more by the new identifier. Instead of the current SortedMap[Long, PersistOnEventRequest] we could use a plain Seq[PersistOnEventRequest] but this would increase confirmation complexity from O(log n) to O(n) which is not acceptable.

We therefore need to store the new identifier in addition to the currently used persistOnEventSequenceNr and introduce a second mapping Map[A, PersistOnEventRequest] (which allows us to make a lookup and removal in the current SortedMap[Long, PersistOnEventRequest] via the persistOnEventSequenceNr stored in PersistOnEventRequest) with complexity O(log n).

The new identifier must be stored in a new DurableEvent field named persistOnEventReference: Option[A]. We cannot reuse deliveryId because it would prevent us from using events for ConfirmedDelivery that have been generated via persistOnEvent.

For backwards compatibility we have to deal with the following two situations:

  1. confirmation events have set both persistOnEventSequenceNr and persistOnEventReference which allows removal of mapping entries with acceptable complexity i.e O(log n).
  2. confirmation events have only set persistOnEventSequenceNr which requires a removal from Map[A, PersistOnEventRequest] with complexity O(n) for a limited number of outstanding PersistOnEvent requests.

@volkerstampa
Copy link
Contributor Author

Sounds good to me. A could either be a dedicated type (like case class EventId(processId: String, sequenceNr: Long) or a simple String where processId and sequenceNr are concatenated. The dedicated type might provide better type safety, however the string concatenation pattern is already used in other places in eventuate (e.g. for building a log-id).

volkerstampa added a commit that referenced this issue Mar 29, 2017
Instead of using the in case of disaster recovery potentially
unstable sequence number as id for persist on event requests use a
stable EventId that is composed of the sequence number of the emitter
of the event and the corresponding process id.

Closes #385
@volkerstampa volkerstampa self-assigned this Mar 29, 2017
@volkerstampa volkerstampa added this to the 0.9 milestone Mar 29, 2017
volkerstampa added a commit that referenced this issue Mar 30, 2017
Instead of using the in case of disaster recovery potentially
unstable sequence number as id for persist on event requests use a
stable EventId that is composed of the sequence number of the emitter
of the event and the corresponding process id.

Closes #385
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants