Support idempotent writes #26
Comments
I think there's an issue with using the ID to handle idempotency. Say we have a message handler subject to "at least once" delivery: if we receive the same message twice, the processing will run twice, but typically we would assign a new GUID to any resultant events. This would mean the IDs always differ. |
That's why we'd have to use deterministic GUIDs for this to work. As you say, calling Guid.NewGuid() is always going to generate a new value. Check this Stack Overflow article for an example. You end up creating a GUID based on a string hash; that way you get predictable results each time an input command/event is processed. |
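To make the "GUID based on a string hash" idea concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about .NET). It uses the standard name-based UUID (version 5, SHA-1). The namespace value, the function name `deterministic_event_id` and the choice of inputs are assumptions for the example, not something taken from this project.

```python
import uuid

# Assumption: one fixed namespace shared by every producer. The domain is a placeholder.
EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "events.example.invalid")


def deterministic_event_id(stream_id: str, source_message_id: str, event_name: str) -> uuid.UUID:
    """Derive a stable event ID from the inputs that define the event.

    The same inputs always hash to the same UUID, so a redelivered
    message produces the same ID for its resulting event.
    """
    name = f"{stream_id}|{source_message_id}|{event_name}"
    return uuid.uuid5(EVENT_NAMESPACE, name)


first = deterministic_event_id("order-42", "msg-1001", "PaymentRequested")
redelivered = deterministic_event_id("order-42", "msg-1001", "PaymentRequested")
other = deterministic_event_id("order-42", "msg-1002", "PaymentRequested")
```

Here `first` and `redelivered` are equal, while `other` differs because it came from a different source message. Which inputs go into the hashed string is the real design decision, since they define what "the same event" means.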
We would definitely look at using this if it was added, but the toggle on/off config would be essential. I'm a little worried about:
Regardless of the above ... this next idea looks a little naff in storage, but is it worth making the event id a string not a guid and allowing the consumer to build up a string that is easier to debug, e.g. "{EventName}" might be a default implementation, or "{EventName}-{DateTime}-{Amount}" might represent some sort of payment event. I'm assuming that event-id only needs to be unique within a stream, not across streams. It would remove the guid determinism and collision concerns and it would be easier to debug and support multiple versions. It makes it obvious what's going on. On the other hand, it looks horrible in storage and it duplicates data found elsewhere on the event. It couldn't be compressed if the storage engine is responsible for the idempotency check. Alternatively, I'm not sure if the boilerplate code issue could be minimised avoiding all of the above with attributes rather than as an event-store concern, e.g.
that still needs "boilerplate" code ... but actually is very easy to read, to see what's happening, to manage event versions and it can provide the idempotency check before the method executes. The elements between braces are properties on PaymentRequested. The above might be implemented using reflective code in a library which wouldn't look that pretty, but the API to the consumer seems attractive and simple? The business logic is easy to read? Or it could be implemented using post-processing and Roslyn code generation (but I'd consider that road carefully before taking it). |
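The original snippet for this comment is not preserved here, so the following is only a rough sketch of the general shape being described, written in Python rather than C#: a decorator stands in for the attribute, builds a readable string event ID from a template, and performs the idempotency check before the handler runs. The names `idempotent`, `seen_ids` and `payment_id` are hypothetical.

```python
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentRequested:
    payment_id: str
    amount: str


def idempotent(template: str, seen: set):
    """Skip the handler when an event with the same derived ID was already handled."""
    def decorate(handler):
        @functools.wraps(handler)
        def wrapper(event):
            # Elements between braces are the event name or properties on the event.
            event_id = template.format(EventName=type(event).__name__, **vars(event))
            if event_id in seen:
                return None  # duplicate delivery: business logic does not run again
            result = handler(event)
            seen.add(event_id)  # recorded only after the handler succeeded
            return result
        return wrapper
    return decorate


seen_ids = set()   # stand-in for a per-stream check in the store
processed = []


@idempotent("{EventName}-{payment_id}-{amount}", seen_ids)
def handle_payment_requested(event):
    processed.append(event.payment_id)


event = PaymentRequested(payment_id="p-1", amount="10.00")
handle_payment_requested(event)
handle_payment_requested(event)  # redelivery: skipped by the decorator
```

After both calls `processed` holds a single entry. In a real system the `seen` set would have to be durable and scoped to the stream; an in-memory set is used here only to keep the sketch self-contained.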
Actually this would be better, where PaymentRequested implements an interface returning a string EventId (but that means all persisted events would need to implement an interface from a library, which sucks; at the moment all ours are POCOs):
|
@asosMikeGore I think I'd be worried about GUID collisions if we were using a deterministic GUID, as they are no longer cryptographically random, correct? Do you have any testing to show that if we were to do this with our throughput we would not see any collisions? It seems like a collision on this ID would be very bad. |
@DaveAurionix @PeterStephenson
|
Thank you Mike. I think the event-version question is relevant. "Same GUID" needs defining and that creates the issue. If an event (say V1 for ease) is represented by a certain deterministic guid, and later a new property is introduced (V2 of the same event, conceptually but with versionless events it's more of a label) then people would need to think carefully about how that new property affects idempotency.

If its value is included in the definition of that specific business event then the algorithm used to generate the deterministic guid suddenly needs very careful attention so that it generates the same value for older events without that new property whilst at the same time generating a guid that takes into account the new property for newer events. If it broke backwards compatibility for older events then the idempotency check in the storage layer would fail. A stream with the V1 event, but consumed by a software version where handlers now use a newer guid generation algorithm, would not flag up a V2 as being a duplicate, because the generated guid would be different. Worse - if they received another "V1" deserialised as a V2 (because: versionless), they still might not flag it as duplicate if the default value for that field doesn't contribute to producing the same guid value as the V1 algorithm did without the property factored in at all.

Unit testing and very careful thinking would I'm sure come up with a solution - but I think that even the need to think so carefully about determinism is non-obvious, let alone the unit test cases and subsequent implementation required.

Underpinning the above is the assumption that an idempotency check on some events may need to examine more than just the event name. If we're able to operate solely in the simpler problem space then I agree wholeheartedly that the guid determinism complexity goes away / is irrelevant. Doing away with the initial read sounds powerful, I very much like the sound of that.
That said, I struggled to find many examples where we could. I'm certain that it's my lack of imagination, but it seems that it would only be possible in handlers that needed no input besides the triggering event, plus perhaps used web api GETs (which we discourage in our micro-services) or that accessed an internal database (which negates the benefit of removing the read, and the event store would typically be that source of data anyway), plus were side-effect free ... so in my mind I arrive at a place where skipping the read and instead checking-on-save is a combination that is only possible for simple side-effect-free single-event-in handlers that only publish one or more events out using no additional data (or perhaps just GET from a 3rd party). What am I missing here? There's an elephant and I can't see it even when it's treading on my foot. To me, the storage-layer idempotency check sounds great and I want to find ways to use it more, but it would initially be a "nice-to-have" to create new options for us when designing solutions. I doubt we'd launch into using it all over the place, but again that might just be my initial lack of imagination. Would be very interested to see applications that use it well. |
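The versioning concern above can be shown with a small sketch (Python, illustrative only). `id_v1` is the original algorithm, `id_v2_naive` always folds the new property into the hash, and `id_v2_compatible` is one possible way to stay backwards compatible by leaving the property out when it is absent. The `currency` property and all function names are invented for the example.

```python
import uuid
from typing import Optional

# Assumption: placeholder namespace shared by all versions of the generator.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "events.example.invalid")


def id_v1(payment_id: str) -> uuid.UUID:
    """Original algorithm: the event has no 'currency' property yet."""
    return uuid.uuid5(NAMESPACE, f"PaymentRequested|{payment_id}")


def id_v2_naive(payment_id: str, currency: Optional[str] = None) -> uuid.UUID:
    """Newer algorithm that always hashes the new property, even when it is missing."""
    return uuid.uuid5(NAMESPACE, f"PaymentRequested|{payment_id}|{currency}")


def id_v2_compatible(payment_id: str, currency: Optional[str] = None) -> uuid.UUID:
    """Newer algorithm that only hashes the new property when it is actually set."""
    parts = ["PaymentRequested", payment_id]
    if currency is not None:
        parts.append(currency)
    return uuid.uuid5(NAMESPACE, "|".join(parts))


stored = id_v1("p-1")                      # ID already sitting in the stream
naive_replay = id_v2_naive("p-1")          # old event deserialised by new code
compatible_replay = id_v2_compatible("p-1")
new_event = id_v2_compatible("p-1", "GBP")
```

`naive_replay` differs from `stored`, so a storage-layer check would miss the duplicate, whereas `compatible_replay` matches it. Note that `new_event` gets a different ID from `stored`; whether that is correct depends on whether the new property is really part of the event's identity, which is exactly the careful thinking the comment asks for.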
When writing streams, we currently do no checks to determine whether an event already exists in a stream. This means the application logic has to implement this functionality, which generally ends up as lots of boilerplate code.
The way we can improve this is to allow writes to have an idempotent check to determine if the write has already occurred and ignore it if it has. The current thinking for how to do this is
Clients would have to use deterministic GUIDs when appending events to the stream, otherwise the check would never work.
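As a rough illustration of the proposed behaviour (not this library's API), the sketch below models a stream whose append ignores an event whose ID is already present. `InMemoryStream` and its `append` return value are assumptions made for the example; a real store would enforce the check in the storage engine.

```python
import uuid
from dataclasses import dataclass, field

# Assumption: placeholder namespace for deriving deterministic event IDs.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "events.example.invalid")


@dataclass
class InMemoryStream:
    """Toy stream that treats a repeated event ID as an already-completed write."""
    events: list = field(default_factory=list)
    _seen_ids: set = field(default_factory=set)

    def append(self, event_id: uuid.UUID, payload: dict) -> bool:
        """Append the event; return False and do nothing if the ID already exists."""
        if event_id in self._seen_ids:
            return False
        self._seen_ids.add(event_id)
        self.events.append((event_id, payload))
        return True


stream = InMemoryStream()
event_id = uuid.uuid5(NAMESPACE, "order-42|msg-1001|PaymentRequested")

first_write = stream.append(event_id, {"amount": "10.00"})
retry_write = stream.append(event_id, {"amount": "10.00"})  # same deterministic ID
```

The retry is silently ignored and the stream still holds one event. With random IDs from something like Guid.NewGuid() both appends would succeed, which is why the check depends on deterministic IDs.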