-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[nexus-db-queries] make saga_update more resilient, record_event idempotent #6113
[nexus-db-queries] make saga_update more resilient, record_event idempotent #6113
Conversation
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! I had a couple tiny suggestions, but I don't see any issue with the change overall.
// Consider the situation where a saga event gets recorded and | ||
// committed, but there's a network reset which makes the client | ||
// (us) believe that the event wasn't recorded. If we retry the | ||
// event, we want to not fail with a conflict. | ||
.on_conflict((dsl::saga_id, dsl::node_id, dsl::event_type)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this makes sense to me --- I appreciate the comment!
nexus/db-queries/src/db/sec_store.rs
Outdated
"call_count" => call_count, | ||
"total_duration" => ?total_duration, | ||
); | ||
} else if total_duration > Duration::from_secs(20) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicky, unimportant: maybe make the duration a constant?
// About conflicts | ||
// --------------- | ||
// | ||
// This function is meant to be called in a loop, so that in the event | ||
// of network flakiness, the operation is retried until successful. | ||
// Currently, if the value of `saga_state` in the database is the same | ||
// as the value we're trying to set it to, the update will be a no-op. | ||
// That is okay, because at any time only one SEC will update the saga. | ||
// (For now, we're implementing saga adoption only in cases where the | ||
// original SEC/Nexus has been expunged.) | ||
// | ||
// However, in the future, it may be possible for multiple SECs to try | ||
// and update the same saga, and overwrite each other's state. For | ||
// example, one SEC might try and update the state to Running while the | ||
// other one updates it to Done. That case would have to be carefully | ||
// considered and tested here, probably using the (currently unused) | ||
// `current_adopt_generation` field to enable optimistic locking. | ||
// | ||
// To reiterate, we are *not* considering the case where several SECs | ||
// try to update the same saga. That will be a future enhancement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take it or leave it: this kind of feels like it ought to be API documentation on the method, rather than an internal implementation comment? IIUC it is describing what the caller is expected to do with the function, which feels like API docs to me...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable.
Created using spr 1.3.6-beta.1
Thanks! I'll let one of Sean and Dave also review this just to be sure we've gotten everything we can think of here. |
FWIW, it might be worth seeking out a review from someone more qualified with this stuff than I am; I don't actually know this code at all --- I'm not totally sure why #6090 was assigned to me in the first place, it might've been a mistake? |
Ah, yup, looks like you replied while I was writing my prior comment. Waiting for a Sean or Dave review seems like a good idea! |
Created using spr 1.3.6-beta.1
See discussion in #2416 and #6090 (comment).
A summary of the changes here:
saga_create_event
idempotent. Previously, creating another event that duplicated the one which already existed would fail -- now it succeeds. These events are meant to be an append-only idempotent log, so this is okay. Also added a test for this.saga_update_state
was already idempotent -- added a test which made sure of this. Also added a comment about how idempotence may not be enough in the future.