-
Notifications
You must be signed in to change notification settings - Fork 963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Monotonic journals #13936
Monotonic journals #13936
Conversation
Only concern here is making sure we emit a matching ProjectEvent anywhere we are removing a JournalEntry. It looks like that's the case everywhere but the admin view (which is a past oversight). |
I can emit the event in the admin view to fix that oversight! |
request.db.add( | ||
JournalEntry( | ||
name=f"user:{user.username}", | ||
action="nuke user", | ||
submitted_by=request.user, | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it probably doesn't make sense to add an event to replace this: since the user is getting nuked, there's nothing to attach the event to.
Did this in 308ba3a! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the cleanup, especially for items that aren't super journal-oriented like adding owners and roles.
(I completely acknowledge that I've used journals as the audit trail for packages that have been removed and re-uploaded, since we nuke Events, but that's a different problem, and not part of journal's responsibility.)
I had one implementation question inline.
Another question I think we should answer is how might we do lock monitoring, since I don't think we do that today. If memory serves, a Datadog Postgres monitor could emit those metrics, but I don't think we have that Agent configuration set today.
636b1a1
to
7a65d3d
Compare
I've updated this pull request to use a @di had previously added the events that @ewdurbin had asked for to ensure we're not losing auditability. I think this should be good for review now. |
This reverts commit 37fd693.
I've also reverted most of the places where we stopped emitting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay! I had one non-blocking comment.
I'm still curious how this will operate at scale, and how we are likely to determine if we're getting lock contention. Is that something you've considered already?
If it becomes an issue the latency of anything that adds journal entries would increase. It should be pretty obvious because the query that takes the lock will end up taking a long time, so should show up as a slow query. Because we're using It does by it's nature serialize those transactions, which is the goal because the fact they're not serialized is what is causing the underlying bug (which I've seen happen about 8-10 times in the last 2 weeks). |
Our mirroring support currently assumes that the
journals.id
field is a monotonically increasing integer.However, the way that
SERIAL
is implemented in PostgreSQL, this isn't actually true. The fetching of the next value from aSERIAL
ignores the transaction. In addition, we have no control over what order the transactions will commit in, this means something like:TXN A does an
INSERT
, getsjournals.id = 100
.TXN B does an
INSERT
, getsjournals.id = 101
.TXN B
COMMIT
, makes last serial = 101.Bandersnatch makes a request, and notes that the last serial is 101.
TXN A
COMMIT
, the last serial is still 101.Bandersnatch never sees 100.
What this does is wherever we create a
JournalEntry
, we first take an advisory lock (scoped to the transaction). This will have the effect of forcing journal entry insertion to be serialized, without affecting our table locking otherwise.The downside to this is it can cause scalability concerns, but AFAIK it's the only real way we have to make sure that our last serial is monotonically increasing.
To try and reduce the negative effects, we also go through and stop emitting any journal event that our mirroring support doesn't mean. Specifically this is any journal event that isn't correlated with a change in the output of the
/simple/$name/
page.Each "type" of journal event that we stop emitting is in it's own commit, so if we disagree with any of them, it should be easy to revert.
This will fix (hopefully) #4892, and #12933.
This will move us closer to #11918.