Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monotonic journals #13936

Merged
merged 19 commits into from
May 30, 2024
Merged

Monotonic journals #13936

merged 19 commits into from
May 30, 2024

Conversation

dstufft
Copy link
Member

@dstufft dstufft commented Jun 14, 2023

Our mirroring support currently assumes that the journals.id field is a monotonically increasing integer.

However, the way that SERIAL is implemented in PostgreSQL, this isn't actually true. The fetching of the next value from a SERIAL ignores the transaction. In addition, we have no control over what order the transactions will commit in, this means something like:

TXN A does an INSERT, gets journals.id = 100.
TXN B does an INSERT, gets journals.id = 101.
TXN B COMMIT, makes last serial = 101.
Bandersnatch makes a request, and notes that the last serial is 101.
TXN A COMMIT, the last serial is still 101.
Bandersnatch never sees 100.

What this does is wherever we create a JournalEntry, we first take an advisory lock (scoped to the transaction). This will have the effect of forcing journal entry insertion to be serialized, without affecting our table locking otherwise.

The downside to this is it can cause scalability concerns, but AFAIK it's the only real way we have to make sure that our last serial is monotonically increasing.

To try and reduce the negative effects, we also go through and stop emitting any journal event that our mirroring support doesn't mean. Specifically this is any journal event that isn't correlated with a change in the output of the /simple/$name/ page.

Each "type" of journal event that we stop emitting is in it's own commit, so if we disagree with any of them, it should be easy to revert.

This will fix (hopefully) #4892, and #12933.

This will move us closer to #11918.

@ewdurbin
Copy link
Member

Only concern here is making sure we emit a matching ProjectEvent anywhere we are removing a JournalEntry.

It looks like that's the case everywhere but the admin view (which is a past oversight).

@dstufft
Copy link
Member Author

dstufft commented Jun 14, 2023

I can emit the event in the admin view to fix that oversight!

Comment on lines -236 to -250
request.db.add(
JournalEntry(
name=f"user:{user.username}",
action="nuke user",
submitted_by=request.user,
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it probably doesn't make sense to add an event to replace this: since the user is getting nuked, there's nothing to attach the event to.

@di
Copy link
Member

di commented Apr 17, 2024

Only concern here is making sure we emit a matching ProjectEvent anywhere we are removing a JournalEntry.

It looks like that's the case everywhere but the admin view (which is a past oversight).

Did this in 308ba3a!

Copy link
Member

@miketheman miketheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the cleanup, especially for items that aren't super journal-oriented like adding owners and roles.
(I completely acknowledge that I've used journals as the audit trail for packages that have been removed and re-uploaded, since we nuke Events, but that's a different problem, and not part of journal's responsibility.)

I had one implementation question inline.

Another question I think we should answer is how might we do lock monitoring, since I don't think we do that today. If memory serves, a Datadog Postgres monitor could emit those metrics, but I don't think we have that Agent configuration set today.

warehouse/packaging/models.py Outdated Show resolved Hide resolved
@dstufft dstufft requested review from di, miketheman and a team May 22, 2024 13:03
@dstufft
Copy link
Member Author

dstufft commented May 22, 2024

I've updated this pull request to use a before_insert hook and verified that the query taking the advisory lock is still being emitted prior to an INSERT of a Journal.

@di had previously added the events that @ewdurbin had asked for to ensure we're not losing auditability.

I think this should be good for review now.

@dstufft
Copy link
Member Author

dstufft commented May 22, 2024

I've also reverted most of the places where we stopped emitting JournalEntry items, since the journal is our only place where we durably store these events currently.

Copy link
Member

@miketheman miketheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay! I had one non-blocking comment.

I'm still curious how this will operate at scale, and how we are likely to determine if we're getting lock contention. Is that something you've considered already?

warehouse/packaging/models.py Show resolved Hide resolved
@dstufft
Copy link
Member Author

dstufft commented May 24, 2024

I'm still curious how this will operate at scale, and how we are likely to determine if we're getting lock contention. Is that something you've considered already?

If it becomes an issue the latency of anything that adds journal entries would increase. It should be pretty obvious because the query that takes the lock will end up taking a long time, so should show up as a slow query.

Because we're using before_insert, we won't take a lock until we flush the session that contains a new JournalEntry, which will generally only happen right at the end of a request, so right before we commit it.

It does by it's nature serialize those transactions, which is the goal because the fact they're not serialized is what is causing the underlying bug (which I've seen happen about 8-10 times in the last 2 weeks).

@dstufft dstufft merged commit eabc81d into pypi:main May 30, 2024
17 checks passed
@dstufft dstufft deleted the monotonic-journals branch May 30, 2024 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants