Relax locking requirements for collecting table stats #455
I tested this manually on all affected queries with an outstanding locktype = 'tuple' AccessExclusiveLock. It'd be nice to add this to our integration tests, but locking doesn't currently have coverage and I don't see a great way to add that. Thoughts?
input/postgres/relation_stats.go
@@ -14,7 +14,7 @@ const relationStatsSQLInsertsSinceVacuumFieldDefault string = "0"

 const relationStatsSQL = `
 WITH locked_relids AS (
-SELECT DISTINCT relation relid FROM pg_catalog.pg_locks WHERE mode = 'AccessExclusiveLock' AND relation IS NOT NULL
+SELECT DISTINCT relation relid FROM pg_catalog.pg_locks WHERE mode = 'AccessExclusiveLock' AND relation IS NOT NULL AND locktype <> 'tuple'
I'm not sure if locktype <> 'tuple' or locktype = 'relation' makes more sense. There are a lot of different locktypes so I decided to be more conservative, but I could be convinced.
Uhmm, looking at the code, only the relation, extend, page, and tuple locktypes can have a relation column in pg_locks (look at values[2] or nulls[2]):
https://github.com/postgres/postgres/blob/e0b2eed047df9045664da6f724cb42c10f8b12f0/src/backend/utils/adt/lockfuncs.c#L246-L354
Among these, page feels somewhat similar to tuple, in that it won't block pg_table_size(). I'm not so sure about extend, but looking at the purpose of this lock (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/apg-waits.lockextend.html), maybe it also won't cause problems with pg_table_size()?
If you agree, maybe using locktype = 'relation' makes more sense.
Nice, thanks for diving into the Postgres source to investigate this. I think given this, locktype = 'relation' looks reasonable. I think the only concern there is if a future version of Postgres adds a lock type similar to, but distinct from, locktype = 'relation', and we start trying to collect data when we shouldn't. Alternately, we could change this to locktype NOT IN ('extend', 'page', 'tuple') to be a little more future-proof, but that gets a little wordy. Any thoughts on this @lfittl?
Hmm, good question. I feel like this part of Postgres changes rarely (if at all), so I'm not too worried about future proofing, but it feels slightly safer to say "these are the locktypes that are not a problem, even when it's an AccessExclusiveLock", i.e. I think locktype NOT IN ('extend', 'page', 'tuple') might indeed be better, even if it's wordy.
That said, is there a way for us to verify that it's safe to ignore relation extends? (extend = table is grown to add new pages -- I assume that the locks taken for that don't conflict with relation size / catalog entries, but it would be good to verify somehow)
I think it'd be hard to control the timing of an extend lock. But maybe we could note what locks the collector needs when running these queries, take those manually, and then try to extend a relation?
I've tried to read a bit more of the Postgres source, and my understanding is that because relation extension uses a separate lock tag (LOCKTAG_RELATION_EXTEND, see https://github.com/postgres/postgres/blob/master/src/include/storage/lock.h#L195) it doesn't conflict with the AccessShareLock taken with the LOCKTAG_RELATION lock tag as part of us calling pg_relation_size.
So for now I think we can assume that all three of these are safe to ignore, and it's just locktype = 'relation' that's a problem.
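The reasoning here (two locks can only conflict when they share a lock tag) can be sketched as a toy model in Go. This is a simplification under stated assumptions: the lockTag struct keeps only a kind and a relation OID, only three lock modes are modelled, the relation extension lock is assumed to be held in ExclusiveLock mode, and the OID is made up. It is not how Postgres implements its lock manager.

```go
package main

import "fmt"

// lockTag is a simplified stand-in for Postgres's LOCKTAG: the kind of object
// being locked plus the relation OID. Real lock tags carry more fields.
type lockTag struct {
	kind  string // e.g. "LOCKTAG_RELATION", "LOCKTAG_RELATION_EXTEND"
	relid uint32
}

type heldLock struct {
	tag  lockTag
	mode string
}

// modesConflict models only the three modes used in this example:
// AccessExclusiveLock conflicts with everything, ExclusiveLock conflicts with
// itself, and AccessShareLock conflicts only with AccessExclusiveLock.
func modesConflict(a, b string) bool {
	if a == "AccessExclusiveLock" || b == "AccessExclusiveLock" {
		return true
	}
	return a == "ExclusiveLock" && b == "ExclusiveLock"
}

// conflicts reports whether two locks block each other in this model: they
// must target the same tag AND be held in conflicting modes.
func conflicts(a, b heldLock) bool {
	return a.tag == b.tag && modesConflict(a.mode, b.mode)
}

func main() {
	const relid = 16384 // hypothetical table OID

	// The lock described above for pg_relation_size.
	sizeQuery := heldLock{lockTag{"LOCKTAG_RELATION", relid}, "AccessShareLock"}

	others := []heldLock{
		{lockTag{"LOCKTAG_RELATION_EXTEND", relid}, "ExclusiveLock"}, // assumed mode
		{lockTag{"LOCKTAG_TUPLE", relid}, "AccessExclusiveLock"},
		{lockTag{"LOCKTAG_RELATION", relid}, "AccessExclusiveLock"},
	}
	for _, o := range others {
		fmt.Printf("%-24s %-20s blocks size query: %t\n", o.tag.kind, o.mode, conflicts(sizeQuery, o))
	}
}
```

In this model only the relation-level AccessExclusiveLock blocks the size query, which matches the conclusion that locktype = 'relation' is the case that matters.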
Left a comment re: locktype, though this is already a good start, so I'm fine to merge as is too.
Overall looks good, but I think we could adjust to locktype = 'relation' as discussed to ignore page and extend locks as well.
Currently, if tables are locked in AccessExclusiveMode, we skip collecting most table structure information and table and column stats. We do this since the lock would conflict with the lock the collector needs and potentially cause cascading locking problems and impact the application. However, we're not checking the locktype, so the collector also avoids collecting stats for tables with AccessExclusiveMode locktype = 'tuple' locks, which do not actually block collection. These can be fairly common, e.g., if a transaction issues a `SELECT FOR UPDATE` query while another transaction has already obtained a `SELECT FOR UPDATE` lock. Relax the locking requirements by ignoring AccessExclusiveMode locktype = 'tuple' locks when collecting schema structure and stats.
The latter should be sufficient to avoid locking problems and allows us to ignore other kinds of locks.
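As a rough illustration of the behaviour change described above, this Go sketch applies the old and the relaxed filter to a few hand-written rows shaped like pg_locks output. The lockRow type, the helper names, and the sample OIDs are hypothetical; the collector itself does this filtering in SQL, in the locked_relids CTE.

```go
package main

import (
	"fmt"
	"sort"
)

// lockRow is a hypothetical, trimmed-down view of one pg_locks row.
type lockRow struct {
	locktype string
	mode     string
	relation *uint32 // nil models a NULL relation column
}

// lockedRelids mimics the locked_relids CTE: distinct relations that have an
// AccessExclusiveLock row with a non-null relation and pass the extra filter.
func lockedRelids(rows []lockRow, keep func(lockRow) bool) []uint32 {
	seen := map[uint32]bool{}
	for _, r := range rows {
		if r.mode == "AccessExclusiveLock" && r.relation != nil && keep(r) {
			seen[*r.relation] = true
		}
	}
	ids := make([]uint32, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func oid(v uint32) *uint32 { return &v }

// sampleRows uses made-up OIDs: 16384 has a relation-level AccessExclusiveLock,
// while 16390 only has a tuple-level one from a session waiting on a row that
// another SELECT FOR UPDATE already locked.
func sampleRows() []lockRow {
	return []lockRow{
		{"relation", "AccessExclusiveLock", oid(16384)},
		{"relation", "RowShareLock", oid(16390)},
		{"tuple", "AccessExclusiveLock", oid(16390)},
		{"transactionid", "ExclusiveLock", nil},
	}
}

func main() {
	rows := sampleRows()
	before := lockedRelids(rows, func(lockRow) bool { return true })
	after := lockedRelids(rows, func(r lockRow) bool { return r.locktype != "tuple" })
	fmt.Println("skipped without locktype filter:", before)
	fmt.Println("skipped with locktype <> 'tuple':", after)
}
```

With these sample rows, both tables are skipped without the locktype filter, and only the table holding the relation-level lock is skipped once tuple locks are ignored.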
ffdcddf to a2581b5
Thanks. This patch fixed our issue.
@benoittgt great, glad to hear it!