-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/catalog/descs: use allDescriptors as a negative cache for by-ID lookups #61820
sql/catalog/descs: use allDescriptors as a negative cache for by-ID lookups #61820
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this change, the django introspection query was doing O(indexes) lookups
and now it is doing 2.
hmm did something change? when i first added the django introspection benchmark, the number of lookups was constant in the number of tables
change looks good though! just had small comments and confusion on what made the number of lookups increase for pg_table_is_visible
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)
pkg/sql/catalog/descs/collection.go, line 1029 at r1 (raw file):
} if tc.allDescriptors != nil {
should this be tc.allDescriptorsByID != nil
?
pkg/sql/catalog/descs/collection.go, line 1030 at r1 (raw file):
if tc.allDescriptors != nil { if _, exists := tc.allDescriptorsByID[id]; !exists {
since the descriptor value is never used, could this be a map[id]struct{}
to save memory?
also, which makes me think, do we need memory accounting on these caches? (would be an issue to resolve in the future, not in this PR)
oh i guess the roundtrips increased because of #61493 ? i am less confused now! |
…ookups When doing introspection queries we tend to pull and cache the complete set of descriptors. This powers subsequent calls to various virtual tables. However, we also often need to grab descriptors by ID. This happens in some of the virtual indexes and it especially happens in the call to `pg_table_is_visible`. When that function was called on all of the oids in pg_class it was forced to make a bunch of kv lookups; namely one for each and every index and virtual table (depending on the query plan). We can short-circuit all of that with a map of descriptors we are already holding in memory. Release note (performance improvement): Reduce the number of round-trips required to call `pg_table_is_visible` in the context of pg_catalog queries.
74bb820
to
7574bfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I crafted a little abstraction around all of this. This should be backportable. I also sprinkled some TODO that I very much am planning on doing in the 21.2 cycle.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @rafiss)
pkg/sql/catalog/descs/collection.go, line 188 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
Gratuitous bikeshed: I believe @rafiss 's comments could be addressed by getting rid of
allDescriptors
altogether and usingallDescriptorsByID
instead. This means rewritingGetAllDescriptors
so that it returns an ordered slice, which should be fine. If it isn't, replaceallDescriptorsByID
with acatalog.DescriptorIDSet
perhaps? Duplicate sources-of-truth make me uncomfortable, I guess.
in due time I'll rework all of this but for now I'm trying to do something minimal for this release.
pkg/sql/catalog/descs/collection.go, line 1030 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
since the descriptor value is never used, could this be a
map[id]struct{}
to save memory?also, which makes me think, do we need memory accounting on these caches? (would be an issue to resolve in the future, not in this PR)
Given that the descriptors and very big, I don't care that much about one byte vs one word vs two words. I changed it to an int which is a word. 7 bytes per descriptor when descriptors are hundreds doesn't feel very compelling to me right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! is a backport to 21.1 needed too?
TFTR! bors r+ |
Build succeeded: |
When doing introspection queries we tend to pull and cache the complete
set of descriptors. This powers subsequent calls to various virtual
tables. However, we also often need to grab descriptors by ID. This
happens in some of the virtual indexes and it especially happens in
the call to
pg_table_is_visible
. When that function was called onall of the oids in pg_class it was forced to make a bunch of kv lookups;
namely one for each and every index and virtual table (depending on
the query plan). We can short-circuit all of that with a map of descriptors
we are already holding in memory.
Before this change, the django introspection query was doing O(indexes) lookups
and now it is doing 2.
Release note (performance improvement): Reduce the number of round-trips
required to call
pg_table_is_visible
in the context of pg_catalogqueries.