sql: avoid copying ColumnDescriptors in initColsForScan #50727
Merged
This change switches `scanNode` from constructing and passing around a `[]ColumnDescriptor` to constructing and passing around a `[]*ColumnDescriptor`, which references the existing `ColumnDescriptor`s in the `TableDescriptor`. This is in response to seeing the allocation in `initColsForScan` pop up as the single largest source of total heap allocations by size (`alloc_space`, the heap profile sample that most closely measures GC pressure) while running TPC-E. The allocation in `initColsForScan` was responsible for 4.1% of the `alloc_space` profile after a 30-minute run of the workload.
In general, this indicates that we should move away from copying these ColumnDescriptors around by value. They are currently 120 bytes each, which isn't huge, but also isn't small. Furthermore, unlike TableDescriptors, we almost never pass around only a single ColumnDescriptor. Instead, we're usually operating on every column touched by a query, so these 120 bytes can blow up fast. For instance, if we estimate that the average TPC-E query touches somewhere between 8 and 10 columns, then a single copy of all of these descriptors during the execution of a query (like we were doing in initColsForScan) requires allocating and copying roughly 1KB of memory (960 to 1,200 bytes).
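As a rough illustration of the by-value versus by-pointer difference described above, here is a minimal sketch. The `ColumnDescriptor` and `TableDescriptor` types, their fields, and the helpers `colsByValue` and `colsByPointer` are hypothetical stand-ins invented for this example; they are not the real types or the actual `initColsForScan` code.

```go
package main

import "fmt"

// ColumnDescriptor is a hypothetical stand-in for the real descriptor type.
// The fields are invented for illustration only.
type ColumnDescriptor struct {
	ID       uint32
	Name     string
	Nullable bool
}

// TableDescriptor is a hypothetical stand-in that owns its column descriptors.
type TableDescriptor struct {
	Columns []ColumnDescriptor
}

// colsByValue copies each requested descriptor into a new slice, so the
// backing array holds len(ids) full ColumnDescriptor structs.
func colsByValue(desc *TableDescriptor, ids []uint32) []ColumnDescriptor {
	cols := make([]ColumnDescriptor, 0, len(ids))
	for _, id := range ids {
		for i := range desc.Columns {
			if desc.Columns[i].ID == id {
				cols = append(cols, desc.Columns[i]) // struct copy
				break
			}
		}
	}
	return cols
}

// colsByPointer references the descriptors already stored in the table
// descriptor, so the backing array holds only len(ids) pointers.
func colsByPointer(desc *TableDescriptor, ids []uint32) []*ColumnDescriptor {
	cols := make([]*ColumnDescriptor, 0, len(ids))
	for _, id := range ids {
		for i := range desc.Columns {
			if desc.Columns[i].ID == id {
				cols = append(cols, &desc.Columns[i]) // no struct copy
				break
			}
		}
	}
	return cols
}

func main() {
	desc := &TableDescriptor{Columns: []ColumnDescriptor{
		{ID: 1, Name: "a"},
		{ID: 2, Name: "b", Nullable: true},
		{ID: 3, Name: "c"},
	}}
	ids := []uint32{1, 3}

	vals := colsByValue(desc, ids)
	ptrs := colsByPointer(desc, ids)

	fmt.Println("copied descriptors:", len(vals))
	fmt.Println("referenced descriptors:", len(ptrs))
	fmt.Println("pointer aliases table descriptor:", ptrs[0] == &desc.Columns[0])
}
```

One consequence of the pointer form in this sketch is that the returned slice aliases the table descriptor's storage: a write through one of the pointers is visible to every other holder of the descriptor, so callers would need to treat the referenced descriptors as read-only.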
Yahor, I'm assigning you for two reasons. One, because you seem to be working most closely with this code and likely have a good idea of how disruptive this kind of change will be. I don't want to split the world into functions that work with []ColumnDescriptor and functions that work with []*ColumnDescriptor. Two, I figured you'd be interested to know that I was running this using an older SHA and the second and third largest sources of allocations were in `createTableReaders` (3.54%) and `ColumnTypesWithMutations` (2.60%). Both had to do with constructing slices of `types.T` and both appear to have been fixed by c06277e. So nice job with that change!