-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-17259: [C++] Do not use shared_ptr<DataType> with kernel function signatures, do less copying of shared_ptrs #13753
Conversation
|
// Integer, floating point, base binary, and temporal | ||
ARROW_EXPORT | ||
const std::vector<std::shared_ptr<DataType>>& PrimitiveTypes(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There wasn't really a good reason for these functions to be here -- they probably even should be in an internal namespace in arrow::compute
b0d5cdc
to
a3a1820
Compare
@wesm Would you like to take a look at the compile failures? |
@pitrou yes, I'll fix them today |
a3a1820
to
766f292
Compare
@pitrou there's a doctest failure here unrelated to this PR, seems to be caused by the 9.0.0 to 10.0.0 version bump (which made the metadata 1 byte bigger):
|
Hmm... I assume it's because of the |
I think the reason is the change from 9.0.0 to 10.0.0 which added one byte, here is the doctest
|
I patched this in #13790 |
766f292
to
ec53645
Compare
I think this can be merged once the build is green if the changes look okay. |
|
But those Input/OutputType instances are stored in the KernelSignature, so those references need to stay valid. |
namespace {} // namespace | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: empty namespace?
std::shared_ptr<HashAggregateFunction> func, | ||
FunctionRegistry* registry) { | ||
static std::shared_ptr<DataType> decimal128_example = decimal128(1, 1); | ||
static std::shared_ptr<DataType> decimal256_example = decimal256(1, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extreme nit: there's some inconsistency here and above, maybe PopulateHashGenericKernels
should just call PopulateNumericHashKernels
?
My question was more along the lines of: what is the point of micro-optimizing a number of shared_ptr copies that only happen at startup? |
@pitrou this isn’t a micro-optimization imho — when 1% of your binary size is concerned with inline shared_ptr code in this narrow part of the library that speaks to a design problem. The kernel registry population is still very bloated and should be able to be streamlined further to yield smaller binary size. The use of shared pointers also causes extra overhead in kernel dispatching and input type checking, but we would need to write some benchmarks to quantify that better it might be a useful exercise at some point to analyze the disassembly of libarrow.so to understand the total contribution of inline shared_ptr code to the binary size. There are a lot of places where we inline constructors for example where there is probably no need |
Is this looking at the text segment, or the entire binary size? If the latter, that might mean debug symbols or something... (those can be stripped for distribution purposes) |
I would like to see benchmarks indeed, because as long as you don't copy them the overhead should be minimal. |
I took the PR and rebased it internally to then compare the sizes:
So it's a 0.7% saving here in both text segment size and stripped library size. |
Any objections to merging this? |
If there's no sizable performance improvement then I'm against merging this. Robustness should be a guiding principle and replacing shared pointers with raw pointers goes against it. |
I do not think the implementation is less robust — the only change is that ephemeral It also comes down to a cost philosophy in this code:
It turns out that in this code, that retaining a I will take the time when I can to write some benchmarks (for me, the present-and-future savings in generated code is evidence enough) that will hopefully provide some additional evidence to support this. In the meantime I hope that the rebase conflicts will not be too annoying to keep up with. |
I've done a little exploration locally and concluded that The last thing I'm interested in is the "bootup time" of the function registry, so I'm going to compare the before and after performance of that and report it here. If there is no significant difference, I will refactor this PR to restore shared pointers to There's probably other places where we're either passing a shared pointer where there is no need or where we inline non-performance-sensitive-constructors which copy shared pointers where there is similarly little need, so let's be an eye out for htat. |
I had some personal stuff come up in August that got in the way of completing this work -- I will pick this up hopefully sometime this month (September) and hopefully reach a point that is satisfactory to all. |
Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍 |
This is something that I've hacked on a little bit while on airplanes and in idle moments for my own curiosity and just cleaned up to turn into a PR. This PR changes
arrow::compute::InputType
to only storeconst DataType*
instead ofshared_ptr<DataType>
, which means that we don't have to copy the smart pointers for every kernel and function that's registered. Amazingly, this trims the size of libarrow.so by about 1% (~300KB) because of not having a bunch of inlined shared_ptr code.This change does have consequences -- developers need to be careful about creating function signatures with data types that may have temporary storage duration (all of the APIs like
arrow::int64()
return static instances, and I changedarrow::duration
and some other type factories to return static instances, too). But I think the benefits outweigh the costs.