ARROW-17259: [C++] Do not use shared_ptr<DataType> with kernel function signatures, do less copying of shared_ptrs #13753

wesm · 2022-07-29T22:53:24Z

This is something that I've hacked on a little bit while on airplanes and in idle moments for my own curiosity and just cleaned up to turn into a PR. This PR changes arrow::compute::InputType to only store const DataType* instead of shared_ptr<DataType>, which means that we don't have to copy the smart pointers for every kernel and function that's registered. Amazingly, this trims the size of libarrow.so by about 1% (~300KB) because of not having a bunch of inlined shared_ptr code.

This change does have consequences -- developers need to be careful about creating function signatures with data types that may have temporary storage duration (all of the APIs like arrow::int64() return static instances, and I changed arrow::duration and some other type factories to return static instances, too). But I think the benefits outweigh the costs.

github-actions · 2022-07-29T22:53:44Z

https://issues.apache.org/jira/browse/ARROW-17259

github-actions · 2022-07-29T22:53:45Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

wesm · 2022-07-29T22:57:39Z

cpp/src/arrow/type.h

-// Integer, floating point, base binary, and temporal
-ARROW_EXPORT
-const std::vector<std::shared_ptr<DataType>>& PrimitiveTypes();
-


There wasn't really a good reason for these functions to be here -- they probably even should be in an internal namespace in arrow::compute

pitrou · 2022-08-03T14:42:37Z

@wesm Would you like to take a look at the compile failures?

wesm · 2022-08-03T14:47:26Z

@pitrou yes, I'll fix them today

wesm · 2022-08-03T17:05:49Z

@pitrou there's a doctest failure here unrelated to this PR, seems to be caused by the 9.0.0 to 10.0.0 version bump (which made the metadata 1 byte bigger):

2022-08-03T17:03:42.8027721Z =================================== FAILURES ===================================
2022-08-03T17:03:42.8028142Z ___________________ [doctest] pyarrow.parquet.read_metadata ____________________
2022-08-03T17:03:42.8028491Z 3406 
2022-08-03T17:03:42.8028721Z 3407     Examples
2022-08-03T17:03:42.8029041Z 3408     --------
2022-08-03T17:03:42.8029336Z 3409     >>> import pyarrow as pa
2022-08-03T17:03:42.8029679Z 3410     >>> import pyarrow.parquet as pq
2022-08-03T17:03:42.8030399Z 3411     >>> table = pa.table({'n_legs': [4, 5, 100],
2022-08-03T17:03:42.8030920Z 3412     ...                   'animal': ["Dog", "Brittle stars", "Centipede"]})
2022-08-03T17:03:42.8031437Z 3413     >>> pq.write_table(table, 'example.parquet')
2022-08-03T17:03:42.8031751Z 3414 
2022-08-03T17:03:42.8032144Z 3415     >>> pq.read_metadata('example.parquet')
2022-08-03T17:03:42.8032625Z Differences (unified diff with -expected +actual):
2022-08-03T17:03:42.8033004Z     @@ -1,7 +1,7 @@
2022-08-03T17:03:42.8033412Z     -<pyarrow._parquet.FileMetaData object at ...>
2022-08-03T17:03:42.8033907Z     -  created_by: parquet-cpp-arrow version ...
2022-08-03T17:03:42.8034339Z     +<pyarrow._parquet.FileMetaData object at 0x7fd2089b3830>
2022-08-03T17:03:42.8034877Z     +  created_by: parquet-cpp-arrow version 10.0.0-SNAPSHOT
2022-08-03T17:03:42.8035225Z        num_columns: 2
2022-08-03T17:03:42.8035477Z        num_rows: 3
2022-08-03T17:03:42.8035752Z        num_row_groups: 1
2022-08-03T17:03:42.8036047Z        format_version: 2.6
2022-08-03T17:03:42.8036407Z     -  serialized_size: 561
2022-08-03T17:03:42.8036708Z     +  serialized_size: 562
2022-08-03T17:03:42.8037076Z 
2022-08-03T17:03:42.8037502Z /opt/conda/envs/arrow/lib/python3.9/site-packages/pyarrow/parquet/__init__.py:3415: DocTestFailure

pitrou · 2022-08-03T17:09:35Z

Hmm... I assume it's because of the serialized_size difference? It's weird that it would differ by one byte, but perhaps that's due to a different Thrift version. We should probably simply relax this test.

wesm · 2022-08-03T17:11:14Z

I think the reason is the change from 9.0.0 to 10.0.0 which added one byte, here is the doctest

    >>> pq.read_metadata('example.parquet')
    <pyarrow._parquet.FileMetaData object at ...>
      created_by: parquet-cpp-arrow version ...
      num_columns: 2
      num_rows: 3
      num_row_groups: 1
      format_version: 2.6
      serialized_size: 561

wesm · 2022-08-03T19:25:30Z

I patched this in #13790

wesm · 2022-08-03T21:13:52Z

I think this can be merged once the build is green if the changes look okay.

pitrou · 2022-08-04T12:19:58Z

InputType and OutputType instances are only created at kernel registration, if I'm not mistaken?

lidavidm · 2022-08-04T13:13:07Z

But those Input/OutputType instances are stored in the KernelSignature, so those references need to stay valid.

lidavidm · 2022-08-04T13:25:39Z

cpp/src/arrow/compute/kernels/aggregate_var_std.cc

+namespace {}  // namespace
+


nit: empty namespace?

lidavidm · 2022-08-04T13:29:42Z

cpp/src/arrow/compute/kernels/hash_aggregate.cc

+                                std::shared_ptr<HashAggregateFunction> func,
+                                FunctionRegistry* registry) {
+  static std::shared_ptr<DataType> decimal128_example = decimal128(1, 1);
+  static std::shared_ptr<DataType> decimal256_example = decimal256(1, 1);


extreme nit: there's some inconsistency here and above, maybe PopulateHashGenericKernels should just call PopulateNumericHashKernels?

pitrou · 2022-08-04T13:35:40Z

But those Input/OutputType instances are stored in the KernelSignature, so those references need to stay valid.

My question was more along the lines of: what is the point of micro-optimizing a number of shared_ptr copies that only happen at startup?

wesm · 2022-08-04T14:06:51Z

@pitrou this isn’t a micro-optimization imho — when 1% of your binary size is concerned with inline shared_ptr code in this narrow part of the library that speaks to a design problem. The kernel registry population is still very bloated and should be able to be streamlined further to yield smaller binary size. The use of shared pointers also causes extra overhead in kernel dispatching and input type checking, but we would need to write some benchmarks to quantify that better

it might be a useful exercise at some point to analyze the disassembly of libarrow.so to understand the total contribution of inline shared_ptr code to the binary size. There are a lot of places where we inline constructors for example where there is probably no need

pitrou · 2022-08-04T14:40:37Z

@pitrou this isn’t a micro-optimization imho — when 1% of your binary size is concerned with inline shared_ptr code in this narrow part of the library that speaks to a design problem.

Is this looking at the text segment, or the entire binary size? If the latter, that might mean debug symbols or something... (those can be stripped for distribution purposes)

pitrou · 2022-08-04T14:41:08Z

The use of shared pointers also causes extra overhead in kernel dispatching and input type checking, but we would need to write some benchmarks to quantify that better

I would like to see benchmarks indeed, because as long as you don't copy them the overhead should be minimal.

pitrou · 2022-08-04T14:53:59Z

I took the PR and rebased it internally to then compare the sizes:

master:

$ size build-release/release/libarrow.so
   text	   data	    bss	    dec	    hex	filename
23471557	 317416	2568225	26357198	1922dce	build-release/release/libarrow.so
$ ls -la build-release/release/libarrow.so.1000.0.0 
-rwxrwxr-x 1 antoine antoine 37289616 août   4 16:48 build-release/release/libarrow.so.1000.0.0
$ strip build-release/release/libarrow.so.1000.0.0 
$ ls -la build-release/release/libarrow.so.1000.0.0 
-rwxrwxr-x 1 antoine antoine 23794552 août   4 16:48 build-release/release/libarrow.so.1000.0.0

PR:

$ size build-release/release/libarrow.so
   text	   data	    bss	    dec	    hex	filename
23314548	 314736	2568801	26198085	18fc045	build-release/release/libarrow.so
$ ls -la build-release/release/libarrow.so.1000.0.0 
-rwxrwxr-x 1 antoine antoine 37097032 août   4 16:49 build-release/release/libarrow.so.1000.0.0
$ strip build-release/release/libarrow.so.1000.0.0 
$ ls -la build-release/release/libarrow.so.1000.0.0 
-rwxrwxr-x 1 antoine antoine 23634808 août   4 16:50 build-release/release/libarrow.so.1000.0.0

So it's a 0.7% saving here in both text segment size and stripped library size.

wesm · 2022-08-05T20:39:46Z

Any objections to merging this?

pitrou · 2022-08-05T21:30:50Z

If there's no sizable performance improvement then I'm against merging this. Robustness should be a guiding principle and replacing shared pointers with raw pointers goes against it.

wesm · 2022-08-05T21:43:21Z

I do not think the implementation is less robust — the only change is that ephemeral DataType instances cannot be used to instantiate InputType without creating a TypeMatcher that wraps and retains a shared_ptr for the cases where it is truly needed. If the developer does create a KernelSignature / InputType with an ephemeral shared_ptr<DataType>, the signature would be unusable so any unit test which attempts to use the kernel would segfault. The probability of an Arrow user ever facing a bug related to this essentially zero — it would amount to us merging untested kernels code (because tests would immediately reveal the programming error).

It also comes down to a cost philosophy in this code:

Pay the cost of using shared_ptr always — both in generated code (because most shared_ptr operations are inlined by the compiler) and in performance
Pay the cost only when it's needed

It turns out that in this code, that retaining a shared_ptr<DataType> for kernel function signatures and type validation is rarely needed. Why pay the cost for > 95% of cases (or whatever the number is?) when the shared_ptr is only actually needed in < 5% of cases? I recall that we've already seen shared_ptr contention show up unfavorably in performance profiles (e.g. ARROW-16161).

I will take the time when I can to write some benchmarks (for me, the present-and-future savings in generated code is evidence enough) that will hopefully provide some additional evidence to support this. In the meantime I hope that the rebase conflicts will not be too annoying to keep up with.

wesm · 2022-08-08T16:46:15Z

I've done a little exploration locally and concluded that shared_ptr<DataType> versus const DataType* has no impact on the performance of Function::DispatchExact (at least on recent clang), which confirms the general guidance that dereferencing shared pointers is no different than a raw pointer — it's the creation, copying, and destruction of shared pointers that's expensive, especially in a multithreaded setting.

The last thing I'm interested in is the "bootup time" of the function registry, so I'm going to compare the before and after performance of that and report it here. If there is no significant difference, I will refactor this PR to restore shared pointers to InputType and OutputType but rather avoid allowing shared pointer copies to get inlined in these functions, so hopefully that will yield the same kind of reduction in binary bloat. There are some other beneficial changes here so I'd like to get those in — I'll report back when I have that extra performance data and have done the refactoring.

There's probably other places where we're either passing a shared pointer where there is no need or where we inline non-performance-sensitive-constructors which copy shared pointers where there is similarly little need, so let's be an eye out for htat.

wesm · 2022-09-08T18:21:19Z

I had some personal stuff come up in August that got in the way of completing this work -- I will pick this up hopefully sometime this month (September) and hopefully reach a point that is satisfactory to all.

amol- · 2023-03-30T17:13:48Z

Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍

wesm changed the title ~~ARROW-17259: [C++] Do not shared_ptr<DataType> with kernel function signatures, do less copying of shared_ptrs~~ ARROW-17259: [C++] Do not use shared_ptr<DataType> with kernel function signatures, do less copying of shared_ptrs Jul 29, 2022

github-actions bot added Component: C++ Component: Parquet labels Jul 29, 2022

wesm commented Jul 29, 2022

View reviewed changes

wesm force-pushed the leaner-kernel-registry branch from b0d5cdc to a3a1820 Compare August 1, 2022 22:41

wesm force-pushed the leaner-kernel-registry branch from a3a1820 to 766f292 Compare August 3, 2022 14:55

wesm added 11 commits August 3, 2022 16:12

De-shared_ptr-ize a bit

d5865b6

More de-shared_ptr-izing

98085ae

Get recompiling

d9d40cc

Fix compiler warnings in release mode

8943c8c

Fix warnings in gcc

95634ac

Some more de-shared_ptr-izing

2f78e50

Things compiling again

b2e0b38

Fix scalar tests

adfac56

Fix remaining unit tests

9c1fb18

Fix compiler warning on gcc 4.x

e8992be

Disambiguate PrimitiveTypes

ec53645

wesm force-pushed the leaner-kernel-registry branch from 766f292 to ec53645 Compare August 3, 2022 21:12

lidavidm mentioned this pull request Aug 4, 2022

ARROW-17289: [C++] Add type category membership checks #13783

Merged

lidavidm approved these changes Aug 4, 2022

View reviewed changes

asfimport mentioned this pull request Oct 19, 2022

[C++] Use shared_ptr<DataType> less throughout arrow/compute #32541

Open

amol- closed this Mar 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARROW-17259: [C++] Do not use shared_ptr<DataType> with kernel function signatures, do less copying of shared_ptrs #13753

ARROW-17259: [C++] Do not use shared_ptr<DataType> with kernel function signatures, do less copying of shared_ptrs #13753

wesm commented Jul 29, 2022

github-actions bot commented Jul 29, 2022

github-actions bot commented Jul 29, 2022

wesm Jul 29, 2022

pitrou commented Aug 3, 2022

wesm commented Aug 3, 2022

wesm commented Aug 3, 2022 •

edited

Loading

pitrou commented Aug 3, 2022

wesm commented Aug 3, 2022

wesm commented Aug 3, 2022

wesm commented Aug 3, 2022

pitrou commented Aug 4, 2022

lidavidm commented Aug 4, 2022

lidavidm Aug 4, 2022

lidavidm Aug 4, 2022

pitrou commented Aug 4, 2022

wesm commented Aug 4, 2022

pitrou commented Aug 4, 2022

pitrou commented Aug 4, 2022

pitrou commented Aug 4, 2022

wesm commented Aug 5, 2022

pitrou commented Aug 5, 2022 •

edited

Loading

wesm commented Aug 5, 2022 •

edited

Loading

wesm commented Aug 8, 2022

wesm commented Sep 8, 2022

amol- commented Mar 30, 2023

ARROW-17259: [C++] Do not use shared_ptr<DataType> with kernel function signatures, do less copying of shared_ptrs #13753

ARROW-17259: [C++] Do not use shared_ptr<DataType> with kernel function signatures, do less copying of shared_ptrs #13753

Conversation

wesm commented Jul 29, 2022

github-actions bot commented Jul 29, 2022

github-actions bot commented Jul 29, 2022

wesm Jul 29, 2022

Choose a reason for hiding this comment

pitrou commented Aug 3, 2022

wesm commented Aug 3, 2022

wesm commented Aug 3, 2022 • edited Loading

pitrou commented Aug 3, 2022

wesm commented Aug 3, 2022

wesm commented Aug 3, 2022

wesm commented Aug 3, 2022

pitrou commented Aug 4, 2022

lidavidm commented Aug 4, 2022

lidavidm Aug 4, 2022

Choose a reason for hiding this comment

lidavidm Aug 4, 2022

Choose a reason for hiding this comment

pitrou commented Aug 4, 2022

wesm commented Aug 4, 2022

pitrou commented Aug 4, 2022

pitrou commented Aug 4, 2022

pitrou commented Aug 4, 2022

wesm commented Aug 5, 2022

pitrou commented Aug 5, 2022 • edited Loading

wesm commented Aug 5, 2022 • edited Loading

wesm commented Aug 8, 2022

wesm commented Sep 8, 2022

amol- commented Mar 30, 2023

wesm commented Aug 3, 2022 •

edited

Loading

pitrou commented Aug 5, 2022 •

edited

Loading

wesm commented Aug 5, 2022 •

edited

Loading