GH-40357: [C++] Add benchmark for ToTensor conversions #40358
Conversation
Can you show the result of running them? We might also want to use more data to get a more reliable result.
This was the result output:
Will use
The result from running
Current output when running
Output from running the benchmarks on the latest commit:
cpp/src/arrow/tensor_benchmark.cc (outdated)
RegressionArgs args(state);
std::shared_ptr<DataType> ty = TypeTraits<ValueType>::type_singleton();

const int64_t kNumRows = args.size;
I would maybe still do the / 8 (or division by sizeof(CType)), because the reported "Time" of some of the benchmarks is still above a second.
New result, dividing size by sizeof(CType):
Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-wyoew3d4/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 19.13, 18.37, 12.07
-----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536 440965 ns 434790 ns 1628 bytes_per_second=143.748Mi/s items_per_second=15.073G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt8Type>/4194304 52116387 ns 39301000 ns 18 bytes_per_second=101.779Mi/s items_per_second=10.6723G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt16Type>/65536 422252 ns 421368 ns 1663 bytes_per_second=148.326Mi/s items_per_second=7.77658G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt16Type>/4194304 39602325 ns 36205053 ns 19 bytes_per_second=110.482Mi/s items_per_second=5.79243G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt32Type>/65536 411546 ns 411012 ns 1696 bytes_per_second=152.064Mi/s items_per_second=3.98625G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt32Type>/4194304 37668923 ns 35941842 ns 19 bytes_per_second=111.291Mi/s items_per_second=2.91742G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt64Type>/65536 409912 ns 409266 ns 1772 bytes_per_second=152.712Mi/s items_per_second=2.00163G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt64Type>/4194304 40266224 ns 36517789 ns 19 bytes_per_second=109.536Mi/s items_per_second=1.43571G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int8Type>/65536 404307 ns 403876 ns 1709 bytes_per_second=154.75Mi/s items_per_second=16.2268G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int8Type>/4194304 37406713 ns 35309316 ns 19 bytes_per_second=113.285Mi/s items_per_second=11.8787G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int16Type>/65536 414663 ns 414136 ns 1649 bytes_per_second=150.916Mi/s items_per_second=7.91237G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int16Type>/4194304 37432355 ns 35457526 ns 19 bytes_per_second=112.811Mi/s items_per_second=5.91455G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int32Type>/65536 413986 ns 413420 ns 1706 bytes_per_second=151.178Mi/s items_per_second=3.96304G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int32Type>/4194304 47971980 ns 37791471 ns 17 bytes_per_second=105.844Mi/s items_per_second=2.77464G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int64Type>/65536 415919 ns 415559 ns 1691 bytes_per_second=150.4Mi/s items_per_second=1.97132G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int64Type>/4194304 36665862 ns 35319650 ns 20 bytes_per_second=113.251Mi/s items_per_second=1.48441G/s null_percent=0 size=4.1943M
BatchToTensorSimple<HalfFloatType>/65536 422161 ns 421677 ns 1685 bytes_per_second=148.218Mi/s items_per_second=7.77088G/s null_percent=0 size=65.536k
BatchToTensorSimple<HalfFloatType>/4194304 35648150 ns 34911650 ns 20 bytes_per_second=114.575Mi/s items_per_second=6.00703G/s null_percent=0 size=4.1943M
BatchToTensorSimple<FloatType>/65536 407051 ns 406626 ns 1702 bytes_per_second=153.704Mi/s items_per_second=4.02925G/s null_percent=0 size=65.536k
BatchToTensorSimple<FloatType>/4194304 35324888 ns 34521250 ns 20 bytes_per_second=115.871Mi/s items_per_second=3.03748G/s null_percent=0 size=4.1943M
BatchToTensorSimple<DoubleType>/65536 411345 ns 410348 ns 1740 bytes_per_second=152.31Mi/s items_per_second=1.99635G/s null_percent=0 size=65.536k
BatchToTensorSimple<DoubleType>/4194304 36834741 ns 35409211 ns 19 bytes_per_second=112.965Mi/s items_per_second=1.48065G/s null_percent=0 size=4.1943M
bytes_per_second=143.748Mi/s items_per_second=15.073G/s doesn't make sense, does it?
I guess not. What I understand, at least, is that the number for items_per_second should be approximately bytes_per_second divided by the size of the type. Joris advised me on what I could try to debug this, but I am not finding anything I can grasp. I am not really sure whether it makes a difference if I only use state.SetBytesProcessed without state.SetItemsProcessed. It also looks OK if I just leave both of them out:
Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-vd706e0e/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 15.15, 15.26, 13.19
-----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536 429847 ns 429439 ns 1582 bytes_per_second=145.539Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt8Type>/4194304 56283753 ns 44952231 ns 13 bytes_per_second=88.9833Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt16Type>/65536 470726 ns 462170 ns 1607 bytes_per_second=135.232Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt16Type>/4194304 44393589 ns 37141214 ns 14 bytes_per_second=107.697Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt32Type>/65536 440997 ns 439951 ns 1260 bytes_per_second=142.061Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt32Type>/4194304 43955912 ns 36447556 ns 18 bytes_per_second=109.747Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt64Type>/65536 432952 ns 431213 ns 1369 bytes_per_second=144.94Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt64Type>/4194304 40377762 ns 36827529 ns 17 bytes_per_second=108.614Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int8Type>/65536 583566 ns 561105 ns 1667 bytes_per_second=111.387Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<Int8Type>/4194304 69477871 ns 51189900 ns 10 bytes_per_second=78.1404Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int16Type>/65536 466828 ns 460938 ns 1379 bytes_per_second=135.593Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<Int16Type>/4194304 53699115 ns 43646833 ns 12 bytes_per_second=91.6447Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int32Type>/65536 510174 ns 489199 ns 1380 bytes_per_second=127.76Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<Int32Type>/4194304 59453215 ns 43936000 ns 13 bytes_per_second=91.0415Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int64Type>/65536 449931 ns 446273 ns 1581 bytes_per_second=140.049Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<Int64Type>/4194304 44797259 ns 38353000 ns 19 bytes_per_second=104.294Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<HalfFloatType>/65536 501073 ns 470337 ns 1660 bytes_per_second=132.884Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<HalfFloatType>/4194304 57234822 ns 40693467 ns 15 bytes_per_second=98.2959Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<FloatType>/65536 420881 ns 419577 ns 1389 bytes_per_second=148.96Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<FloatType>/4194304 41806079 ns 37133778 ns 18 bytes_per_second=107.719Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<DoubleType>/65536 424610 ns 423430 ns 1346 bytes_per_second=147.604Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<DoubleType>/4194304 37983824 ns 35989222 ns 18 bytes_per_second=111.144Mi/s null_percent=0 size=4.1943M
It might be caused by using RegressionArgs, which also calls SetBytesProcessed in its destructor (if that's the case, then we have some other benchmarks reporting the wrong number as well).
Co-authored-by: Joris Van den Bossche <[email protected]>
Force-pushed from 8f7fc4f to 9b5eb40.
Latest output:
+1, just a minor suggestion
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit fc87fd7. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
We should add benchmarks to make sure we do not cause regressions while working on additional implementations of RecordBatch::ToTensor and Table::ToTensor.
What changes are included in this PR?
New cpp/src/arrow/to_tensor_benchmark.cc file.