
GH-40357: [C++] Add benchmark for ToTensor conversions #40358

Merged

Conversation

AlenkaF (Member) commented Mar 5, 2024

Rationale for this change

We should add benchmarks to make sure we do not introduce regressions while working on additional implementations of RecordBatch::ToTensor and Table::ToTensor.

What changes are included in this PR?

New cpp/src/arrow/to_tensor_benchmark.cc file.
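
For illustration, a minimal Google Benchmark for this conversion could look roughly like the sketch below. This is not the actual contents of the new file: the benchmark name, column types, data size, and counters are assumptions used only to show the overall shape.

#include "benchmark/benchmark.h"

#include "arrow/api.h"
#include "arrow/testing/gtest_util.h"

namespace arrow {

// Sketch only: build a small uniform-type RecordBatch (two int64 columns)
// and time RecordBatch::ToTensor() inside the benchmark loop.
static void RecordBatchUniformTypesSimple(benchmark::State& state) {
  auto column = ArrayFromJSON(int64(), "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]");
  auto batch_schema = schema({field("a", int64()), field("b", int64())});
  auto batch = RecordBatch::Make(batch_schema, column->length(), {column, column});

  for (auto _ : state) {
    auto maybe_tensor = batch->ToTensor();
    if (!maybe_tensor.ok()) {
      state.SkipWithError(maybe_tensor.status().ToString().c_str());
      break;
    }
    benchmark::DoNotOptimize(maybe_tensor);
  }
  // Report throughput counters comparable to the output shown below.
  const int64_t num_values = batch->num_rows() * batch->num_columns();
  state.SetItemsProcessed(state.iterations() * num_values);
  state.SetBytesProcessed(state.iterations() * num_values *
                          static_cast<int64_t>(sizeof(int64_t)));
}

BENCHMARK(RecordBatchUniformTypesSimple);

}  // namespace arrow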

jorisvandenbossche (Member) commented:

Can you show the result of running them? We might also want to use some more data to get a more reliable result.

AlenkaF (Member Author) commented Mar 6, 2024

This was the resulting output:

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-ahcnq1ah/WORKSPACE/build/release/arrow-to-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 17.32, 18.72, 16.18
----------------------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------
RecordBatchUniformTypesSimple        624 ns          624 ns      1125492 bytes_per_second=1.29039Gi/s items_per_second=43.2982M/s

Will use RandomArrayGenerator to generate more data and add the result here.
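
For context, arrow::random::RandomArrayGenerator (from arrow/testing/random.h) can produce columns of a given length, value type, and null probability. A rough sketch of how such input data could be generated follows; the helper name, seed, and two-column layout are illustrative assumptions, not the code in this PR.

#include "arrow/api.h"
#include "arrow/testing/random.h"

namespace arrow {

// Sketch: random input for a ToTensor benchmark, parameterized by value type
// and null probability (helper name, seed, and column count are assumptions).
std::shared_ptr<RecordBatch> MakeRandomBatch(const std::shared_ptr<DataType>& type,
                                             int64_t num_rows, double null_probability) {
  random::RandomArrayGenerator gen(/*seed=*/42);
  auto batch_schema = schema({field("a", type), field("b", type)});
  return RecordBatch::Make(batch_schema, num_rows,
                           {gen.ArrayOf(type, num_rows, null_probability),
                            gen.ArrayOf(type, num_rows, null_probability)});
}

}  // namespace arrow

A benchmark templated on the value type and registered per type (e.g. with BENCHMARK_TEMPLATE) would then produce the per-type rows visible in the outputs below.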

AlenkaF (Member Author) commented Mar 6, 2024

The result from running archery benchmark diff --benchmark-filter=BatchToTensorSimple on the second commit (but with arrays of length 100, not 500):

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-jun4cokj/WORKSPACE/build/release/arrow-to-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 24.95, 25.04, 19.14
---------------------------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>            550 ns          550 ns      1254345 bytes_per_second=4.06699Gi/s items_per_second=545.863M/s
BatchToTensorSimple<UInt16Type>           555 ns          553 ns      1235570 bytes_per_second=8.08251Gi/s items_per_second=542.408M/s
BatchToTensorSimple<UInt32Type>           569 ns          568 ns      1253335 bytes_per_second=15.7341Gi/s items_per_second=527.949M/s
BatchToTensorSimple<UInt64Type>           580 ns          580 ns      1237449 bytes_per_second=30.8253Gi/s items_per_second=517.163M/s
BatchToTensorSimple<Int8Type>             548 ns          548 ns      1249732 bytes_per_second=4.07944Gi/s items_per_second=547.533M/s
BatchToTensorSimple<Int16Type>            623 ns          568 ns      1233654 bytes_per_second=7.87246Gi/s items_per_second=528.312M/s
BatchToTensorSimple<Int32Type>            565 ns          564 ns      1204923 bytes_per_second=15.8461Gi/s items_per_second=531.706M/s
BatchToTensorSimple<Int64Type>            585 ns          585 ns      1269059 bytes_per_second=30.5699Gi/s items_per_second=512.878M/s
BatchToTensorSimple<HalfFloatType>        545 ns          544 ns      1217900 bytes_per_second=8.21219Gi/s items_per_second=551.111M/s
BatchToTensorSimple<FloatType>            575 ns          574 ns      1239991 bytes_per_second=15.5835Gi/s items_per_second=522.896M/s
BatchToTensorSimple<DoubleType>           567 ns          566 ns      1152074 bytes_per_second=31.5943Gi/s items_per_second=530.065M/s

Two resolved review threads on cpp/src/arrow/to_tensor_benchmark.cc (outdated).
@github-actions bot added the 'awaiting changes' and 'awaiting change review' labels and removed the 'awaiting review' and 'awaiting changes' labels on Mar 7, 2024
AlenkaF marked this pull request as ready for review March 11, 2024 13:05
AlenkaF marked this pull request as draft March 13, 2024 12:44
AlenkaF (Member Author) commented Mar 13, 2024

Current output when running archery benchmark diff --benchmark-filter=BatchToTensorSimple:

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-e8lvkw1g/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 27.50, 28.87, 23.74
-----------------------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536/10000             4121 us         4107 us          171 bytes_per_second=15.217Mi/s items_per_second=12.765G/s null_percent=0.01 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/100               4273 us         4219 us          170 bytes_per_second=14.8143Mi/s items_per_second=12.4271G/s null_percent=1 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/10                4019 us         4003 us          173 bytes_per_second=15.6149Mi/s items_per_second=13.0988G/s null_percent=10 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/2                 4100 us         4083 us          136 bytes_per_second=15.3084Mi/s items_per_second=12.8416G/s null_percent=50 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/1                 3972 us         3894 us          178 bytes_per_second=16.0516Mi/s items_per_second=13.465G/s null_percent=100 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/0                 3953 us         3927 us          178 bytes_per_second=15.9142Mi/s items_per_second=13.3498G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt8Type>/4194304/10000       15398661 us      1947088 us            1 bytes_per_second=2.05435Mi/s items_per_second=1.72331G/s null_percent=0.01 size=4.1943M
...

Two resolved review threads on cpp/src/arrow/tensor_benchmark.cc (outdated).
@github-actions bot added the 'awaiting changes' label and removed the 'awaiting change review' label on Mar 13, 2024
AlenkaF (Member Author) commented Mar 14, 2024

Output from running the benchmarks on the latest commit:

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-y9o8zv4d/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 20.67, 17.39, 10.95
-----------------------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536           443099 ns       442863 ns         1580 bytes_per_second=141.127Mi/s items_per_second=14.7983G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt8Type>/4194304       38391076 ns     35795222 ns           18 bytes_per_second=111.747Mi/s items_per_second=11.7175G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt16Type>/65536          882040 ns       881129 ns          747 bytes_per_second=70.9318Mi/s items_per_second=7.43773G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt16Type>/4194304     118462838 ns     81059222 ns            9 bytes_per_second=49.3466Mi/s items_per_second=5.17437G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt32Type>/65536         1937139 ns      1933673 ns          361 bytes_per_second=32.3219Mi/s items_per_second=3.3892G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt32Type>/4194304    1271556625 ns    651396000 ns            1 bytes_per_second=6.14066Mi/s items_per_second=643.895M/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt64Type>/65536         4440503 ns      4344614 ns          166 bytes_per_second=14.3856Mi/s items_per_second=1.50844G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt64Type>/4194304    1.1486e+10 ns   1742537000 ns            1 bytes_per_second=2.2955Mi/s items_per_second=240.701M/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int8Type>/65536            415187 ns       410957 ns         1710 bytes_per_second=152.084Mi/s items_per_second=15.9472G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int8Type>/4194304        34241740 ns     33962150 ns           20 bytes_per_second=117.778Mi/s items_per_second=12.3499G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int16Type>/65536           812298 ns       810349 ns          917 bytes_per_second=77.1273Mi/s items_per_second=8.08738G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int16Type>/4194304       75301182 ns     70352375 ns            8 bytes_per_second=56.8566Mi/s items_per_second=5.96185G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int32Type>/65536          2033466 ns      2026663 ns          329 bytes_per_second=30.8389Mi/s items_per_second=3.23369G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int32Type>/4194304     1233238541 ns    562396000 ns            1 bytes_per_second=7.11243Mi/s items_per_second=745.792M/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int64Type>/65536          3969188 ns      3959770 ns          178 bytes_per_second=15.7837Mi/s items_per_second=1.65505G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int64Type>/4194304     1.5188e+10 ns   1823171000 ns            1 bytes_per_second=2.19398Mi/s items_per_second=230.055M/s null_percent=0 size=4.1943M
BatchToTensorSimple<HalfFloatType>/65536       899771 ns       888509 ns          749 bytes_per_second=70.3426Mi/s items_per_second=7.37595G/s null_percent=0 size=65.536k
BatchToTensorSimple<HalfFloatType>/4194304   71104797 ns     69327375 ns            8 bytes_per_second=57.6973Mi/s items_per_second=6.05G/s null_percent=0 size=4.1943M
BatchToTensorSimple<FloatType>/65536          2025175 ns      2021084 ns          347 bytes_per_second=30.924Mi/s items_per_second=3.24262G/s null_percent=0 size=65.536k
BatchToTensorSimple<FloatType>/4194304     1087905188 ns    395840500 ns            2 bytes_per_second=10.1051Mi/s items_per_second=1.05959G/s null_percent=0 size=4.1943M
BatchToTensorSimple<DoubleType>/65536         4118269 ns      4089947 ns          170 bytes_per_second=15.2814Mi/s items_per_second=1.60237G/s null_percent=0 size=65.536k
BatchToTensorSimple<DoubleType>/4194304    9901101750 ns   1684713000 ns            1 bytes_per_second=2.37429Mi/s items_per_second=248.963M/s null_percent=0 size=4.1943M

@github-actions bot added the 'awaiting change review' label and removed the 'awaiting changes' label on Mar 14, 2024
AlenkaF marked this pull request as ready for review March 14, 2024 15:00
RegressionArgs args(state);  // parses the size / null-proportion benchmark arguments
std::shared_ptr<DataType> ty = TypeTraits<ValueType>::type_singleton();

const int64_t kNumRows = args.size;  // row count taken directly from the size argument
Member commented:

I would maybe still do the / 8 (or rather a division by sizeof(CType)), because the reported "Time" of some of the benchmarks is still in the > 1 second range.
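
Concretely, the suggestion amounts to something like the following change to the snippet above (a sketch; obtaining CType via TypeTraits is an assumption about how the benchmark is templated):

// Treat args.size as a byte budget instead of a row count, so every value
// type processes roughly the same number of bytes per iteration and the
// reported time stays well below a second.
using CType = typename TypeTraits<ValueType>::CType;
const int64_t kNumRows = args.size / static_cast<int64_t>(sizeof(CType));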

AlenkaF (Member Author) replied:

New result, dividing size by sizeof(CType):

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-wyoew3d4/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 19.13, 18.37, 12.07
-----------------------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536           440965 ns       434790 ns         1628 bytes_per_second=143.748Mi/s items_per_second=15.073G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt8Type>/4194304       52116387 ns     39301000 ns           18 bytes_per_second=101.779Mi/s items_per_second=10.6723G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt16Type>/65536          422252 ns       421368 ns         1663 bytes_per_second=148.326Mi/s items_per_second=7.77658G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt16Type>/4194304      39602325 ns     36205053 ns           19 bytes_per_second=110.482Mi/s items_per_second=5.79243G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt32Type>/65536          411546 ns       411012 ns         1696 bytes_per_second=152.064Mi/s items_per_second=3.98625G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt32Type>/4194304      37668923 ns     35941842 ns           19 bytes_per_second=111.291Mi/s items_per_second=2.91742G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt64Type>/65536          409912 ns       409266 ns         1772 bytes_per_second=152.712Mi/s items_per_second=2.00163G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt64Type>/4194304      40266224 ns     36517789 ns           19 bytes_per_second=109.536Mi/s items_per_second=1.43571G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int8Type>/65536            404307 ns       403876 ns         1709 bytes_per_second=154.75Mi/s items_per_second=16.2268G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int8Type>/4194304        37406713 ns     35309316 ns           19 bytes_per_second=113.285Mi/s items_per_second=11.8787G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int16Type>/65536           414663 ns       414136 ns         1649 bytes_per_second=150.916Mi/s items_per_second=7.91237G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int16Type>/4194304       37432355 ns     35457526 ns           19 bytes_per_second=112.811Mi/s items_per_second=5.91455G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int32Type>/65536           413986 ns       413420 ns         1706 bytes_per_second=151.178Mi/s items_per_second=3.96304G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int32Type>/4194304       47971980 ns     37791471 ns           17 bytes_per_second=105.844Mi/s items_per_second=2.77464G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int64Type>/65536           415919 ns       415559 ns         1691 bytes_per_second=150.4Mi/s items_per_second=1.97132G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int64Type>/4194304       36665862 ns     35319650 ns           20 bytes_per_second=113.251Mi/s items_per_second=1.48441G/s null_percent=0 size=4.1943M
BatchToTensorSimple<HalfFloatType>/65536       422161 ns       421677 ns         1685 bytes_per_second=148.218Mi/s items_per_second=7.77088G/s null_percent=0 size=65.536k
BatchToTensorSimple<HalfFloatType>/4194304   35648150 ns     34911650 ns           20 bytes_per_second=114.575Mi/s items_per_second=6.00703G/s null_percent=0 size=4.1943M
BatchToTensorSimple<FloatType>/65536           407051 ns       406626 ns         1702 bytes_per_second=153.704Mi/s items_per_second=4.02925G/s null_percent=0 size=65.536k
BatchToTensorSimple<FloatType>/4194304       35324888 ns     34521250 ns           20 bytes_per_second=115.871Mi/s items_per_second=3.03748G/s null_percent=0 size=4.1943M
BatchToTensorSimple<DoubleType>/65536          411345 ns       410348 ns         1740 bytes_per_second=152.31Mi/s items_per_second=1.99635G/s null_percent=0 size=65.536k
BatchToTensorSimple<DoubleType>/4194304      36834741 ns     35409211 ns           19 bytes_per_second=112.965Mi/s items_per_second=1.48065G/s null_percent=0 size=4.1943M

Member commented:

bytes_per_second=143.748Mi/s items_per_second=15.073G/s doesn't make sense, does it?

AlenkaF (Member Author) replied:

I guess not. As far as I understand, items_per_second should be approximately bytes_per_second divided by the size of the type (for UInt8Type, with one byte per item, ~143.7 MiB/s of bytes should correspond to roughly 150 M items/s, not 15 G items/s). Joris advised me on what I could try to debug this, but I am not finding anything I can grasp.

I am not really sure whether it makes a difference if I only use state.SetBytesProcessed without state.SetItemsProcessed. It also looks OK if I just leave both of them out:

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-vd706e0e/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 15.15, 15.26, 13.19
-----------------------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536           429847 ns       429439 ns         1582 bytes_per_second=145.539Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt8Type>/4194304       56283753 ns     44952231 ns           13 bytes_per_second=88.9833Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt16Type>/65536          470726 ns       462170 ns         1607 bytes_per_second=135.232Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt16Type>/4194304      44393589 ns     37141214 ns           14 bytes_per_second=107.697Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt32Type>/65536          440997 ns       439951 ns         1260 bytes_per_second=142.061Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt32Type>/4194304      43955912 ns     36447556 ns           18 bytes_per_second=109.747Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt64Type>/65536          432952 ns       431213 ns         1369 bytes_per_second=144.94Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt64Type>/4194304      40377762 ns     36827529 ns           17 bytes_per_second=108.614Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int8Type>/65536            583566 ns       561105 ns         1667 bytes_per_second=111.387Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<Int8Type>/4194304        69477871 ns     51189900 ns           10 bytes_per_second=78.1404Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int16Type>/65536           466828 ns       460938 ns         1379 bytes_per_second=135.593Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<Int16Type>/4194304       53699115 ns     43646833 ns           12 bytes_per_second=91.6447Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int32Type>/65536           510174 ns       489199 ns         1380 bytes_per_second=127.76Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<Int32Type>/4194304       59453215 ns     43936000 ns           13 bytes_per_second=91.0415Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int64Type>/65536           449931 ns       446273 ns         1581 bytes_per_second=140.049Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<Int64Type>/4194304       44797259 ns     38353000 ns           19 bytes_per_second=104.294Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<HalfFloatType>/65536       501073 ns       470337 ns         1660 bytes_per_second=132.884Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<HalfFloatType>/4194304   57234822 ns     40693467 ns           15 bytes_per_second=98.2959Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<FloatType>/65536           420881 ns       419577 ns         1389 bytes_per_second=148.96Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<FloatType>/4194304       41806079 ns     37133778 ns           18 bytes_per_second=107.719Mi/s null_percent=0 size=4.1943M
BatchToTensorSimple<DoubleType>/65536          424610 ns       423430 ns         1346 bytes_per_second=147.604Mi/s null_percent=0 size=65.536k
BatchToTensorSimple<DoubleType>/4194304      37983824 ns     35989222 ns           18 bytes_per_second=111.144Mi/s null_percent=0 size=4.1943M

Member commented:

It might be caused by using RegressionArgs, which also calls SetBytesProcessed in its destructor (if that is the case, then we have some other benchmarks reporting the wrong numbers as well).
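
If that is the cause, the interaction would be roughly the pattern sketched below, where ScopedBytesCounter is a hypothetical stand-in for RegressionArgs. This is not Arrow code, only an illustration of how a destructor-set bytes counter combined with a manually set items counter can end up on different scales.

#include <cstdint>

#include "benchmark/benchmark.h"

// Hypothetical stand-in for RegressionArgs: sets the bytes counter when it
// goes out of scope, after the timed loop has finished.
class ScopedBytesCounter {
 public:
  ScopedBytesCounter(benchmark::State& state, int64_t bytes_per_iteration)
      : state_(state), bytes_per_iteration_(bytes_per_iteration) {}
  ~ScopedBytesCounter() {
    state_.SetBytesProcessed(state_.iterations() * bytes_per_iteration_);
  }

 private:
  benchmark::State& state_;
  int64_t bytes_per_iteration_;
};

static void CounterInteractionSketch(benchmark::State& state) {
  const int64_t size_bytes = state.range(0);
  ScopedBytesCounter scoped_bytes(state, size_bytes);
  for (auto _ : state) {
    benchmark::DoNotOptimize(size_bytes);
  }
  // If the benchmark body also sets the items counter, but on a different
  // basis (here counting 8-byte values rather than bytes), bytes_per_second
  // and items_per_second are no longer directly comparable.
  state.SetItemsProcessed(state.iterations() * size_bytes /
                          static_cast<int64_t>(sizeof(int64_t)));
}

BENCHMARK(CounterInteractionSketch)->Arg(1 << 16);
BENCHMARK_MAIN();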

@github-actions bot added the 'awaiting changes' and 'awaiting change review' labels and removed the 'awaiting change review' and 'awaiting changes' labels on Mar 15, 2024
@github-actions bot added the 'awaiting merge' label and removed the 'awaiting change review' label on Mar 15, 2024
pitrou self-requested a review March 20, 2024 13:43
@github-actions bot added the 'awaiting changes' and 'awaiting change review' labels and removed the 'awaiting change review' and 'awaiting changes' labels on Mar 22, 2024
AlenkaF force-pushed the record-batch-to-tensor-benchmark branch from 8f7fc4f to 9b5eb40 on March 24, 2024 06:46
AlenkaF (Member Author) commented Mar 24, 2024

Latest output:

Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-1wpfanyn/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 31.29, 25.96, 16.82
-----------------------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<Int8Type>/65536/3            2441 ns         2440 ns       340594 bytes_per_second=25.0118Gi/s items_per_second=26.8563G/s
BatchToTensorSimple<Int8Type>/65536/30           3158 ns         3157 ns       219021 bytes_per_second=19.3277Gi/s items_per_second=20.753G/s
BatchToTensorSimple<Int8Type>/65536/300         13722 ns        13719 ns        50473 bytes_per_second=4.43975Gi/s items_per_second=4.76715G/s
BatchToTensorSimple<Int8Type>/4194304/3        277670 ns       277521 ns         2510 bytes_per_second=14.0755Gi/s items_per_second=15.1135G/s
BatchToTensorSimple<Int8Type>/4194304/30       293289 ns       293183 ns         2430 bytes_per_second=13.3236Gi/s items_per_second=14.3061G/s
BatchToTensorSimple<Int8Type>/4194304/300      298143 ns       297779 ns         2263 bytes_per_second=13.1179Gi/s items_per_second=14.0853G/s
BatchToTensorSimple<Int16Type>/65536/3           2181 ns         2179 ns       394604 bytes_per_second=28.0054Gi/s items_per_second=15.0353G/s
BatchToTensorSimple<Int16Type>/65536/30          3247 ns         3236 ns       220372 bytes_per_second=18.8588Gi/s items_per_second=10.1247G/s
BatchToTensorSimple<Int16Type>/65536/300        14148 ns        14137 ns        46621 bytes_per_second=4.30854Gi/s items_per_second=2.31313G/s
BatchToTensorSimple<Int16Type>/4194304/3       277347 ns       277092 ns         2553 bytes_per_second=14.0973Gi/s items_per_second=7.56842G/s
BatchToTensorSimple<Int16Type>/4194304/30      370514 ns       323043 ns         2535 bytes_per_second=12.092Gi/s items_per_second=6.49187G/s
BatchToTensorSimple<Int16Type>/4194304/300     297281 ns       296810 ns         2113 bytes_per_second=13.1598Gi/s items_per_second=7.06513G/s
BatchToTensorSimple<Int32Type>/65536/3           2349 ns         2346 ns       387584 bytes_per_second=26.0117Gi/s items_per_second=6.98246G/s
BatchToTensorSimple<Int32Type>/65536/30          3163 ns         3158 ns       213616 bytes_per_second=19.3208Gi/s items_per_second=5.18638G/s
BatchToTensorSimple<Int32Type>/65536/300        13852 ns        13840 ns        49582 bytes_per_second=4.3606Gi/s items_per_second=1.17054G/s
BatchToTensorSimple<Int32Type>/4194304/3       342283 ns       319630 ns         1969 bytes_per_second=12.2212Gi/s items_per_second=3.28059G/s
BatchToTensorSimple<Int32Type>/4194304/30      290756 ns       286728 ns         2381 bytes_per_second=13.6233Gi/s items_per_second=3.65699G/s
BatchToTensorSimple<Int32Type>/4194304/300     300295 ns       297110 ns         2360 bytes_per_second=13.1465Gi/s items_per_second=3.529G/s
BatchToTensorSimple<Int64Type>/65536/3           2204 ns         2197 ns       410967 bytes_per_second=27.7705Gi/s items_per_second=3.7273G/s
BatchToTensorSimple<Int64Type>/65536/30          3176 ns         3162 ns       216236 bytes_per_second=19.3002Gi/s items_per_second=2.59043G/s
BatchToTensorSimple<Int64Type>/65536/300        13656 ns        13588 ns        51372 bytes_per_second=4.4415Gi/s items_per_second=596.128M/s
BatchToTensorSimple<Int64Type>/4194304/3       270131 ns       268433 ns         2622 bytes_per_second=14.552Gi/s items_per_second=1.95313G/s
BatchToTensorSimple<Int64Type>/4194304/30      297324 ns       292629 ns         2026 bytes_per_second=13.3486Gi/s items_per_second=1.79162G/s
BatchToTensorSimple<Int64Type>/4194304/300     293260 ns       291513 ns         2290 bytes_per_second=13.3951Gi/s items_per_second=1.79786G/s

@github-actions bot added the 'awaiting merge' label and removed the 'awaiting change review' label on Mar 25, 2024
pitrou (Member) left a comment:

+1, just a minor suggestion

jorisvandenbossche merged commit fc87fd7 into apache:main Mar 26, 2024
34 of 35 checks passed
jorisvandenbossche removed the 'awaiting merge' label Mar 26, 2024
AlenkaF deleted the record-batch-to-tensor-benchmark branch March 26, 2024 08:01

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit fc87fd7.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.
