-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Buffer records in row format in memory for SortExec #2146
Conversation
Hardware Settings:
A modified version of TPC-H q1:select
l_returnflag,
l_linestatus,
l_quantity,
l_extendedprice,
l_discount,
l_tax
from
lineitem
order by
l_extendedprice,
l_discount; |
TPC-H SF=1
This PR:
The row format comes with a price of more computation, with ~11% performance regression witnessed. Although this PR is showing a better cache locality, the computation cost overweight the cache benefits:
This PR:
Much better cache accessing behavior with the row format. |
TPC-H SF=10
This PR:
The performance has improved by ~21.5% this time. The advantage of better memory accessing pattern pays off the extra computation for row <-> columnar transformation.
This PR:
|
This PR looks amazing -- I can't wait to review it -- I plan to do so tomorrow. Thank you @yjshen |
Thanks @alamb! This PR acts more like a playground and testbed for the row format usage in the sort's payload. It's not mature enough but I've got some numbers we could discuss. As revealed by the benchmark above, row <-> columnar batch comes with the price of more extra CPU computations, and the cost pays off when columnar memory access is more expensive while dealing with a bigger dataset. Obviously, we need a clever/adaptive mechanism to choose which memory layout we should employ regarding cc @Dandandan @houqp @tustvold you might be interested in this as well. |
I didn't quite make it to this PR today, but I plan to do so tomorrow (I want to check it out locally and play around with it / make some flamegraphs) |
As I have mentioned, I am very interested in this work but I have not yet found time to give it the deep study it deserves (I am especially interested in the profiling). However, I have been super busy with other work items and keeping things going in IOx and DataFusion. I am not sure when I will have a chance to study this one carefully |
Marking as draft to make it clear this PR isn't waiting on review -- please feel free to mark it as ready for review if that analysis is not correct or if it is ready to review again |
Which issue does this PR close?
Closes #2151.
This PR is based on #2132.
Rationale for this change
Row format is more cache-friendly for "row logic" operators, at the cost of more computation involved while incorporating in vectorized engines like our DataFusion. (there would be extra row <-> columnar conversions of course)
This work is to explore the usage of row-format as the in-memory records buffer layout, for benchmark, test, and discussion purposes.