-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-2104: [C++] take kernel functions for nested types #4531
ARROW-2104: [C++] take kernel functions for nested types #4531
Conversation
50c77f8
to
f7bd12b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation is neat! Some comments below.
} | ||
|
||
static Status UnsafeAppend(BinaryBuilder* builder, util::string_view value) { | ||
RETURN_NOT_OK(builder->ReserveData(static_cast<int64_t>(value.size()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... it would probably be more efficient to do it in two passes:
- first pass to compute the overall data length
- second pass to do the actual builder appends after having reserved the right data size at once
What do you think? Is it easily doable to switch to such a scheme?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a problem. I'll write a benchmark so we can watch the difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou as it turns out, it's actually a little faster to ReserveData for each string:
# ReserveData on append
---------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------
FilterString/1048576/1/min_time:1.000 1586715 ns 1586705 ns 884 null_percent=1 size=1048.58k 630.237MB/s
FilterString/8388608/1/min_time:1.000 23555110 ns 23555006 ns 72 null_percent=1 size=8.38861M 339.631MB/s
# separate ReserveData pass
---------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------
FilterString/1048576/1/min_time:1.000 1656093 ns 1654443 ns 841 null_percent=1 size=1048.58k 604.433MB/s
FilterString/8388608/1/min_time:1.000 19849301 ns 19809235 ns 71 null_percent=1 size=8.38861M 403.852MB/s
Would you like me to push this change so you can inspect it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's faster on the 1MB case, but slower on the 8MB case. I expect it to become even slower on larger data. Can you try with e.g. 32MB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(what is the size of your L3 cache?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: we discussed and investigated it a bit with @bkietz . It turns out that the dominating contributor to Filter performance is probably reading the selection bitmap, so iterating twice on the input instead of once has a negative impact.
The benchmarks for Filter with this refactor show a slight improvement over previous:
|
static void FilterString(benchmark::State& state) { | ||
RegressionArgs args(state); | ||
|
||
const int64_t array_size = args.size / sizeof(int64_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem right to divide by int64_t
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fsaintjacques could you comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intent was to make the benchmark share a memory budget (args.size) and let them size the input. The goal is to be able to compare operations. So if you don't input any physical sizes and just input length, no need to divide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. So the problem here is to generate a random string array roughly consistent with the given memory budget.
CI failures seem spurious: |
a4bfaac
to
73ab241
Compare
Rebased to fix unrelated CI failures. |
@bkietz Wow, not sure how that happened :-/ Sorry for that. |
No worries, easily fixed |
73ab241
to
f16a047
Compare
Spurious Appveyor failure while downloading R installer |
5e2c9eb
to
06a6889
Compare
There was an idiosyncratic Maven failure. I restarted the one failing Travis job |
Appveyor failed due to the ongoing cloud.r-project.org DNS issue. We should probably add this test to "allow_failures" until that is resolved |
@wesm anything I need to do to accomplish that? |
AMD64 Ubuntu 18.04 C++ Benchmark (#27346) builder has been succeeded. Revision: 73262bd =================================== =========== =========== ==========
benchmark baseline contender change
=================================== =========== =========== ==========
FilterString/32768/1/min_time:1.000 4.33183e+09 4.25759e+09 -0.0171373
=================================== =========== =========== ========== |
@ursabot benchmark --benchmark-filter=TakeString HEAD^1 |
AMD64 Ubuntu 18.04 C++ Benchmark (#27347) builder failed. Revision: 73262bd Archery:
|
@ursabot benchmark --help |
|
@ursabot benchmark --benchmark-filter=TakeString HEAD~2 |
AMD64 Ubuntu 18.04 C++ Benchmark (#27353) builder failed with an exception. Revision: 73262bd Archery: Traceback (most recent call last):
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1475, in gotResult
_inlineCallbacks(r, g, status)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
--- <exception caught here> ---
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/buildstep.py", line 566, in startStep
self.results = yield self.run()
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/home/ursabot/ursabot/ursabot/steps.py", line 43, in run
await log.addContent(content)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/log.py", line 130, in addContent
return self.lbf.append(text)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/util/lineboundaries.py", line 62, in append
text = self.newline_re.sub('\n', text)
builtins.TypeError: expected string or bytes-like object |
@ursabot benchmark --benchmark-filter=TakeString HEAD~1 |
AMD64 Ubuntu 18.04 C++ Benchmark (#27354) builder failed with an exception. Revision: 73262bd Archery: Traceback (most recent call last):
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1475, in gotResult
_inlineCallbacks(r, g, status)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
--- <exception caught here> ---
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/buildstep.py", line 566, in startStep
self.results = yield self.run()
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/home/ursabot/ursabot/ursabot/steps.py", line 43, in run
await log.addContent(content)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/log.py", line 130, in addContent
return self.lbf.append(text)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/util/lineboundaries.py", line 62, in append
text = self.newline_re.sub('\n', text)
builtins.TypeError: expected string or bytes-like object |
@ursabot benchmark --help |
|
@ursabot benchmark --help |
|
@ursabot benchmark --help |
|
AMD64 Ubuntu 18.04 C++ Benchmark (#27385) builder has been succeeded. Revision: 73262bd =================================== =========== =========== ===========
benchmark baseline contender change
=================================== =========== =========== ===========
TakeString/32768/50/min_time:1.000 1.33361e+09 1.27811e+09 -0.0416122
TakeString/32768/1/min_time:1.000 1.54006e+09 1.54192e+09 0.00120504
TakeString/8388608/1/min_time:1.000 7.39268e+08 8.28827e+08 0.121145
TakeString/32768/0/min_time:1.000 1.74939e+09 1.73941e+09 -0.00570733
TakeString/32768/10/min_time:1.000 1.52927e+09 1.51239e+09 -0.0110394
TakeString/1048576/1/min_time:1.000 1.37075e+09 1.36925e+09 -0.00109342
=================================== =========== =========== =========== |
@wesm @pitrou the only remaining CI failure is an abort in test_flight.py |
Follow up JIRA for optimization of these kernels:https://issues.apache.org/jira/browse/ARROW-5760 |
can we fix the hideous benchmark formatting to show human readable timings? |
Not with a change to Take(), I think |
Sorry that was a general comment to @fsaintjacques =) I restarted the failed Travis build to see if it's transient https://travis-ci.org/apache/arrow/jobs/551366263 definitely concerning that Flight is able to crash like that |
@wesm actively working to this next. See voltrondata-labs/ursabot#98 |
Codecov Report
@@ Coverage Diff @@
## master #4531 +/- ##
=========================================
Coverage ? 88.87%
=========================================
Files ? 717
Lines ? 98750
Branches ? 0
=========================================
Hits ? 87768
Misses ? 10982
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Take now supports gathering from List, FixedSizeList, Map, and Struct arrays. Union is not yet supported