[WIP] ARROW-11928: [C++] Execution engine API #9742
Conversation
Force-pushed from 66b9b60 to 2631943
There are some basic things here that I don't understand. I've put a few questions in my comments, so I will review more after I see the answers.
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
I don't think that work on query planning is anticipated soon, only work on execution, so this could be put off until later.
As a nit, I would also suggest calling this "logical_plan.h" to distinguish logical from physical execution.
Ok, perhaps the naming is bad, but this is not a logical plan; it is the physical plan. An "exec plan" is a particular instantiation of this physical plan for execution.
Does that make sense?
I see. How about:
- query_plan.h -> physical_plan.h
- exec_plan.h -> exec_node.h
/// When all inputs are received for a given batch_index, the batch is ready
/// for execution.
Status InputReceived(int32_t input_index, int32_t batch_index,
                     compute::ExecBatch batch);
What is "batch_index"?
The index of the batch in the batch stream.
I don't think this parameter is important, at least at this stage. Could you clarify what "When all inputs are received for a given batch_index, the batch is ready for execution." means? In general, inputs from different parents won't necessarily correspond to each other.
For example, suppose that we are joining a 1,000,000 row input with a 100,000 row input. One input might yield 100 batches while the other might only yield 5 or 10. They may be of differing lengths, in an unpredictable order (since scans won't necessarily yield a deterministic order), and the batches won't correspond to each other.
> In general, inputs from different parents won't necessarily correspond to each other.
Ah, I see. This is a misunderstanding on my part, then. Sorry.
/// This may be called before all inputs are received. This simply fixes
/// the total number of incoming batches so that the ExecNode knows when
/// it has received all input.
Status InputFinished(int32_t num_batches);
- In general, nodes will not know how many batches they are going to produce, so the situations where this API would be used seem rare to me
- When a node has multiple inputs (e.g. joins, unions), the parent nodes will finish producing outputs at different times. How are they supposed to independently communicate that they are done?
As the docstring says, this may or may not be called before all inputs are received. But it must be called at some point so that the node knows that input is finished.
As for the second question, it seems I should change this declaration to `Status InputFinished(int32_t input_index, int32_t num_batches)`. Does that sound right?
I would say to nix the `num_batches` argument, but otherwise yes, `InputFinished(input_index)` sounds right.
/// Note that execution doesn't necessarily mean that any outputs are produced.
/// Depending on the ExecNode type, outputs may be produced on the fly,
/// or only at the end when all inputs have been received.
Future<> RunAsync(int32_t batch_index, internal::Executor* executor);
I guess I don't understand what "batch_index" is yet.
How does this handle fan-out style parallelism? For example, you can generally filter batch N while filtering batch N+1 at the same time. Would there be a single ExecNode that is run twice? If so, won't you have trouble tracking inputs? Or would there be two ExecNode instances in the plan? In that case, how would you decide which node to deliver batches to?
I hope these questions aren't too far off base, I've only started getting up to speed on the execution plan docs.
class QueryContext;
class QueryNode;

class ARROW_EXPORT QueryPlan {
What is the relationship between a query plan and an execution plan? My current understanding is that a query plan is an AST of the query and the execution plan is a possibly optimized tree of workers. Is this correct?
As I answered above, "query plan" is the physical plan. "Exec plan" is the particular execution of this plan for a given set of inputs. Does that make sense? Do you want to suggest other names?
auto* input_batch = EnsureBatch(batch_index);

// TODO lifetime (take strong ref to ExecPlan?)
return executor->Transfer(input_batch->ready_fut)
Would different nodes run on different executors? If not, it might be simpler to avoid the transfer. Also, do I have to call RunAsync on every node for every batch index? Couldn't calling RunAsync on the root node(s?) be enough?
I'm not sure what "root nodes" are in this context. You mean the sources?
Indeed, we may simply want to start executing as soon as all inputs are ready. Though this will depend on the node: some can execute as soon as one batch is received, some need the whole input to be received.
class QueryPlan;

class ARROW_EXPORT ExecPlan {
 public:
How do I actually run this? There doesn't appear to be any public interface.
Sorry, this is just a WIP PR. I decided to post it to ensure that I'm not entirely going in a bad direction.
compute::ExecContext* context() { return context_; }

 protected:
  friend class QueryPlan;
Shouldn't this be the other way around? The QueryPlan doesn't have any reference to the ExecPlan (and if my understanding of the relationship is correct it shouldn't). What value is there in this declaration?
The query plan is supposed to instantiate a corresponding exec plan when `QueryPlan::MakeExecPlan` is called. Presumably that will need access to non-public APIs or members, but this is all just a sketch.
There would be a single ExecNode that is run twice; tracking the input is done through the batch_index. I don't think independent ExecNodes would work, since some operations (e.g. hash aggregation or any vector function) need access to the entire input before starting to compute their output.
Note: Hash aggregation does not need access to its entire input to start computing, and in the first iteration of this project, vector functions will be entirely disallowed.
Also keep in mind (for subsequent work) that some nodes (i.e. "Limit" in particular) will need to be able to apply backpressure on their parents to get them to immediately quit producing outputs. For example, if you have a "limit 1000" query, as soon as you have 1000 rows of output you shut down the ancestors.
Hmm, right. But still, any output depends on the entire input, so it cannot be emitted on the fly while receiving partial input chunks.
Ok, I'll have to think about that.
Right, these kinds of operators (ones that have to exhaust their inputs before emitting an output) are often called "blocking" in the literature (see e.g. http://pages.cs.wisc.edu/~jignesh/publ/Quickstep.pdf or http://www.vldb.org/pvldb/vol11/p2209-kersten.pdf).
Note for readers: I'm working on a new PR which will supersede this one.
Closing in favour of #10204.