This will serve as a sink ExecNode which dumps all the batches it receives to disk. The PR should probably also replace FileSystemDataset::Write with an ExecPlan-based implementation.
Weston Pace / @westonpace:
Not this week but if it's still open next week I'll take it. I'm going to assign it to myself but feel free to steal it if you get to it before me (I'll mark it "In Progress" when I actually start working on it)
Ben Kietzman / @bkietz:
Currently I was thinking that partitioning would be handled within this node, since that'd be the most straightforward extraction of a node from FileSystemDataset::Write.
If you wanted to extract a compute::PartitionNode instead, that'd probably be useful later on. I think PartitionNode would:
- use a Grouper for id-ing their destination partition
- sort batches by their partition id
- emit slices of input batches with equal partition id
  - the partition expression is stored in ExecBatch::guarantee
- (note: does not utilize a dataset::Partitioning)
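To make the steps above concrete, here is a minimal standard-library sketch of the group / sort / slice idea. It does not use Arrow at all: `Row`, `Slice`, `AssignPartitionIds`, and `PartitionBatch` are hypothetical names invented for illustration, the hash map only plays the role a Grouper might play, and the `guarantee` string is a stand-in for an expression stored in ExecBatch::guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Arrow types.
struct Row {
  std::string key;  // value of the partition column
  int value;        // payload column
};

struct Slice {
  std::string guarantee;  // stand-in for ExecBatch::guarantee
  std::vector<Row> rows;  // rows sharing one partition id
};

// Step 1: assign a dense id to each distinct key, in order of first appearance.
// (Assumption: this is roughly the role a Grouper would play.)
std::vector<int> AssignPartitionIds(const std::vector<Row>& batch) {
  std::unordered_map<std::string, int> ids;
  std::vector<int> out;
  out.reserve(batch.size());
  for (const Row& row : batch) {
    auto it = ids.emplace(row.key, static_cast<int>(ids.size())).first;
    out.push_back(it->second);
  }
  return out;
}

// Steps 2 and 3: sort row indices by partition id, then emit one slice per id.
std::vector<Slice> PartitionBatch(const std::vector<Row>& batch,
                                  const std::string& field) {
  const std::vector<int> ids = AssignPartitionIds(batch);

  std::vector<std::size_t> order(batch.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  // A stable sort keeps the original row order within each partition.
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return ids[a] < ids[b]; });

  std::vector<Slice> slices;
  int current = -1;  // ids start at 0, so -1 means "no slice opened yet"
  for (std::size_t i : order) {
    if (ids[i] != current) {
      current = ids[i];
      // Record the partition expression alongside the slice.
      slices.push_back({field + " == \"" + batch[i].key + "\"", {}});
    }
    slices.back().rows.push_back(batch[i]);
  }
  return slices;
}
```

Calling `PartitionBatch` on rows with keys `a`, `b`, `a` would yield two slices, one per distinct key, each carrying its own guarantee string. A real node would presumably slice columnar batches rather than copy rows; the copy here only keeps the sketch short.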
Then WriteNode would only use a Partitioning to format ExecBatch::guarantees to an output directory. I think this approach would allow us to delete Partitioning::Partition too, since that behavior would now be encapsulated by PartitionNode.
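As a rough illustration of "formatting a guarantee to an output directory", the sketch below turns a list of field/value pairs into a hive-style `field=value` path. This is an assumption-laden toy: `Guarantee` and `FormatHiveDirectory` are hypothetical names, the guarantee is modeled as plain string pairs rather than an expression, and hive-style layout is only one possible scheme; the comment above does not say which Partitioning would be used.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a guarantee: ordered (field, value) equality pairs.
// A real guarantee would be an expression, not strings.
using Guarantee = std::vector<std::pair<std::string, std::string>>;

// Build "<base_dir>/<field>=<value>/..." from the pairs, in order.
std::string FormatHiveDirectory(const std::string& base_dir,
                                const Guarantee& guarantee) {
  std::string path = base_dir;
  for (const auto& kv : guarantee) {
    // Insert a separator unless the path is empty or already ends with one.
    if (!path.empty() && path.back() != '/') path += '/';
    path += kv.first + "=" + kv.second;
  }
  return path;
}
```

Under this model the writer would only need the guarantee attached to each incoming batch to decide where the batch goes; it would not have to inspect the rows themselves.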
Also note that whatever approach you take is going to impinge on ARROW-13338 since ExecPlans don't support sync scanning and FileSystemDataset::Write depends on [[deprecated]] Scanner::Scan
Reporter: Ben Kietzman / @bkietz
Assignee: Weston Pace / @westonpace
Subtasks:
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-13542. Please see the migration documentation for further details.