Simplify sort streams #2296

Merged
merged 1 commit into from
Apr 21, 2022


tustvold
Contributor

Which issue does this PR close?

Part of #2201

Rationale for this change

In preparation for making tokio an optional dependency of SortPreservingMerge, this PR shuffles around some of the stream plumbing.

What changes are included in this PR?

  • Changes spawn_execution to use tokio::mpsc for consistency with RecordBatchReceiverStream
  • Removes the StreamWrapper indirection
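The pattern behind both bullets is to spawn a producer, have it push batches into a bounded channel, and keep the receiver and the producer's handle together as one stream value. The sketch below is a minimal, std-only analogue, assuming std threads in place of tokio tasks and `std::sync::mpsc::sync_channel` in place of `tokio::sync::mpsc`. `Batch`, `spawn_producer`, `ReceiverStream`, `collect_partition` and the capacity of 2 are hypothetical names and values for illustration, not DataFusion's API.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread::{self, JoinHandle};

// Hypothetical stand-ins: a "batch" is a Vec<u32>, errors are Strings.
type Batch = Vec<u32>;
type BatchResult = Result<Batch, String>;

/// Hypothetical analogue of `spawn_execution`: run the producer on its own
/// thread and forward each batch into a bounded channel.
fn spawn_producer(partition: Vec<Batch>, capacity: usize) -> (Receiver<BatchResult>, JoinHandle<()>) {
    let (tx, rx) = sync_channel::<BatchResult>(capacity);
    let handle = thread::spawn(move || {
        for batch in partition {
            // A send error means the consumer hung up, so stop producing.
            if tx.send(Ok(batch)).is_err() {
                return;
            }
        }
        // `tx` is dropped when the thread ends, which closes the channel.
    });
    (rx, handle)
}

/// Hypothetical analogue of `RecordBatchReceiverStream`: the receiver and
/// the handle of the thread feeding it travel together as one value.
struct ReceiverStream {
    rx: Receiver<BatchResult>,
    handle: Option<JoinHandle<()>>,
}

impl Iterator for ReceiverStream {
    type Item = BatchResult;

    fn next(&mut self) -> Option<BatchResult> {
        match self.rx.recv() {
            Ok(item) => Some(item),
            Err(_) => {
                // Channel closed: reap the producer thread exactly once.
                if let Some(h) = self.handle.take() {
                    let _ = h.join();
                }
                None
            }
        }
    }
}

/// Runs one "partition" through the channel and collects what comes out.
fn collect_partition(partition: Vec<Batch>) -> Vec<Batch> {
    let (rx, handle) = spawn_producer(partition, 2);
    let stream = ReceiverStream { rx, handle: Some(handle) };
    stream
        .map(|r| r.expect("the producer never sends an error in this sketch"))
        .collect()
}

fn main() {
    println!("{:?}", collect_partition(vec![vec![1, 2], vec![3]]));
}
```

Bounding the channel keeps a fast producer from buffering an unbounded number of batches ahead of the consumer.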

Are there any user-facing changes?

No

@github-actions bot added the datafusion (Changes in the datafusion crate) label Apr 20, 2022
receiver,
join_handle,
),
0,
Contributor Author


I think specifying 0 here is ok, but perhaps @yjshen could confirm?

Member


Yes. This field is mem_used for SortedStream, and 0 is consistent with what we report for Receivers. We currently do not count the memory used by the single batch returned from any stream's next().
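To make the accounting concrete: each sorted input carries a `mem_used` figure, and a channel-backed input simply reports 0. The sketch below assumes a simplified `SortedStream` holding a boxed iterator in place of `SendableRecordBatchStream`; `reported_mem`, `example_inputs` and the 4096-byte in-memory input are hypothetical and only illustrate how the per-input figures add up.

```rust
/// Hypothetical mirror of `SortedStream`: a sorted input plus the bytes it
/// reports as `mem_used`. `Box<dyn Iterator>` stands in for the real stream.
struct SortedStream {
    stream: Box<dyn Iterator<Item = u32>>,
    mem_used: usize,
}

impl SortedStream {
    fn new(stream: Box<dyn Iterator<Item = u32>>, mem_used: usize) -> Self {
        Self { stream, mem_used }
    }
}

/// Memory attributed to the merge inputs: the sum of their `mem_used`.
fn reported_mem(inputs: &[SortedStream]) -> usize {
    inputs.iter().map(|s| s.mem_used).sum()
}

fn example_inputs() -> Vec<SortedStream> {
    vec![
        // Channel-backed input: reports 0, because the one batch it hands
        // out from `next()` is not counted.
        SortedStream::new(Box::new(vec![1, 3, 5].into_iter()), 0),
        // Hypothetical input whose data is already buffered in memory.
        SortedStream::new(Box::new(vec![2, 4].into_iter()), 4096),
    ]
}

fn main() {
    let inputs = example_inputs();
    println!("reported: {} bytes", reported_mem(&inputs));
    // Drain the inputs; the values play no part in the memory report.
    let total: u32 = inputs.into_iter().flat_map(|s| s.stream).sum();
    println!("sum of values: {}", total);
}
```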

@@ -269,9 +284,6 @@ pub(crate) struct SortPreservingMergeStream {
/// The sorted input streams to merge together
streams: MergingStreams,

/// Drop helper for tasks feeding the input [`streams`](Self::streams)
_drop_helper: AbortOnDropMany<()>,
Contributor Author


We no longer need this, as it is handled by each individual RecordBatchReceiverStream.
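The idea behind dropping `_drop_helper` is that, once every input owns the handle of the task feeding it, dropping the merge's inputs is enough to stop all producers. The std-only sketch below assumes a cooperative stop flag in place of tokio's task abort, since std threads cannot be aborted. `WorkerGuard`, `Merge` and `run` are hypothetical names, not DataFusion types.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical per-input guard: owns the worker feeding one input and stops
/// it on drop (a stop flag replaces tokio's abort in this sketch).
struct WorkerGuard {
    stop: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl WorkerGuard {
    fn spawn(finished: Arc<AtomicUsize>) -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&stop);
        let handle = thread::spawn(move || {
            // Simulated long-running producer that checks for cancellation.
            while !flag.load(Ordering::Acquire) {
                thread::sleep(Duration::from_millis(1));
            }
            finished.fetch_add(1, Ordering::SeqCst);
        });
        Self { stop, handle: Some(handle) }
    }
}

impl Drop for WorkerGuard {
    fn drop(&mut self) {
        // Signal the worker, then wait for it to exit.
        self.stop.store(true, Ordering::Release);
        if let Some(h) = self.handle.take() {
            let _ = h.join();
        }
    }
}

/// The merge owns only its inputs; there is no separate drop-helper field,
/// because dropping the Vec drops (and stops) every guard.
#[allow(dead_code)]
struct Merge {
    inputs: Vec<WorkerGuard>,
}

/// Spawns `n` workers, drops the merge, and returns how many workers exited.
fn run(n: usize) -> usize {
    let finished = Arc::new(AtomicUsize::new(0));
    let merge = Merge {
        inputs: (0..n).map(|_| WorkerGuard::spawn(Arc::clone(&finished))).collect(),
    };
    drop(merge);
    finished.load(Ordering::SeqCst)
}

fn main() {
    println!("workers stopped: {}", run(3));
}
```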

use futures::Stream;
use tokio::sync::mpsc;
Contributor Author


This is updated because spawn_execution now uses tokio::sync::mpsc.

@@ -180,26 +179,23 @@ impl ExecutionPlan for CoalescePartitionsExec {
}
}

pin_project! {
struct MergeStream {
Contributor Author


Drive-by cleanup: this isn't necessary, as the types are Unpin.
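Why `pin_project!` can go: if every field of a struct is `Unpin`, the struct is `Unpin` too, so a `Pin<&mut Self>` method can call `Pin::get_mut` and reach its fields directly, with no pin projection. The sketch below uses a hypothetical `MergeStream` with a boxed iterator (`Box<T>` is always `Unpin`) and a method shaped like `poll_next` but without a `Context`. It is an illustration under those assumptions, not the struct from the diff.

```rust
use std::pin::Pin;

/// Compiles only if `T: Unpin`.
fn assert_unpin<T: Unpin>() {}

/// Hypothetical stand-in: all fields are `Unpin`, so the struct is as well.
struct MergeStream {
    input: Box<dyn Iterator<Item = u32>>,
    rows_seen: usize,
}

impl MergeStream {
    /// Shaped like `Stream::poll_next`, minus the `Context`.
    fn poll_next_sketch(self: Pin<&mut Self>) -> Option<u32> {
        // `Pin::get_mut` requires `Self: Unpin`; no projection macro is needed.
        let this = self.get_mut();
        let item = this.input.next();
        if item.is_some() {
            this.rows_seen += 1;
        }
        item
    }
}

/// Drains `values` through the pinned method; returns the items and a count.
fn drain_merge(values: Vec<u32>) -> (Vec<u32>, usize) {
    assert_unpin::<MergeStream>();
    let mut s = MergeStream { input: Box::new(values.into_iter()), rows_seen: 0 };
    let mut out = Vec::new();
    // `Pin::new` is likewise only available for `Unpin` targets.
    while let Some(v) = Pin::new(&mut s).poll_next_sketch() {
        out.push(v);
    }
    (out, s.rows_seen)
}

fn main() {
    println!("{:?}", drain_merge(vec![3, 1, 2]));
}
```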

Contributor

@alamb left a comment


Seems like a nice cleanup to me

cc @yjshen

.into_iter()
.map(|s| StreamWrapper::Stream(Some(s)))
.collect();
let wrappers = streams.into_iter().map(|s| s.stream.fuse()).collect();
Contributor


Is fuse the magic that avoids the need for the stream wrapper?

Contributor Author


It's the combination of this and using RecordBatchReceiverStream to convert the mpsc receiver into a SendableRecordBatchStream.

Contributor


RecordBatchReceiverStream is the thing I was missing
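The `fuse` half of the exchange above can be shown with std's `Iterator::fuse`, the synchronous counterpart of `futures::StreamExt::fuse`. Once a fused source returns `None`, it keeps returning `None`, which is the guarantee an `Option`-holding wrapper such as the removed `StreamWrapper::Stream(Some(s))` provided by discarding the inner stream on exhaustion. `Flaky`, `OptionWrapper` and `poll_n` are hypothetical names for this sketch.

```rust
/// A deliberately ill-behaved, unfused source: after returning `None`
/// it yields again (Some(1), None, Some(3), None, ...).
struct Flaky {
    calls: u32,
}

impl Iterator for Flaky {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.calls += 1;
        if self.calls % 2 == 1 { Some(self.calls) } else { None }
    }
}

/// Hypothetical wrapper in the style of `StreamWrapper::Stream(Option<_>)`:
/// drop the inner source the first time it reports exhaustion.
struct OptionWrapper<I>(Option<I>);

impl<I: Iterator> Iterator for OptionWrapper<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let item = self.0.as_mut()?.next();
        if item.is_none() {
            self.0 = None; // never poll the inner source again
        }
        item
    }
}

/// Polls a source `n` times, recording every result including `None`s.
fn poll_n<I: Iterator<Item = u32>>(mut it: I, n: usize) -> Vec<Option<u32>> {
    (0..n).map(|_| it.next()).collect()
}

fn main() {
    println!("{:?}", poll_n(Flaky { calls: 0 }, 4));
    println!("{:?}", poll_n(Flaky { calls: 0 }.fuse(), 4));
    println!("{:?}", poll_n(OptionWrapper(Some(Flaky { calls: 0 })), 4));
}
```

The last two calls behave identically, which is why the wrapper becomes redundant once each input is fused.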


@yjshen merged commit a79f332 into apache:master Apr 21, 2022
Labels
datafusion Changes in the datafusion crate
3 participants