-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Add concat to new execution model + buffered intermediate ops #2519
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2519 +/- ##
=======================================
Coverage ? 63.22%
=======================================
Files ? 972
Lines ? 108363
Branches ? 0
=======================================
Hits ? 68517
Misses ? 39846
Partials ? 0
|
pub trait IntermediateOperator: dyn_clone::DynClone + Send + Sync { | ||
fn execute(&self, input: &Arc<MicroPartition>) -> DaftResult<Arc<MicroPartition>>; | ||
fn name(&self) -> &'static str; | ||
} | ||
|
||
dyn_clone::clone_trait_object!(IntermediateOperator); | ||
|
||
/// The number of rows that will trigger an intermediate operator to output its data. | ||
pub const OUTPUT_THRESHOLD: usize = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have this as an env variable with a default
|
||
pub fn add(&mut self, part: Arc<MicroPartition>) { | ||
self.buffer.push(part); | ||
self.curr_len += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be len of part
pub const OUTPUT_THRESHOLD: usize = 1000; | ||
|
||
/// State of an operator task, used to buffer data and output it when a threshold is reached. | ||
pub struct OperatorTaskState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No action needed but we should refactor this to be a trait that is serde-able so diff operators can have diff impls.
} | ||
} | ||
|
||
async fn single_operator_task( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run_single_pipeline
Ok(()) | ||
} | ||
|
||
pub async fn run(&mut self) -> DaftResult<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run_parallel
if let Some(s) = inner_task_senders.get(curr_sender_idx) { | ||
let _ = s.send(morsel).await; | ||
} else { | ||
let next_sender = self.sender.get_next_sender(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't exist so create the task runner.
let (single_sender, single_receiver) = create_single_channel(1); | ||
|
||
let op = self.op.clone(); | ||
tokio::spawn(async move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't have any tokio::spawn
in our code. have a util function that is called
spawn_compute(...)
tokio::spawn(async move { | ||
let _ = Self::single_operator_task(single_receiver, next_sender, op).await; | ||
}); | ||
let _ = single_sender.send(morsel).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
this error.
src/daft-local-execution/src/lib.rs
Outdated
} | ||
} | ||
|
||
pub fn blocking_recv(&mut self) -> Option<DaftResult<Arc<MicroPartition>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this?
Ok(Box::new(res)) | ||
} | ||
|
||
pub fn run_physical_plan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decouple, building pipeline and starting execution.
- build full pipeline
- run
start
for each op - then wait on result
This PR adds Concat and buffered intermediate ops to the new execution model, in addition to some refactors to support this change.
n
.