Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execute plans during plan tests to make sure plans are valid and executable. #8230

Open
mustafasrepo opened this issue Nov 16, 2023 · 4 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@mustafasrepo
Copy link
Contributor

mustafasrepo commented Nov 16, 2023

Is your feature request related to a problem or challenge?

In the link, @alamb suggested that adding a run pass to the plans would be better to make sure that plan is indeed runnable. I think this feature is really nice. We can add this support to the tests under following files:

  • replace_with_order_preserving_variants.rs
  • enforce_distirbution.rs
  • enforce_sorting.rs

Describe the solution you'd like

I think, we can execute optimized ExecutionPlans, inside assert_optimized! macro after checking that plan is as expected. After collecting all the result (preferably with single row) with collect(plan, task_ctx).await? we would expect success at the output. If there is any inconsistency in the schema, execution will show errors.

I think for this task we can use EmptyExec with single row at the source.

Describe alternatives you've considered

No response

Additional context

No response

@mustafasrepo mustafasrepo added enhancement New feature or request good first issue Good for newcomers labels Nov 16, 2023
@Lordworms
Copy link
Contributor

May I take a try at it? Since I am new here, maybe one week should be enough.

@mustafasrepo
Copy link
Contributor Author

May I take a try at it? Since I am new here, maybe one week should be enough.

Sure.

@findepi
Copy link
Member

findepi commented Jul 3, 2024

I tried to do this by appending the following at the end of assert_optimized definition in replace_with_order_preserving_variants.rs:

@@ -399,10 +405,17 @@ mod tests {
             let actual = get_plan_string(&optimized_physical_plan);
             assert_eq!(
                 expected_optimized_lines, actual,
                 "\n**Optimized Plan Mismatch\n\nexpected:\n\n{expected_optimized_lines:#?}\nactual:\n\n{actual:#?}\n\n"
             );
+
+            let ctx = SessionContext::new();
+            let object_store = InMemory::new();
+            object_store.put(&object_store::path::Path::from("file_path"), bytes::Bytes::from("").into()).await?;
+            ctx.register_object_store(&Url::parse("test://").unwrap(), Arc::new(object_store));
+            let task_ctx = Arc::new(TaskContext::from(&ctx));
+            collect(optimized_physical_plan, task_ctx).await?;
         };
     }

but this leads to internal error: entered unreachable code errors from


and partition not used yet errors from
.expect("partition not used yet");

Clearly the context setup is not sufficient to execute the plans. @mustafasrepo would you be willing to give me a hand and suggest what I am doing wrong?

@mustafasrepo
Copy link
Contributor Author

I think, the reason for these bugs is that. Source doesn't emit any data (Or doesn't expect to be called.). I think, we might need to update sources to support generating data. For doing so we need to update
csv_exec_sorted and stream_exec_ordered implementations to generate sources that can emit data. One possible approach would be updating csv_exec_sorted such that it generates a MemoryExec with single row (Method could be renamed to memory_exec_sorted. The reason I propose MemoryExec is that it is easier to give data during initialization for this operator).

However, corresponding change might be a bit tricky for StreamingTableExec since its stream is mock and doesn't support any data generation. Hence I propose to execute tests with actual data for just bounded cases (e.g. cases where source is constructed with csv_exec_sorted call.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants