Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: basic support for executing prepared statements #13242

Merged
merged 11 commits into from
Nov 7, 2024

Conversation

jonahgao
Copy link
Member

@jonahgao jonahgao commented Nov 4, 2024

Which issue does this PR close?

Closes #4549.

Rationale for this change

Store the logical plan of the prepared statement in the session state. During EXECUTE, retrieve the prepared logical plan, replace the placeholders with execution parameters, and then execute.

What changes are included in this PR?

Support executing prepared statements.

DataFusion CLI v42.2.0
> prepare p(int) as select 100+$1;
0 row(s) fetched.
Elapsed 0.004 seconds.

> execute p(10);
+-----------------+
| Int64(100) + $1 |
+-----------------+
| 110             |
+-----------------+
1 row(s) fetched.
Elapsed 0.006 seconds.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes.

ctx.sql("Prepare p(int) as query") now returns an empty DataFrame, similar to Create View and other DDL statements. Previously, it would return a DataFrame containing the Prepare logical plan.

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Nov 4, 2024
@@ -176,27 +175,6 @@ async fn prepared_statement_type_coercion() -> Result<()> {
Ok(())
}

#[tokio::test]
async fn prepared_statement_invalid_types() -> Result<()> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this test because now parameters will be cast to the target type.
This is also the behavior of PostgreSQL.

psql=> prepare p(int) as select 100+$1;
PREPARE

psql=> execute p('100');
 ?column?
----------
      200
(1 row)

psql=> execute p(20.12);
 ?column?
----------
      120
(1 row)

@@ -106,9 +105,9 @@ async fn test_prepare_statement() -> Result<()> {
let ctx = create_ctx_with_partition(&tmp_dir, partition_count).await?;

// sql to statement then to prepare logical plan with parameters
// c1 defined as UINT32, c2 defined as UInt64 but the params are Int32 and Float64
let dataframe =
ctx.sql("PREPARE my_plan(INT, DOUBLE) AS SELECT c1, c2 FROM test WHERE c1 > $2 AND c1 < $1").await?;
Copy link
Member Author

@jonahgao jonahgao Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PREPARE now returns an empty DataFrame and can't call with_param_values on it. I think this change is reasonable because the result of PREPARE is not a real relation, as it doesn't contain valid rows. Its result is more like DDL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- PREPARE is a statement in my mind (and has no results)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @jonahgao -- this PR looks basically perfect. My only concern is the security implications of executing prepared statements when the SQL is defined to be "read only"

It would also be neat to have some way to deallocate prepared statements, but that would be a natural follow on ticket (perhaps adding support for DEALLOCATE for example)

@@ -687,7 +688,31 @@ impl SessionContext {
LogicalPlan::Statement(Statement::SetVariable(stmt)) => {
self.set_variable(stmt).await
}

LogicalPlan::Prepare(Prepare {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried that LogicalPlan::Prepare seems to be runnable even if we run sql_with_options and disable statements.

pub async fn sql_with_options(
&self,
sql: &str,
options: SQLOptions,
) -> Result<DataFrame> {
let plan = self.state().create_logical_plan(sql).await?;
options.verify_plan(&plan)?;
self.execute_logical_plan(plan).await
}

I realize that this PR does not change if Prepare is a statement or not, but it is the first that actually makes Prepare do something

Could you please add a test like this (but for PREPARE statements) and make sure the statement can't be executed if statements are disabled?

async fn unsupported_statement_returns_error() {
let ctx = SessionContext::new();
ctx.sql("CREATE TABLE test (x int)").await.unwrap();
let options = SQLOptions::new().with_allow_statements(false);
let sql = "set datafusion.execution.batch_size = 5";
let df = ctx.sql_with_options(sql, options).await;
assert_eq!(
df.unwrap_err().strip_backtrace(),
"Error during planning: Statement not supported: SetVariable"
);
let options = options.with_allow_statements(true);
ctx.sql_with_options(sql, options).await.unwrap();
}

Perhaps (as a follow on PR) we can make LogicalPlan::Prepare a statement https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Statement.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporarily added PREPARE to BadPlanVisitor and added a test for this in 4729766
I will make LogicalPlan::Prepare a statement in a follow-up PR.

.collect::<Result<_>>()?;
}

let params = ParamValues::List(params);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Ok(())
}
Entry::Occupied(e) => {
exec_err!("Prepared statement '{}' already exists", e.key())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this errors if an existing prepared statement should we add some way to erase / clear prepared plans? Now there is no way to avoid accumulating them over time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to implement DEALLOCATE in next PR.

@@ -106,9 +105,9 @@ async fn test_prepare_statement() -> Result<()> {
let ctx = create_ctx_with_partition(&tmp_dir, partition_count).await?;

// sql to statement then to prepare logical plan with parameters
// c1 defined as UINT32, c2 defined as UInt64 but the params are Int32 and Float64
let dataframe =
ctx.sql("PREPARE my_plan(INT, DOUBLE) AS SELECT c1, c2 FROM test WHERE c1 > $2 AND c1 < $1").await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- PREPARE is a statement in my mind (and has no results)

@@ -30,56 +30,175 @@ select * from person;
# Error due to syntax and semantic violation

# Syntax error: no name specified after the keyword prepare
statement error
statement error DataFusion error: SQL error: ParserError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

2 Bar


# Test issue: https://github.com/apache/datafusion/issues/12294
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jonahgao -- this looks great

@@ -1705,6 +1773,14 @@ impl<'n, 'a> TreeNodeVisitor<'n> for BadPlanVisitor<'a> {
LogicalPlan::Statement(stmt) if !self.options.allow_statements => {
plan_err!("Statement not supported: {}", stmt.name())
}
// TODO: Implement PREPARE as a LogicalPlan::Statement
LogicalPlan::Prepare(_) if !self.options.allow_statements => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jonahgao jonahgao merged commit 34d9d3a into apache:main Nov 7, 2024
24 checks passed
@jonahgao
Copy link
Member Author

jonahgao commented Nov 7, 2024

Thanks @alamb for the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store Prepare Logical Plan
2 participants