Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Placeholders in prepared statements #678

Merged
merged 10 commits into from
Feb 23, 2023
Merged

feat: Placeholders in prepared statements #678

merged 10 commits into from
Feb 23, 2023

Conversation

scsmithr
Copy link
Member

@scsmithr scsmithr commented Feb 22, 2023

Provides an initial placeholder implementation. Much of this is relying on datafusion's ability to extract to types from the logical plan. In some cases, it's not able to infer types, leading to us emitting an error to the client. I believe this will get better as datafusion continues to improve its support for placeholders.

See the added protocol tests for some examples of things we do support and things we error on.

Closes #122. Note that we don't support "everything", but "everything" is quite large. This PR is a base to work on, and we'll want to open up more specific issues moving forward.

@scsmithr scsmithr marked this pull request as ready for review February 22, 2023 20:41
@@ -299,7 +300,7 @@ impl SessionContext {
&mut self,
name: String,
stmt: Option<StatementWithExtensions>,
params: Vec<i32>,
_params: Vec<i32>, // TODO: We can use these for providing types for parameters.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding off on this for now. I would like to follow up in another pr later this week or next.

Comment on lines 32 to 34
fn read_bool(buf: &[u8]) -> Result<bool> {
TextReader::parse::<_, SqlBool>(buf).map(|b| b.0)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn read_bool(buf: &[u8]) -> Result<bool> {
TextReader::parse::<_, SqlBool>(buf).map(|b| b.0)
}
fn read_bool(buf: &[u8]) -> Result<bool> {
Self::parse::<_, SqlBool>(buf).map(|b| b.0)
}

Just a personal preference! No need to change for the PR :)

}
}

fn decode_not_null_value<R: Reader>(buf: &[u8], arrow_type: &ArrowType) -> Result<ScalarValue> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to do it like from: &PgType, to: &ArrowType? Leaves room for other kinds of transformations. All the read and write methods are corresponding to the PG datatypes.

For eg: we will need to cast PgType::JSON to ArrowType::Utf8 but the read method will differ when reading JSON (it would be read_json).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might?

Postgres docs have this to say:

first sends a Parse message, which contains a textual query string, optionally some information about data types of parameter placeholders, [...] Parameter data types can be specified by OID; if not given, the parser attempts to infer the data types in the same way as it would do for untyped literal string constants.

Currently I'm ignoring the oids being provided during parse and inferring everything. I think we'll want this function to accept something like PostgresOrArrowType or just convert the arrow type that we get from inferring into the associated postgres type. I'll mess around with using the oids provided during parse and see what makes sense here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have the PG type even when inferred which then should be translated to arrow type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making that change right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest change swaps out the arrow types with pg types.

@scsmithr scsmithr requested a review from vrongmeal February 23, 2023 17:35
Copy link
Contributor

@vrongmeal vrongmeal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Can we add the todo comment? I'll look into making the changes when working on filters!

Comment on lines +97 to +109
fn decode_not_null_value<R: Reader>(buf: &[u8], pg_type: &PgType) -> Result<ScalarValue> {
Ok(match pg_type {
&PgType::BOOL => R::read_bool(buf)?.into(),
&PgType::INT2 => R::read_int2(buf)?.into(),
&PgType::INT4 => R::read_int4(buf)?.into(),
&PgType::INT8 => R::read_int8(buf)?.into(),
&PgType::FLOAT4 => R::read_float4(buf)?.into(),
&PgType::FLOAT8 => R::read_float8(buf)?.into(),
&PgType::TEXT => ScalarValue::Utf8(Some(R::read_text(buf)?)),
other => return Err(PgReprError::UnsupportedPgTypeForDecode(other.clone())),
})
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite good. Can we add a little TODO that we infer types as SQL or PG types (instead of arrow rolling with) because this is where the issue lies, arrow doesn't really work well with all the data types when inferring. I'll see what we can do, if we can work around arrow or actually implement something like inferring PG types (and not convert them from Arrow). Also might want to carry both the types along -- Arrow and PG (will be helpful in catching bugs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add that todo.

What benefits do you see with providing the arrow type here as well? Since converting from an arrow type to a pg type is unambiguous, I'm not sure what we'd be able to use the arrow type for in this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so I am just recalling it that the ScalarValue enum won't be sufficient. It was required if we could've used another scalar to represent a value but yeah, if I'm going to change the scalar enum, it will be redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think trying to switch out the scalar enum will be very tricky given how intertwined datafusion planning and execution are and how much we rely on those. Even inferring pg types directly from a query with parameters will be difficult since the produced logical plan will be constrained to arrow types.

I'm going to merge this as-is for now, but I'm down to add in the arrow types once they're needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I do need to look at the scalar values. What I’m hoping is we can switch the scalar just for PG conversions (to and fro). The problem with Arrow scalars is that there exists an array, the value is valid, but Arrow just doesn’t have a scalar to represent it!

@scsmithr scsmithr merged commit af6d189 into main Feb 23, 2023
@scsmithr scsmithr deleted the sean/parameters branch February 23, 2023 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support placeholders when planning queries
3 participants