-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify in docs that Identifiers are made lower-case in SQL query #2374
Comments
Oh this is acting same in latest $ git rev-parse HEAD
7a9b86526e2bd12deccb9ff30cead2ba6323409a
$ git diff
diff --git a/datafusion-examples/examples/memtable.rs b/datafusion-examples/examples/memtable.rs
index 11793a83..80cb65fc 100644
--- a/datafusion-examples/examples/memtable.rs
+++ b/datafusion-examples/examples/memtable.rs
@@ -36,7 +36,7 @@ async fn main() -> Result<()> {
// Register the in-memory table containing the data
ctx.register_table("users", Arc::new(mem_table))?;
- let dataframe = ctx.sql("SELECT * FROM users;").await?;
+ let dataframe = ctx.sql("SELECT Bank_Account FROM users;").await?;
timeout(Duration::from_secs(10), async move {
let result = dataframe.collect().await.unwrap();
@@ -71,6 +71,6 @@ fn create_record_batch() -> Result<RecordBatch> {
fn get_schema() -> SchemaRef {
SchemaRef::new(Schema::new(vec![
Field::new("id", DataType::UInt8, false),
- Field::new("bank_account", DataType::UInt64, true),
+ Field::new("Bank_Account", DataType::UInt64, true),
]))
}
$ cargo run --example memtable
Finished dev [unoptimized + debuginfo] target(s) in 0.36s
Running `./target/debug/examples/memtable`
Error: Plan("Invalid identifier '#bank_account' for schema fields:[users.id, users.Bank_Account], metadata:{}") |
@dbr Unquoted SQL identifiers are case-insensitive by design (to match ANSI and Postgres). SQL identifiers in double quotes are case-sensitive. If you put double quotes around the identifier in the SQL query then it works as expected.
|
Ah interesting! Quoting the identifiers works, thanks Just to be clear, is it still a bug that the original example fails? I would expect:
|
@dbr My view is that this is not a bug. If a non-lowercase table or column name is created via the low-level Schema/DataFrame APIs then I would expect to have to use double-quotes in SQL to reference it. However, we are probably lacking in documentation around this so lets at least use this issue to drive that. Also would like to hear what others think, including @alamb and @yjshen |
I would not expect this to work (as the field is named Here is an example in case that is helpful: #1710 (comment) So in other words, I agree with @andygrove 's conclusion that this is not a bug but rather "working as intended" |
Ahh I see - linked comment is interesting, I wasn't aware Postgres had these rules around case-insensitivity! I was thinking maybe the error message could be clearer, but even postgres appears to do the same thing (e.g error message containing the lower-cased field name, not the one the user supplied). So just noting this in the docs as mentioned seems like a good solution |
Describe the bug
With (I think) the most up-to-date version of the arrow2 branch, the idenfitiers in query seem to be made lower-case
For example if I modify the
memtable
example by renaming thebank_acount
field toBank_Account
then the query will fail,Meaning, if the source table has any non-lowercase characters, the queries cannot work
Additional context
I'm using the arrow2 branch at this revision:
744b262
The text was updated successfully, but these errors were encountered: