Clarify in docs that Identifiers are made lower-case in SQL query #2374

dbr · 2022-04-29T09:27:40Z

Describe the bug
With (I think) the most up-to-date version of the arrow2 branch, the idenfitiers in query seem to be made lower-case

For example if I modify the memtable example by renaming the bank_acount field to Bank_Account then the query will fail,

$ gd
diff --git a/datafusion-examples/examples/memtable.rs b/datafusion-examples/examples/memtable.rs
index 4c635205..16bd247d 100644
--- a/datafusion-examples/examples/memtable.rs
+++ b/datafusion-examples/examples/memtable.rs
@@ -37,7 +37,7 @@ async fn main() -> Result<()> {
     // Register the in-memory table containing the data
     ctx.register_table("users", Arc::new(mem_table))?;
 
-    let dataframe = ctx.sql("SELECT * FROM users;").await?;
+    let dataframe = ctx.sql("SELECT Bank_Account FROM users;").await?;
 
     timeout(Duration::from_secs(10), async move {
         let result = dataframe.collect().await.unwrap();
@@ -72,6 +72,6 @@ fn create_record_batch() -> Result<RecordBatch> {
 fn get_schema() -> SchemaRef {
     SchemaRef::new(Schema::new(vec![
         Field::new("id", DataType::UInt8, false),
-        Field::new("bank_account", DataType::UInt64, true),
+        Field::new("Bank_Account", DataType::UInt64, true),
     ]))
 }
$ cargo run --example memtable
    Finished dev [unoptimized + debuginfo] target(s) in 0.35s
     Running `./target/debug/examples/memtable`
Error: Plan("Invalid identifier '#bank_account' for schema fields:[users.id, users.Bank_Account], metadata:{}")

Meaning, if the source table has any non-lowercase characters, the queries cannot work

Additional context
I'm using the arrow2 branch at this revision:

744b262

The text was updated successfully, but these errors were encountered:

dbr · 2022-04-29T09:43:41Z

Oh this is acting same in latest master branch! 7a9b865

$ git rev-parse HEAD
7a9b86526e2bd12deccb9ff30cead2ba6323409a
$ git diff
diff --git a/datafusion-examples/examples/memtable.rs b/datafusion-examples/examples/memtable.rs
index 11793a83..80cb65fc 100644
--- a/datafusion-examples/examples/memtable.rs
+++ b/datafusion-examples/examples/memtable.rs
@@ -36,7 +36,7 @@ async fn main() -> Result<()> {
     // Register the in-memory table containing the data
     ctx.register_table("users", Arc::new(mem_table))?;
 
-    let dataframe = ctx.sql("SELECT * FROM users;").await?;
+    let dataframe = ctx.sql("SELECT Bank_Account FROM users;").await?;
 
     timeout(Duration::from_secs(10), async move {
         let result = dataframe.collect().await.unwrap();
@@ -71,6 +71,6 @@ fn create_record_batch() -> Result<RecordBatch> {
 fn get_schema() -> SchemaRef {
     SchemaRef::new(Schema::new(vec![
         Field::new("id", DataType::UInt8, false),
-        Field::new("bank_account", DataType::UInt64, true),
+        Field::new("Bank_Account", DataType::UInt64, true),
     ]))
 }
$ cargo run --example memtable
    Finished dev [unoptimized + debuginfo] target(s) in 0.36s
     Running `./target/debug/examples/memtable`
Error: Plan("Invalid identifier '#bank_account' for schema fields:[users.id, users.Bank_Account], metadata:{}")

andygrove · 2022-04-29T21:15:03Z

@dbr Unquoted SQL identifiers are case-insensitive by design (to match ANSI and Postgres). SQL identifiers in double quotes are case-sensitive. If you put double quotes around the identifier in the SQL query then it works as expected.

diff --git a/datafusion-examples/examples/memtable.rs b/datafusion-examples/examples/memtable.rs
index 11793a837f..b04f5424a1 100644
--- a/datafusion-examples/examples/memtable.rs
+++ b/datafusion-examples/examples/memtable.rs
@@ -36,7 +36,7 @@ async fn main() -> Result<()> {
     // Register the in-memory table containing the data
     ctx.register_table("users", Arc::new(mem_table))?;
 
-    let dataframe = ctx.sql("SELECT * FROM users;").await?;
+    let dataframe = ctx.sql("SELECT \"Bank_Account\" FROM users;").await?;
 
     timeout(Duration::from_secs(10), async move {
         let result = dataframe.collect().await.unwrap();
@@ -71,6 +71,6 @@ fn create_record_batch() -> Result<RecordBatch> {
 fn get_schema() -> SchemaRef {
     SchemaRef::new(Schema::new(vec![
         Field::new("id", DataType::UInt8, false),
-        Field::new("bank_account", DataType::UInt64, true),
+        Field::new("Bank_Account", DataType::UInt64, true),
     ]))
 }

dbr · 2022-04-30T03:53:48Z

Ah interesting! Quoting the identifiers works, thanks

Just to be clear, is it still a bug that the original example fails? I would expect:

Given the field named bank_account the SELECT BaNk_AcCoUnT FROM users does work as expected (case is ignored)
Given the field named Bank_Account, I would expect SELECT BaNk_AcCoUnT FROM users to work (but currently this only looks for a field named bank_account so the query errors, and the error contains a lower-cased version of what is present in the query)

andygrove · 2022-05-02T04:45:54Z

@dbr My view is that this is not a bug. If a non-lowercase table or column name is created via the low-level Schema/DataFrame APIs then I would expect to have to use double-quotes in SQL to reference it. However, we are probably lacking in documentation around this so lets at least use this issue to drive that. Also would like to hear what others think, including @alamb and @yjshen

alamb · 2022-05-02T17:36:32Z

Given the field named Bank_Account, I would expect SELECT BaNk_AcCoUnT FROM users to work (but currently this only looks for a field named bank_account so the query errors, and the error contains a lower-cased version of what is present in the query)

I would not expect this to work (as the field is named Bank_Account)

Here is an example in case that is helpful: #1710 (comment)

So in other words, I agree with @andygrove 's conclusion that this is not a bug but rather "working as intended"

dbr · 2022-05-03T09:57:56Z

Ahh I see - linked comment is interesting, I wasn't aware Postgres had these rules around case-insensitivity!

I was thinking maybe the error message could be clearer, but even postgres appears to do the same thing (e.g error message containing the lower-cased field name, not the one the user supplied). So just noting this in the docs as mentioned seems like a good solution

dbr added the bug Something isn't working label Apr 29, 2022

dbr changed the title ~~Arrow2: Identifiers are made lower-case in SQL query~~ Identifiers are made lower-case in SQL query Apr 29, 2022

andygrove self-assigned this Apr 29, 2022

andygrove added documentation Improvements or additions to documentation and removed bug Something isn't working labels May 6, 2022

alamb changed the title ~~Identifiers are made lower-case in SQL query~~ Clarify in docs that Identifiers are made lower-case in SQL query May 13, 2022

nanicpc mentioned this issue Sep 7, 2022

Improve doc on lowercase treatment of columns on SQL #3385

Merged

alamb closed this as completed in #3385 Oct 12, 2022

alamb mentioned this issue Oct 12, 2022

Clarify initial example with respect to capitalization #3815

Merged

syepes mentioned this issue Feb 14, 2023

It is not possible to query fields with capitals names parseablehq/parseable#47

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify in docs that Identifiers are made lower-case in SQL query #2374

Clarify in docs that Identifiers are made lower-case in SQL query #2374

dbr commented Apr 29, 2022

dbr commented Apr 29, 2022

andygrove commented Apr 29, 2022

dbr commented Apr 30, 2022

andygrove commented May 2, 2022

alamb commented May 2, 2022 •

edited

Loading

dbr commented May 3, 2022 •

edited

Loading

Clarify in docs that Identifiers are made lower-case in SQL query #2374

Clarify in docs that Identifiers are made lower-case in SQL query #2374

Comments

dbr commented Apr 29, 2022

dbr commented Apr 29, 2022

andygrove commented Apr 29, 2022

dbr commented Apr 30, 2022

andygrove commented May 2, 2022

alamb commented May 2, 2022 • edited Loading

dbr commented May 3, 2022 • edited Loading

alamb commented May 2, 2022 •

edited

Loading

dbr commented May 3, 2022 •

edited

Loading