-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-12278: [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type #9936
Conversation
@@ -2863,10 +2913,7 @@ mod tests { | |||
ctx: &mut ExecutionContext, | |||
sql: &str, | |||
) -> Result<Vec<RecordBatch>> { | |||
let logical_plan = ctx.create_logical_plan(sql)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed so that CREATE EXTERNAL TABLE is handled correctly
@@ -298,7 +298,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
SQLDataType::Boolean => Ok(DataType::Boolean), | |||
SQLDataType::Date => Ok(DataType::Date32), | |||
SQLDataType::Time => Ok(DataType::Time64(TimeUnit::Millisecond)), | |||
SQLDataType::Timestamp => Ok(DataType::Date64), | |||
SQLDataType::Timestamp => Ok(DataType::Timestamp(TimeUnit::Nanosecond, None)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual code change in this PR
fae1e02
to
fd6e344
Compare
FYI @Dandandan / @seddonm1 / @ovr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice catch.
Random though: I think that we would benefit from not writing a test on Context
to test the SQL planner. It could be easier to identify which tests are testing what by writing unit tests more specialized to the component that they affect, as they usually declare the API behavior and are easier to find.
file_path.to_str().expect("path is utf8") | ||
); | ||
|
||
println!("{}", sql); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a wild println
has appeared :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no! I will remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
fd6e344
to
b6fbc1e
Compare
@jorgecarleitao I think the core of the problem is that the tests in What would you think about refactoring the non |
…QL TIMESTAMP Type # Rationale Running the query `CREATE EXTERNAL TABLE .. (c TIMESTAMP)` today in DataFusion will result in a data type pf "Date64" which means that anything more specific than the date will be ignored. This leads to strange behavior such as ```shell echo "Jorge,2018-12-13T12:12:10.011" >> /tmp/foo.csv echo "Andrew,2018-11-13T17:11:10.011" > /tmp/foo.csv cargo run -p datafusion --bin datafusion-cli Finished dev [unoptimized + debuginfo] target(s) in 0.23s Running `target/debug/datafusion-cli` > CREATE EXTERNAL TABLE t(a varchar, b TIMESTAMP) STORED AS CSV LOCATION '/tmp/foo.csv'; 0 rows in set. Query took 0 seconds. > select * from t; +--------+------------+ | a | b | +--------+------------+ | Andrew | 2018-11-13 | | Jorge | 2018-12-13 | +--------+------------+ ``` (note that the Time part is chopped off) # Changes This PR changes the default mapping from SQL type `TIMESTAMP` Closes apache#9936 from alamb/ARROW-12278-timestamps-for-timestamps Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
…QL TIMESTAMP Type # Rationale Running the query `CREATE EXTERNAL TABLE .. (c TIMESTAMP)` today in DataFusion will result in a data type pf "Date64" which means that anything more specific than the date will be ignored. This leads to strange behavior such as ```shell echo "Jorge,2018-12-13T12:12:10.011" >> /tmp/foo.csv echo "Andrew,2018-11-13T17:11:10.011" > /tmp/foo.csv cargo run -p datafusion --bin datafusion-cli Finished dev [unoptimized + debuginfo] target(s) in 0.23s Running `target/debug/datafusion-cli` > CREATE EXTERNAL TABLE t(a varchar, b TIMESTAMP) STORED AS CSV LOCATION '/tmp/foo.csv'; 0 rows in set. Query took 0 seconds. > select * from t; +--------+------------+ | a | b | +--------+------------+ | Andrew | 2018-11-13 | | Jorge | 2018-12-13 | +--------+------------+ ``` (note that the Time part is chopped off) # Changes This PR changes the default mapping from SQL type `TIMESTAMP` Closes apache#9936 from alamb/ARROW-12278-timestamps-for-timestamps Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
…QL TIMESTAMP Type # Rationale Running the query `CREATE EXTERNAL TABLE .. (c TIMESTAMP)` today in DataFusion will result in a data type pf "Date64" which means that anything more specific than the date will be ignored. This leads to strange behavior such as ```shell echo "Jorge,2018-12-13T12:12:10.011" >> /tmp/foo.csv echo "Andrew,2018-11-13T17:11:10.011" > /tmp/foo.csv cargo run -p datafusion --bin datafusion-cli Finished dev [unoptimized + debuginfo] target(s) in 0.23s Running `target/debug/datafusion-cli` > CREATE EXTERNAL TABLE t(a varchar, b TIMESTAMP) STORED AS CSV LOCATION '/tmp/foo.csv'; 0 rows in set. Query took 0 seconds. > select * from t; +--------+------------+ | a | b | +--------+------------+ | Andrew | 2018-11-13 | | Jorge | 2018-12-13 | +--------+------------+ ``` (note that the Time part is chopped off) # Changes This PR changes the default mapping from SQL type `TIMESTAMP` Closes apache#9936 from alamb/ARROW-12278-timestamps-for-timestamps Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
Rationale
Running the query
CREATE EXTERNAL TABLE .. (c TIMESTAMP)
today in DataFusion will result in a data type pf "Date64" which means that anything more specific than the date will be ignored.This leads to strange behavior such as
(note that the Time part is chopped off)
Changes
This PR changes the default mapping from SQL type
TIMESTAMP