ARROW-12278: [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type

# Rationale
Running the query `CREATE EXTERNAL TABLE .. (c TIMESTAMP)` in DataFusion today results in a data type of `Date64`, which means that anything more specific than the date is ignored.

This leads to strange behavior such as:

```shell
echo "Andrew,2018-11-13T17:11:10.011" > /tmp/foo.csv
echo "Jorge,2018-12-13T12:12:10.011" >> /tmp/foo.csv

cargo run -p datafusion --bin datafusion-cli
    Finished dev [unoptimized + debuginfo] target(s) in 0.23s
     Running `target/debug/datafusion-cli`
> CREATE EXTERNAL TABLE t(a varchar, b TIMESTAMP)
STORED AS CSV
LOCATION '/tmp/foo.csv';

0 rows in set. Query took 0 seconds.
> select * from t;
+--------+------------+
| a      | b          |
+--------+------------+
| Andrew | 2018-11-13 |
| Jorge  | 2018-12-13 |
+--------+------------+
```

(Note that the time part is chopped off.)
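
The loss of precision can be illustrated without DataFusion. The sketch below is a stand-alone illustration, not Arrow's implementation: it assumes that a date-only rendering keeps just the whole-day part of an epoch value, that the CSV timestamp is interpreted as UTC, and it uses hypothetical helper names (`truncate_to_date_millis`, `millis_to_nanos`).

```rust
/// Milliseconds in one day.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: keep only the whole-day part of an epoch-millisecond
/// value, which is all a date-only rendering can show.
fn truncate_to_date_millis(epoch_millis: i64) -> i64 {
    epoch_millis.div_euclid(MILLIS_PER_DAY) * MILLIS_PER_DAY
}

/// Hypothetical helper: widen epoch milliseconds to epoch nanoseconds,
/// returning `None` if the result would overflow an `i64`.
fn millis_to_nanos(epoch_millis: i64) -> Option<i64> {
    epoch_millis.checked_mul(1_000_000)
}

fn main() {
    // 2018-11-13T17:11:10.011, assumed to be UTC, as milliseconds since the Unix epoch.
    let ts_millis: i64 = 1_542_129_070_011;

    // Date-only view: everything after midnight is discarded.
    let date_only = truncate_to_date_millis(ts_millis);
    println!("date-only value:     {} ms", date_only);
    println!("time of day dropped: {} ms", ts_millis - date_only);

    // Nanosecond timestamp view: the full value survives.
    let nanos = millis_to_nanos(ts_millis).expect("value fits in i64 nanoseconds");
    println!("nanosecond value:    {} ns", nanos);
}
```

The dropped remainder (61,870,011 ms) is exactly the `17:11:10.011` that is missing from the query output above.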

# Changes
This PR changes the default mapping of the SQL type `TIMESTAMP` from `Date64` to `Timestamp(Nanosecond, None)`.
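
As a rough, self-contained sketch of the type mapping after this change: the enums below are simplified stand-ins for the Arrow and SQL parser types (they are not the real definitions), the `Option<String>` time zone field is an assumption about how the second `Timestamp` argument is modeled, and errors are plain strings rather than `DataFusionError`.

```rust
/// Stand-in for Arrow's time unit (only the variants needed here).
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Millisecond,
    Nanosecond,
}

/// Stand-in for Arrow's `DataType`; the `Option<String>` models an optional time zone.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Boolean,
    Date32,
    Time64(TimeUnit),
    Timestamp(TimeUnit, Option<String>),
}

/// Stand-in for the SQL parser's data type enum.
#[derive(Debug, Clone, PartialEq)]
enum SqlDataType {
    Boolean,
    Date,
    Time,
    Timestamp,
    Interval,
}

/// Map a SQL type to an Arrow-like type, mirroring the match arms in the diff.
fn convert_data_type(sql_type: &SqlDataType) -> Result<DataType, String> {
    match sql_type {
        SqlDataType::Boolean => Ok(DataType::Boolean),
        SqlDataType::Date => Ok(DataType::Date32),
        SqlDataType::Time => Ok(DataType::Time64(TimeUnit::Millisecond)),
        // After this change: nanosecond precision, no time zone.
        SqlDataType::Timestamp => Ok(DataType::Timestamp(TimeUnit::Nanosecond, None)),
        other => Err(format!("The SQL data type {:?} is not implemented", other)),
    }
}

fn main() {
    let all = [
        SqlDataType::Boolean,
        SqlDataType::Date,
        SqlDataType::Time,
        SqlDataType::Timestamp,
        SqlDataType::Interval,
    ];
    for sql_type in all.iter() {
        println!("{:?} -> {:?}", sql_type, convert_data_type(sql_type));
    }
}
```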

Closes apache#9936 from alamb/ARROW-12278-timestamps-for-timestamps

Authored-by: Andrew Lamb <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
alamb authored and GeorgeAp committed Jun 7, 2021
1 parent 221724a commit 32b7b72
Showing 2 changed files with 48 additions and 5 deletions.
51 changes: 47 additions & 4 deletions rust/datafusion/src/execution/context.rs
@@ -2837,6 +2837,52 @@ mod tests {
         Ok(())
     }

+    #[tokio::test]
+    async fn create_external_table_with_timestamps() {
+        let mut ctx = ExecutionContext::new();
+
+        let data = "Jorge,2018-12-13T12:12:10.011\n\
+                    Andrew,2018-11-13T17:11:10.011";
+
+        let tmp_dir = TempDir::new().unwrap();
+        let file_path = tmp_dir.path().join("timestamps.csv");
+
+        // scope to ensure the file is closed and written
+        {
+            File::create(&file_path)
+                .expect("creating temp file")
+                .write_all(data.as_bytes())
+                .expect("writing data");
+        }
+
+        let sql = format!(
+            "CREATE EXTERNAL TABLE csv_with_timestamps (
+                  name VARCHAR,
+                  ts TIMESTAMP
+              )
+              STORED AS CSV
+              LOCATION '{}'
+              ",
+            file_path.to_str().expect("path is utf8")
+        );
+
+        plan_and_collect(&mut ctx, &sql)
+            .await
+            .expect("Executing CREATE EXTERNAL TABLE");
+
+        let sql = "SELECT * from csv_with_timestamps";
+        let result = plan_and_collect(&mut ctx, &sql).await.unwrap();
+        let expected = vec![
+            "+--------+-------------------------+",
+            "| name   | ts                      |",
+            "+--------+-------------------------+",
+            "| Andrew | 2018-11-13 17:11:10.011 |",
+            "| Jorge  | 2018-12-13 12:12:10.011 |",
+            "+--------+-------------------------+",
+        ];
+        assert_batches_sorted_eq!(expected, &result);
+    }
+
     struct MyPhysicalPlanner {}

     impl PhysicalPlanner for MyPhysicalPlanner {
@@ -2869,10 +2915,7 @@ mod tests {
         ctx: &mut ExecutionContext,
         sql: &str,
     ) -> Result<Vec<RecordBatch>> {
-        let logical_plan = ctx.create_logical_plan(sql)?;
-        let logical_plan = ctx.optimize(&logical_plan)?;
-        let physical_plan = ctx.create_physical_plan(&logical_plan)?;
-        collect(physical_plan).await
+        ctx.sql(sql)?.collect().await
     }

     /// Execute SQL and return results
2 changes: 1 addition & 1 deletion rust/datafusion/src/sql/planner.rs
@@ -298,7 +298,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
             SQLDataType::Boolean => Ok(DataType::Boolean),
             SQLDataType::Date => Ok(DataType::Date32),
             SQLDataType::Time => Ok(DataType::Time64(TimeUnit::Millisecond)),
-            SQLDataType::Timestamp => Ok(DataType::Date64),
+            SQLDataType::Timestamp => Ok(DataType::Timestamp(TimeUnit::Nanosecond, None)),
             _ => Err(DataFusionError::NotImplemented(format!(
                 "The SQL data type {:?} is not implemented",
                 sql_type