8.0.0 (2022-05-12)
Breaking changes:
- Add SQL planner support for ROLLUP and CUBE grouping set expressions #2446 (andygrove)
- Make ExecutionPlan::execute Sync #2434 (tustvold)
- Introduce new DataFusionError::SchemaError type #2371 (andygrove)
- Add Expr::InSubquery and Expr::ScalarSubquery #2342 (andygrove)
- Add Expr::Exists to represent EXISTS subquery expression #2339 (andygrove)
- Move LogicalPlan enum to datafusion-expr crate #2294 (andygrove)
- Remove dependency from LogicalPlan::TableScan to ExecutionPlan #2284 (andygrove)
- Move logical expression type-coercion code from physical-expr crate to expr crate #2257 (andygrove)
- feat: 2061 create external table ddl table partition cols #2099 [sql] (jychen7)
- Reorganize the project folders #2081 (yahoNanJing)
- Support more ScalarFunction in Ballista #2008 (Ted-Jiang)
- Merge dataframe and dataframe imp #1998 (vchag)
- Rename ExecutionContext to SessionContext, ExecutionContextState to SessionState, add TaskContext to support multi-tenancy configurations - Part 1 #1987 (mingmwang)
- Add Coalesce function #1969 (msathis)
- Add Create Schema functionality in SQL #1959 [sql] (matthewmturner)
- omit some clone when converting sql to logical plan #1945 [sql] (doki23)
- [split/16] move physical plan expressions folder to datafusion-physical-expr crate #1889 (Jimexist)
- remove sync constraint of SendableRecordBatchStream #1884 (doki23)
- [split/15] move built in window expr and partition evaluator #1865 (Jimexist)
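ROLLUP and CUBE (#2446 above) are shorthands for lists of grouping sets. Under standard SQL semantics, ROLLUP(a, b, c) means every prefix of the column list, and CUBE means every subset. The sketch below only illustrates that expansion; rollup and cube are hypothetical helpers, not DataFusion's planner code.

```rust
/// Expands ROLLUP(cols) into its grouping sets (hypothetical helper):
/// every prefix of `cols`, from the full list down to the empty set.
fn rollup<'a>(cols: &[&'a str]) -> Vec<Vec<&'a str>> {
    (0..=cols.len()).rev().map(|n| cols[..n].to_vec()).collect()
}

/// Expands CUBE(cols) into its grouping sets (hypothetical helper): all 2^n
/// subsets of `cols`, enumerated with a bitmask (so `cols` must be short).
fn cube<'a>(cols: &[&'a str]) -> Vec<Vec<&'a str>> {
    let n = cols.len();
    let mut sets: Vec<Vec<&'a str>> = Vec::with_capacity(1 << n);
    // Count down so the full set comes first and the empty set last.
    for mask in (0..(1usize << n)).rev() {
        // Bit i of `mask` decides whether column i is in this grouping set.
        let set: Vec<&'a str> = cols
            .iter()
            .enumerate()
            .filter(|&(i, _)| mask & (1 << i) != 0)
            .map(|(_, c)| *c)
            .collect();
        sets.push(set);
    }
    sets
}

fn main() {
    // GROUP BY ROLLUP(a, b, c)
    println!("{:?}", rollup(&["a", "b", "c"]));
    // prints [["a", "b", "c"], ["a", "b"], ["a"], []]

    // GROUP BY CUBE(a, b)
    println!("{:?}", cube(&["a", "b"]));
    // prints [["a", "b"], ["b"], ["a"], []]
}
```

The order of grouping sets does not change the query result; counting the mask down simply lists the full set first, as ROLLUP does.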
Implemented enhancements:
- Include Expr in datafusion::prelude #2347
- Implement Serialization API for DataFusion #2340
- Implement power function #1493
- allow lit python function to support boolean and other types #1136
- Automate dependency updates #37
- Add CREATE VIEW #2279 (matthewmturner)
- [Ballista] Support Union in ballista. #2098 (Ted-Jiang)
- Change the DataFusion explain plans to make it clearer in the predicate/filter #2063 (Ted-Jiang)
- Add write_json, read_json, register_json, and JsonFormat to CREATE EXTERNAL TABLE functionality #2023 (matthewmturner)
- Qualified wildcard #2012 [sql] (doki23)
- support bitwise or/'|' operation #1876 [sql] (liukun4515)
- Introduce JIT code generation #1849 (yjshen)
Fixed bugs:
- CASE expr with NULL literals panics 'WHEN expression did not return a BooleanArray' #1189
- Function calls with NULL literals do not work #1188
- Add SQL planner support for calling round function with two arguments #2503 (andygrove)
- nested query fix #2402 (comphead)
- fix issue#2058 file_format/json.rs attempt to subtract with overflow #2066 (silence-coding)
- fix bug in the optimizer rule filter push down #2039 (jackwener)
- fix: replace ExecutionContext and ExecutionConfig with SessionContext and SessionConfig #2030 (xudong963)
- Fixed parquet path partitioning when only selecting partitioned columns #2000 (pjmore)
- Fix ambiguous reference error in filter plan #1925 (jonmmease)
- platform aware partition parsing #1867 (korowa)
- Fix incorrect aggregation in case that GROUP BY contains duplicate column names #1855 (alex-natzka)
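The NULL-literal CASE panic (#1189 above) comes down to SQL three-valued logic: a WHEN condition that evaluates to NULL should behave like false and fall through, not fail. A minimal single-row sketch of that rule, assuming standard SQL semantics; Branch and eval_case are hypothetical names and this is not DataFusion's CASE implementation, which operates on Arrow arrays.

```rust
/// One WHEN ... THEN ... branch (hypothetical type). The condition uses SQL
/// three-valued logic: Some(true), Some(false), or None for NULL.
struct Branch<T> {
    when: Option<bool>,
    then: Option<T>,
}

/// Evaluates a searched CASE for a single row (hypothetical helper).
/// A NULL condition does not match, so evaluation falls through to the next
/// branch; if no branch matches, the ELSE value is returned. `else_value` is
/// None both for a missing ELSE and for ELSE NULL, since both yield NULL.
fn eval_case<T: Clone>(branches: &[Branch<T>], else_value: Option<T>) -> Option<T> {
    for b in branches {
        if b.when == Some(true) {
            return b.then.clone();
        }
    }
    else_value
}

fn main() {
    let branches = vec![
        Branch { when: None, then: Some(1) },        // WHEN NULL THEN 1
        Branch { when: Some(false), then: Some(2) }, // WHEN false THEN 2
    ];
    println!("{:?}", eval_case(&branches, Some(3))); // prints "Some(3)"
    println!("{:?}", eval_case(&branches, None)); // prints "None" (SQL NULL)
}
```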
Documentation updates:
- MINOR: Make crate READMEs consistent #2437 (andygrove)
- minor: Improve documentation for DFSchema join and merge functions #2367 (andygrove)
- Change the code location and add annotation #2037 [sql] (jackwener)
- Fix typos (Datafusion -> DataFusion) #1993 (andygrove)
- Add examples to use MemTable and TableProvider (#1864) #1946 (PierreZ)
- Add doc for building datafusion-cli when connecting to Ballista #1866 (liukun4515)
- Add benchmarks section to DEVELOPERS.md #1838 (tustvold)
Performance improvements:
- Avoid an Arc::clone per row in benchmark #1975 (jhorstmann)
- Update datafusion-cli allocator #1878 (matthewmturner)
Closed issues:
- Make expected result string in unit tests more readable #2412
- remove duplicated fn aggregate() in aggregate expression tests #2399
- split distinct_expression.rs into count_distinct.rs and array_agg_distinct.rs #2385
- move sql tests in context.rs to corresponding test files in datafusion/core/tests/sql #2328
- Date32/Date64 as join keys for merge join #2314
- Error precision and scale for decimal coercion in logic comparison #2232
- Support Multiple row layout #2188
- TPC-H Query 18 #169
- TPC-H Query 16 #167
- Implement Sort-Merge Join #141
- Split logical expressions out into separate source files #114
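The sort-merge join tracked in #141 relies on both inputs being sorted on the join key. A textbook inner-join merge step, shown on bare i64 keys and returning matching row-index pairs, might look like the sketch below; sort_merge_join is a hypothetical helper, not DataFusion's own operator, and it ignores NULL keys, batching, and spilling.

```rust
use std::cmp::Ordering;

/// Inner sort-merge join over two key columns that are already sorted
/// ascending (hypothetical helper). Returns (left_row, right_row) index pairs.
fn sort_merge_join(left: &[i64], right: &[i64]) -> Vec<(usize, usize)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < left.len() && j < right.len() {
        match left[i].cmp(&right[j]) {
            // Advance whichever side has the smaller key.
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                // Find the run of equal keys on each side...
                let key = left[i];
                let i_end = i + left[i..].iter().take_while(|&&k| k == key).count();
                let j_end = j + right[j..].iter().take_while(|&&k| k == key).count();
                // ...and emit their cross product, which handles duplicate keys.
                for li in i..i_end {
                    for rj in j..j_end {
                        out.push((li, rj));
                    }
                }
                i = i_end;
                j = j_end;
            }
        }
    }
    out
}

fn main() {
    let left: [i64; 4] = [1, 2, 2, 4];
    let right: [i64; 4] = [2, 2, 3, 4];
    println!("{:?}", sort_merge_join(&left, &right));
    // prints [(1, 0), (1, 1), (2, 0), (2, 1), (3, 3)]
}
```

Each input is scanned once apart from the duplicate-key runs, which is why a merge join pays off when the data already arrives sorted.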
Merged pull requests:
- Minor: remove code that is now included in arrow-rs #2511 (alamb)
- MINOR: Enable multi-statement benchmark queries #2507 (andygrove)
- MINOR: Add ignored tests for all remaining benchmark queries #2506 (andygrove)
- Update to sqlparser 0.17.0 #2500 (alamb)
- Add metrics for ParquetExec #2499 (Ted-Jiang)
- Limit cpu cores used when generating changelog #2494 (andygrove)
- Optimize MergeJoin by storing joined indices instead of creating small record batches for each match #2492 (richox)
- Add SQL planner support for grouping() aggregate expressions #2486 (andygrove)
- MINOR: Parameterize changelog script #2484 (jychen7)
- Numeric, String, Boolean comparisons with literal NULL #2481 (WinkerDu)
- Adds unit test cases of mathematical expressions working with null literal #2478 (WinkerDu)
- Minor: Move test code from context.rs into sql_integration #2473 (alamb)
- Minor: Use ExprVisitor to find columns referenced by expr #2471 (alamb)
- minor: remove expr dependency from the row crate, update crate-deps.dot/svg #2470 (yjshen)
- Fix read_from_registered_table_with_glob_path fails if path contains // #2465 #2468 (timvw)
- Add support for list_dir() on local fs #2467 (wjones127)
- MINOR: Partial fix for SQL aggregate queries with aliases #2464 (andygrove)
- minor: move struct definition out of aggregate/mod.rs, etc #2458 (WinkerDu)
- Fix bugs in SQL planner with GROUP BY scalar function and alias #2457 (andygrove)
- feat: Support CompoundIdentifier as GetIndexedField access #2454 (ovr)
- Table provider error propagation #2438 (jdye64)
- MINOR: Improve error messages for GROUP BY / HAVING queries #2435 (andygrove)
- minor: remove redundant code #2432 (jackwener)
- minor: update versions and paths in changelog scripts #2429 (andygrove)
- Fix Ballista executing during plan #2428 (tustvold)
- minor: format table result vec & remove some unnecessary semicolons #2425 (WinkerDu)
- Basic support for IN and NOT IN Subqueries by rewriting them to SEMI/ANTI Join #2421 (korowa)
- Allow subqueries without aliases #2418 (andygrove)
- Fix bug in subquery join filters referencing outer query #2416 (andygrove)
- MINOR: remove duplicated function format_state_name() #2414 (WinkerDu)
- Make expected result string in unit tests more readable #2413 (WinkerDu)
- sum(distinct) support #2405 (WinkerDu)
- Update ordered-float requirement from 2.10 to 3.0 #2403 (dependabot[bot])
- remove duplicated fn aggregate() in aggregate expression tests #2400 (WinkerDu)
- Support type-coercion from Decimal to Float64 #2396 (comphead)
- minor: SchemaError code cleanup and improvements #2391 (andygrove)
- Support struct_expr generate struct in sql #2389 (Ted-Jiang)
- Re-organize and rename aggregates physical plan #2388 (yjshen)
- refactor distinct_expressions.rs and split into count_distinct.rs and array_agg_distinct.rs #2386 (WinkerDu)
- Allow CTEs to be referenced from subquery expressions #2384 (andygrove)
- Upgrade to arrow 13 #2382 (alamb)
- Grouped Aggregate in row format #2375 (yjshen)
- Fix bugs with CTE aliasing and normalize all identifiers in the SQL planner #2373 (andygrove)
- Stop optimizing queries twice #2369 (andygrove)
- feat: Support casting to arrays to primitive type #2366 (ovr)
- Add proper support for null literal by introducing ScalarValue::Null #2364 (WinkerDu)
- minor: fix duplicate column bug in subquery support #2362 (andygrove)
- Normalize subquery aliases #2359 (andygrove)
- Implement physical planner support for DATE +/- INTERVAL #2357 (andygrove)
- Add SQL query planner support for Scalar Subqueries #2354 (andygrove)
- Add SQL query planner support for IN subqueries #2352 (andygrove)
- Add Expr to prelude #2348 (alamb)
- Add SQL planner support for EXISTS subqueries #2344 (andygrove)
- Add public Serialization/Deserialization API for Expr to/from bytes #2341 (alamb)
- Support for date32 and date64 in sort merge join #2336 (hntd187)
- [physical-expr] move aggregate exprs and window exprs to their own modules #2335 (yjshen)
- fix: union schema #2334 (gandronchik)
- Improve sql integration test organization #2333 (alamb)
- Support scalar values for func Array #2332 (Ted-Jiang)
- move sql tests from context.rs to corresponding test files in tests/sql #2329 (WinkerDu)
- deprecate index_of and make index_of_column_by_name public #2320 (jdye64)
- Fix HashJoin evaluating during plan #2317 (tustvold)
- minor: remove two source files that only had re-exports #2313 (andygrove)
- Don't sort batches during plan #2312 (tustvold)
- Move case/when expressions to datafusion-expr crate #2311 (andygrove)
- Fix CrossJoinExec evaluating during plan #2310 (tustvold)
- Make SortPreservingMerge Usable Outside Tokio (#2201) #2305 (tustvold)
- chore: update cranelift to 0.83.0 #2304 (yjshen)
- Always increment timer on record #2298 (tustvold)
- Remove unnecessary env var for parquet_sql example #2297 (sergey-melnychuk)
- Simplify sort streams #2296 (tustvold)
- MINOR: beautify code with neat idents #2295 (WinkerDu)
- Move FileType enum from sql module to logical_plan module #2290 (andygrove)
- Remove Parquet Empty Projection Workaround #2289 (tustvold)
- Add BatchPartitioner (#2285) #2287 (tustvold)
- Make row its own crate to make it accessible from physical-expr #2283 (yjshen)
- Enable filter pushdown when using In_list on parquet #2282 (Ted-Jiang)
- Update uuid requirement from 0.8 to 1.0 #2280 (dependabot[bot])
- Add bytes scanned metric to ParquetExec #2273 (thinkharderdev)
- Fix outer join output with all-null indices on empty batch #2272 (yjshen)
- Re-export DataFusion crates #2264 (andygrove)
- rewrite approx_median to approx_percentile_cont while planning phase #2262 (korowa)
- Introduce RowLayout to represent rows for different purposes #2261 (yjshen)
- fix string coercion missing in Eq/NotEq operator #2258 (WinkerDu)
- Update to Arrow 12.0.0, update tonic and prost #2253 (alamb)
- minor: move field_util from physical-expr crate to expr crate #2250 (andygrove)
- Move identifier case tests to sql_integ, add negative cases, Debug for DataFrame #2243 (alamb)
- Implement sort-merge join #2242 (richox)
- fix: find the right wider decimal datatype for comparison operation #2241 (liukun4515)
- Fix join without constraints #2240 (Dandandan)
- Add type coercion rule for date + interval #2235 (andygrove)
- support array with scalar arithmetic operation for decimal data type #2233 (liukun4515)
- chore: add debug! log in some execution operators #2231 (NGA-TRAN)
- Introduce new optional scheduler, using Morsel-driven Parallelism + rayon (#2199) #2226 (tustvold)
- minor: add editor config file #2224 (jackwener)
- minor: Refactor to avoid repeated code in replace_qualifier #2222 (andygrove)
- update cli readme #2220 (liukun4515)
- Use filter (filter_record_batch) instead of take to avoid using indices #2218 (Dandandan)
- Add single line description of ExecutionPlan (#2216) #2217 (tustvold)
- Remove tokio::spawn from HashAggregateExec (#2201) #2215 (tustvold)
- Remove tokio::spawn from WindowAggExec (#2201) #2203 (tustvold)
- Make ParquetExec usable outside of a tokio runtime (#2201) #2202 (tustvold)
- add sql level test for decimal data type #2200 (liukun4515)
- case when supports NULL constant #2197 (WinkerDu)
- feat: Support simple Arrays with Literals #2194 (ovr)
- [Ballista] Enable ApproxPercentileWithWeight in Ballista and fill UT #2192 (Ted-Jiang)
- refactor: simplify prepare_select_exprs #2190 (jackwener)
- Multiple row-layout support, part-1: Restructure code for clearness #2189 (yjshen)
- make nightly clippy happy #2186 (xudong963)
- [Ballista]Make PhysicalAggregateExprNode has repeated PhysicalExprNode #2184 (Ted-Jiang)
- MINOR: handle NULL in advance to avoid value copy in string_concat #2183 (WinkerDu)
- fix: Sort with a lot of repetition values #2182 (yjshen)
- cli: update lockfile #2178 (happysalada)
- Add LogicalPlan::SubqueryAlias #2172 (andygrove)
- minor: Avoid per cell evaluation in Coalesce, use zip in CaseWhen #2171 (yjshen)
- Handle merged schemas in parquet pruning #2170 (thinkharderdev)
- Implement fast path of with_new_children() in ExecutionPlan #2168 (mingmwang)
- enable explain for ballista #2163 (doki23)
- Add delimiter for create external table #2162 (matthewmturner)
- [MINOR] enable EXTRACT week and add test (after sqlparser update to 0.16) #2157 (Ted-Jiang)
- Optimize the evaluation of IN for large lists using InSet #2156 (Ted-Jiang)
- Update sqlparser requirement from 0.15 to 0.16 #2152 (dependabot[bot])
- fix not(null) with constant null #2144 (WinkerDu)
- Add IF NOT EXISTS to CREATE TABLE and CREATE EXTERNAL TABLE #2143 (matthewmturner)
- implement 'StringConcat' operator to support sql like "select 'aa' || 'b' " #2142 (WinkerDu)
- #2109 By default, use only 1000 rows to infer the schema #2139 (jychen7)
- [CLI] Add show tables in ballista for datafusion-cli #2137 (gaojun2048)
- fix: incorrect memory usage track for sort #2135 (yjshen)
- Update quarterly roadmap for Q2 #2133 (matthewmturner)
- Reduce SortExec memory usage by avoiding constructing a single huge batch #2132 (yjshen)
- MINOR: fix concat_ws corner bug #2128 (WinkerDu)
- Minor add clarifying comment in parquet #2127 (alamb)
- Minor: make disk_manager public #2126 (yjshen)
- JIT-compile DataFusion expression with column name #2124 (Dandandan)
- minor: replace array_equals in case evaluation with eq_dyn from arrow-rs #2121 (alamb)
- Serialize timezone in timestamp scalar values #2120 (thinkharderdev)
- minor: fix some clippy warnings from nightly rust #2119 (alamb)
- Fix case evaluation with NULLs #2118 (alamb)
- issue#1967 ignore channel close #2113 (silence-coding)
- cli: add cargo.lock #2112 (happysalada)
- doc: update release schedule #2110 (jychen7)
- fix df union all bug #2108 [sql] (WinkerDu)
- Reduce repetition in Decimal binary kernels, upgrade to arrow 11.1 #2107 (alamb)
- update zlib version to 1.2.12 #2106 (waitingkuo)
- Create jit-expression from datafusion expression #2103 (Dandandan)
- Add CREATE DATABASE command to SQL #2094 [sql] (matthewmturner)
- Refactor SessionContext, BallistaContext to support multi-tenancy configurations - Part 3 #2091 (mingmwang)
- minor: remove duplicate test #2089 (jackwener)
- minor: remove repeated test #2085 (jackwener)
- Fix lost filters and projections in ParquetExec, CSVExec etc #2077 (Ted-Jiang)
- Remove dependency of common for the storage crate #2076 (yahoNanJing)
- [MINOR] fix doc in EXTRACT(field FROM source) #2074 (Ted-Jiang)
- [Bug][Datafusion] fix TaskContext session_config bug #2070 (gaojun2048)
- Short-circuit evaluation for CaseWhen #2068 (yjshen)
- split datafusion-object-store module #2065 (yahoNanJing)
- Allow CatalogProvider::register_catalog to return an error #2052 (alamb)
- Add test in register_catalog and change to use named symbolic constants #2050 (alamb)
- Update to arrow/parquet 11.0 #2048 (alamb)
- minor: format comments ("//" to "// ") #2047 (jackwener)
- use cargo-tomlfmt to check Cargo.toml formatting in CI #2033 (WinkerDu)
- feat: #2004 approx percentile with weight #2031 (jychen7)
- Refactor SessionContext, SessionState and SessionConfig to support multi-tenancy configurations - Part 2 #2029 (mingmwang)
- Simplify prerequisites for running examples #2028 (doki23)
- Replace usage of println! with logger macros #2020 (silence-coding)
- Automatically test examples in user guide #2018 (vchag)
- return VecDeque for DFParser::parse_sql #2017 [sql] (doki23)
- Eliminate the scalar value filter #2002 (jackwener)
- Fixing a typo in documentation #1997 (psvri)
- Correct documentation of ExprVisitor #1996 (alamb)
- Make it possible to only scan part of a parquet file in a partition #1990 (yjshen)
- Update Dockerfile to fix integration tests #1982 (andygrove)
- Remove some more unnecessary cloning in sql_expr_to_logical_expr #1981 [sql] (alamb)
- Add ticket reference to clippy allow #1978 [sql] (alamb)
- Implement EXTRACT expression with week, month, day, hour #1974 (Ted-Jiang)
- Address typo in ExprVisitable trait documentation #1970 (jdye64)
- Update sqlparser requirement from 0.14 to 0.15 #1966 (dependabot[bot])
- PruningPredicate should take owned Expr #1960 (thinkharderdev)
- Update to arrow 10.0.0, pyo3 0.16 #1957 (alamb)
- update jit-related dependencies #1953 (xudong963)
- minor code refinement: if_exists name change, wildcard field for logical plan, etc. #1951 [sql] (xudong963)
- Allow different types of query variables (@@var) rather than just string #1943 [sql] (maxburke)
- Pruning serialization #1941 (thinkharderdev)
- Add write_parquet to DataFrame #1940 (matthewmturner)
- Fix select from EmptyExec always returning 0 rows after optimizer passes #1938 (Ted-Jiang)
- Add debug log when waiting for spilling on other consumers #1933 (viirya)
- Add db benchmark script #1928 (matthewmturner)
- Add write_csv to DataFrame #1922 (matthewmturner)
- [MINOR] Update copyright year in Docs #1918 (alamb)
- add metadata to DFSchema, close #1806. #1914 [sql] (jiacai2050)
- Clippy fix on nightly #1907 (yjshen)
- Updated Rust version to 1.59 in all the files #1903 (NaincyKumariKnoldus)
- support extract second and minute in expr. #1901 (Ted-Jiang)
- Update crate descriptions #1899 (alamb)
- Remove unneeded Mutex in Ballista Client #1898 (alamb)
- [split/17] move the rest of physical expr to datafusion-physical-expr crate #1892 (Jimexist)
- Avoid unnecessary branching in row read/write if schema is null-free #1891 (yjshen)
- Make parquet support optional for datafusion-common crate #1886 (jonmmease)
- Fix clippy lints #1885 (HaoYang670)
- Add support for ~/.datafusionrc and cli option for overriding it to datafusion-cli #1875 (matthewmturner)
- [Minor] Clean up DecimalArray API Usage #1869 [sql] (alamb)
- Changes after going through the "DataFusion as a library" section #1868 (nonontb)
- Enhance MemorySchemaProvider to support register_listing_table #1863 (matthewmturner)
- Increase default partition column type from Dict(UInt8) to Dict(UInt16) #1860 (Igosuki)
- Update to arrow 9.1.0 #1851 (alamb)
- move some tests out of context and into sql #1846 (alamb)
- [split/14] create datafusion-physical-expr module #1843 (Jimexist)
- Return Error when parquet reader fails rather than no data with println! #1837 (alamb)
- determine build side in hash join by total_byte_size instead of num_rows #1831 (xudong963)
- Make ballista support an optional feature to datafusion-cli #1816 (alamb)
- Update documentation example for change in API #1812 (alamb)
- rename references of expr in physical plan module after datafusion-expr split #1798 (Jimexist)
- DataFusion + Conbench Integration #1791 (dianaclarke)
- The returned path value of get_by_uri should be self-described with entire path #1779 (yahoNanJing)
- Use eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn kernels from arrow #1475 (alamb)
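PR #2421 above plans IN and NOT IN subqueries as SEMI and ANTI joins. The row-level semantics of those two joins can be sketched with a hash set of the subquery's keys; semi_join and anti_join are hypothetical helpers, not DataFusion APIs. The sketch deliberately uses non-nullable keys: in SQL, x NOT IN (subquery) evaluates to NULL, so the row is filtered out, when the subquery returns a NULL and x matches no other value, and a plain anti join does not reproduce that.

```rust
use std::collections::HashSet;

/// Left semi join (hypothetical helper): keeps each left row whose key occurs
/// in `right_keys`. Each left row is emitted at most once, however many right
/// rows share its key, which is what `WHERE id IN (subquery)` requires.
fn semi_join<'a>(left: &[(i64, &'a str)], right_keys: &[i64]) -> Vec<(i64, &'a str)> {
    let keys: HashSet<i64> = right_keys.iter().copied().collect();
    left.iter().filter(|row| keys.contains(&row.0)).copied().collect()
}

/// Left anti join (hypothetical helper): keeps each left row whose key does
/// not occur in `right_keys`. This matches `WHERE id NOT IN (subquery)` only
/// when neither side contains NULLs.
fn anti_join<'a>(left: &[(i64, &'a str)], right_keys: &[i64]) -> Vec<(i64, &'a str)> {
    let keys: HashSet<i64> = right_keys.iter().copied().collect();
    left.iter().filter(|row| !keys.contains(&row.0)).copied().collect()
}

fn main() {
    // Left table t(id, name) and the key column returned by a subquery.
    let t: [(i64, &str); 4] = [(1, "apple"), (2, "pear"), (3, "plum"), (1, "fig")];
    let sub: [i64; 3] = [1, 3, 3];
    // SELECT * FROM t WHERE id IN (subquery)
    println!("{:?}", semi_join(&t, &sub)); // [(1, "apple"), (3, "plum"), (1, "fig")]
    // SELECT * FROM t WHERE id NOT IN (subquery)
    println!("{:?}", anti_join(&t, &sub)); // [(2, "pear")]
}
```

Building the set from the subquery side and probing it with the outer rows is the same idea as a hash join build/probe split, restricted to existence checks.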