matt/feat/recursive ctes/config flag #3

Closed
wants to merge 572 commits into from
572 commits
2a69244
move array function unit_tests to sqllogictest (#8332)
Veeupup Nov 28, 2023
e21b031
NTH_VALUE reverse support (#8327)
mustafasrepo Nov 29, 2023
19bdcdc
Refactor optimize projections rule, combines (eliminate, merge, pushd…
mustafasrepo Nov 29, 2023
4c914ea
Move merge projections tests to under optimize projections (#8352)
mustafasrepo Nov 29, 2023
aeb012e
Add `quote` and `escape` attributes to create csv external table (#8351)
Asura7969 Nov 29, 2023
d22403a
Minor: Add DataFrame test (#8341)
alamb Nov 29, 2023
e93f8e1
clean up the code based on Clippy (#8359)
Weijun-H Nov 29, 2023
bbec787
Minor: Make it easier to work with Expr::ScalarFunction (#8350)
alamb Nov 29, 2023
11f164c
Move some datafusion-optimizer::utils down to datafusion-expr::utils …
Jesse-Bakker Nov 29, 2023
167b5b7
Minor: Make BuiltInScalarFunction::alias a method (#8349)
alamb Nov 29, 2023
06bbe12
Extract parquet statistics to its own module, add tests (#8294)
alamb Nov 29, 2023
c079a92
feat:implement sql style 'find_in_set' string function (#8328)
Syleechan Nov 30, 2023
a588123
largeutf to temporal (#8357)
jayzhan211 Nov 30, 2023
a49740f
Refactor aggregate function handling (#8358)
Weijun-H Nov 30, 2023
d45cf00
Implement Aliases for ScalarUDF (#8360)
Veeupup Nov 30, 2023
513fd05
Minor: Remove uncessary name field in ScalarFunctionDefintion (#8365)
alamb Nov 30, 2023
e52d150
feat: support `LargeList` in `array_empty` (#8321)
Weijun-H Nov 30, 2023
5c02664
Double type argument for to_timestamp function (#8159)
spaydar Nov 30, 2023
e19c669
Support User Defined Table Function (#8306)
Veeupup Nov 30, 2023
c19260d
Document timestamp input limits (#8369)
comphead Dec 1, 2023
eb5aa22
fix: make `ntile` work in some corner cases (#8371)
haohuaijin Dec 1, 2023
8882f1b
Refactor array_union function to use a generic (#8381)
Weijun-H Dec 1, 2023
a6e6d3f
Refactor function argument handling in (#8387)
Weijun-H Dec 1, 2023
eb8aff7
Materialize dictionaries in group keys (#7647) (#8291)
qrilka Dec 1, 2023
f5d10e5
Rewrite `array_ndims` to fix List(Null) handling (#8320)
jayzhan211 Dec 1, 2023
3b29837
Docs: Improve the documentation on `ScalarValue` (#8378)
alamb Dec 2, 2023
340ecfd
Avoid concat for `array_replace` (#8337)
jayzhan211 Dec 2, 2023
bb2ea4b
add a summary table to benchmark compare output (#8399)
razeghi71 Dec 2, 2023
075ff3d
Refactors on TreeNode Implementations (#8395)
berkaysynnada Dec 2, 2023
f6af014
feat: support `LargeList` in `make_array` and `array_length` (#8121)
Weijun-H Dec 3, 2023
26196e6
remove `unalias()` TableScan filters when create Physical Filter (#8404)
jackwener Dec 3, 2023
e5a95b1
Update custom-table-providers.md (#8409)
nickpoorman Dec 4, 2023
a73be00
fix transforming `LogicalPlan::Explain` use `TreeNode::transform` fai…
haohuaijin Dec 4, 2023
4b4af65
Docs: Fix `array_except` documentation example (#8407)
Asura7969 Dec 4, 2023
37bbd66
Support named query parameters (#8384)
Asura7969 Dec 4, 2023
49dc1f2
Minor: Add installation link to README.md (#8389)
Weijun-H Dec 4, 2023
0bcf462
Update code comment for the cases of regularized RANGE frame and add …
viirya Dec 4, 2023
08fff2d
Minor: Add example with parameters to LogicalPlan (#8418)
alamb Dec 5, 2023
d1554c8
Minor: Improve `PruningPredicate` documentation (#8394)
alamb Dec 5, 2023
2e5ad7a
feat: ScalarValue from String (#8411)
QuenKar Dec 5, 2023
2d5f30e
Bump actions/labeler from 4.3.0 to 5.0.0 (#8422)
dependabot[bot] Dec 5, 2023
a3f34c9
Update sqlparser requirement from 0.39.0 to 0.40.0 (#8338)
dependabot[bot] Dec 5, 2023
6dd3c95
feat: support `LargeList` for `array_has`, `array_has_all` and `array…
Weijun-H Dec 5, 2023
4ceb2de
Union `schema` can't be a subset of the child schema (#8408)
jackwener Dec 5, 2023
e322839
Move `PartitionSearchMode` into datafusion_physical_plan, rename to `…
alamb Dec 5, 2023
c7a6965
Make filter selectivity for statistics configurable (#8243)
edmondop Dec 5, 2023
3ddd5eb
fix: Changed labeler.yml to latest format (#8431)
viirya Dec 6, 2023
fd92bcb
Minor: Use `ScalarValue::from` impl for strings (#8429)
alamb Dec 6, 2023
0d7cab0
Support crossjoin in substrait. (#8427)
my-vegetable-has-exploded Dec 6, 2023
eb08846
Fix ambiguous reference when aliasing in combination with `ORDER BY` …
Asura7969 Dec 6, 2023
4a46f31
Minor: convert marcro `list-slice` and `slice` to function (#8424)
Weijun-H Dec 6, 2023
107791a
Remove macro in iter_to_array for List (#8414)
jayzhan211 Dec 6, 2023
439339a
fix: Literal in `ORDER BY` window definition should not be an ordinal…
viirya Dec 6, 2023
fa8a0d9
feat: customize column default values for external tables (#8415)
jonahgao Dec 6, 2023
d9d8ddd
feat: Support `array_sort`(`list_sort`) (#8279)
Asura7969 Dec 6, 2023
99bf509
Bugfix: Remove df-cli specific SQL statment options before executing …
devinjdangelo Dec 6, 2023
c8e1c84
Detect when filters make subqueries scalar (#8312)
Jesse-Bakker Dec 6, 2023
33fc110
Add alias check to optimize projections merge (#8438)
mustafasrepo Dec 7, 2023
5e8b0e0
Fix PartialOrd for ScalarValue::List/FixSizeList/LargeList (#8253)
jayzhan211 Dec 7, 2023
6767ea3
Support parquet_metadata for datafusion-cli (#8413)
Veeupup Dec 7, 2023
d5dd535
Fix bug in optimizing a nested count (#8459)
Dandandan Dec 7, 2023
1aedf8d
Bump actions/setup-python from 4 to 5 (#8449)
dependabot[bot] Dec 7, 2023
9be9073
fix: ORDER BY window definition should work on null (#8444)
viirya Dec 7, 2023
c0c9e88
flx clippy warnings (#8455)
waynexia Dec 8, 2023
205e315
fix: RANGE frame for corner cases with empty ORDER BY clause should b…
viirya Dec 8, 2023
a8d74a7
Preserve `dict_id` on `Field` during serde roundtrip (#8457)
avantgardnerio Dec 8, 2023
e2986f1
support inter leave node (#8460)
liukun4515 Dec 8, 2023
ecb7c7d
Not fail when window input is empty record batch (#8466)
mustafasrepo Dec 8, 2023
3f6ff22
update cast (#8458)
Weijun-H Dec 8, 2023
d771f26
fix: don't unifies projection if expr is non-trival (#8454)
haohuaijin Dec 8, 2023
d43a70d
Minor: Add new bloom filter predicate tests (#8433)
alamb Dec 8, 2023
8f9d6e3
Add PRIMARY KEY Aggregate support to dataframe API (#8356)
mustafasrepo Dec 8, 2023
047fb33
Minor: refactor `data_trunc` to reduce duplicated code (#8430)
Weijun-H Dec 8, 2023
cd02c40
Support array_distinct function. (#8268)
my-vegetable-has-exploded Dec 8, 2023
34b0445
Add primary key support to stream table (#8467)
mustafasrepo Dec 8, 2023
91cc573
Add `evaluate_demo` and `range_analysis_demo` to Expr examples (#8377)
alamb Dec 8, 2023
ac4adfa
fix typo (#8473)
Weijun-H Dec 8, 2023
2765fee
Fix comment typo in table.rs: s/indentical/identical/ (#8469)
KeunwooLee-at Dec 8, 2023
182a37e
Remove `define_array_slice` and reuse `array_slice` for `array_pop_fr…
jayzhan211 Dec 9, 2023
62ee8fb
Minor: refactor `trim` to clean up duplicated code (#8434)
Weijun-H Dec 9, 2023
d091b55
Split `EmptyExec` into `PlaceholderRowExec` (#8446)
razeghi71 Dec 9, 2023
93b21bd
Enable non-uniform field type for structs created in DataFusion (#8463)
dlovell Dec 11, 2023
ff65dee
add multi ordering test case (#8439)
jayzhan211 Dec 11, 2023
391f301
Sort filenames when reading parquet to ensure consistent schema (#6629)
thomas-k-cameron Dec 11, 2023
1861c3d
Minor: Improve comments in EnforceDistribution tests (#8474)
alamb Dec 11, 2023
171a5fd
fix: support uppercase when parsing `Interval` (#8478)
QuenKar Dec 11, 2023
95ba48b
Better Equivalence (ordering and exact equivalence) Propagation throu…
mustafasrepo Dec 11, 2023
1154274
Add `today` alias for `current_date` (#8423)
smallzhongfeng Dec 12, 2023
2102275
remove useless clone (#8495)
Weijun-H Dec 12, 2023
7f312c8
fix: incorrect set preserve_partitioning in SortExec (#8485)
haohuaijin Dec 12, 2023
2919e32
Explicitly mark parquet for tests in datafusion-common (#8497)
Dennis40816 Dec 12, 2023
861cc36
Minor/Doc: Clarify DataFrame::write_table Documentation (#8519)
devinjdangelo Dec 12, 2023
500ab40
fix: Pull stats in `IdentVisitor`/`GraphvizVisitor` only when request…
vrongmeal Dec 12, 2023
4578f3d
Change display of RepartitionExec from SortPreservingRepartitionExec …
JacobOgle Dec 13, 2023
2bc67ef
Fix `DataFrame::cache` errors with `Plan("Mismatch between schema and…
Asura7969 Dec 13, 2023
0678a69
Minor: update pbjson_dependency (#8470)
alamb Dec 13, 2023
2e93f07
Minor: Update prost-derive dependency (#8471)
alamb Dec 13, 2023
9a322c8
Add write_table to dataframe actions in user guide (#8527)
devinjdangelo Dec 13, 2023
5bf80d6
Minor: Add repartition_file.slt end to end test for repartitioning fi…
alamb Dec 13, 2023
898911b
Prepare version 34.0.0 (#8508)
andygrove Dec 13, 2023
cf2de9b
refactor: use ExprBuilder to consume substrait expr and use macro to …
waynexia Dec 14, 2023
79c17e3
Make tests deterministic (#8525)
mustafasrepo Dec 14, 2023
5909866
fix: volatile expressions should not be target of common subexpt elim…
viirya Dec 14, 2023
831b2ba
Add LakeSoul to the list of Known Users (#8536)
xuchen-plus Dec 14, 2023
974d49c
Fix regression with Incorrect results when reading parquet files with…
alamb Dec 14, 2023
1042095
feat: improve string statistics display (#8535)
asimsedhain Dec 14, 2023
a971f1e
Defer file creation to write (#8539)
tustvold Dec 14, 2023
efa7b34
Minor: Improve error handling in sqllogictest runner (#8544)
alamb Dec 14, 2023
d67c0bb
Remove order_bys from AggregateExec state (#8537)
mustafasrepo Dec 14, 2023
06d3bcc
Fix count(null) and count(distinct null) (#8511)
joroKr21 Dec 14, 2023
5be8dbe
Minor: reduce code duplication in `date_bin_impl` (#8528)
Weijun-H Dec 14, 2023
72e39b8
Add metrics for UnnestExec (#8482)
simonvandel Dec 14, 2023
14c99b8
regenerate changelog (#8549)
andygrove Dec 14, 2023
5a24ec9
fix: make sure CASE WHEN pick first true branch when WHEN clause is t…
haohuaijin Dec 15, 2023
b457f2b
Minor: make SubqueryAlias::try_new take Arc<LogicalPlan> (#8542)
sadboy Dec 15, 2023
e1a9177
Fallback on null empty value in ExprBoundaries::try_from_column (#8501)
razeghi71 Dec 15, 2023
b276d47
Add test for DataFrame::write_table (#8531)
devinjdangelo Dec 15, 2023
28e7f60
Generate empty column at placeholder exec (#8553)
mustafasrepo Dec 15, 2023
f54eeea
Minor: Remove now dead SUPPORTED_STRUCT_TYPES (#8480)
alamb Dec 15, 2023
82235ae
[MINOR]: Add getter methods to first and last value (#8555)
mustafasrepo Dec 15, 2023
bf0073c
[MINOR]: Some code changes and a new empty batch guard for SHJ (#8557)
metesynnada Dec 15, 2023
b7fde3c
docs: update udf docs for udtf (#8546)
tshauck Dec 15, 2023
b71bec0
feat: implement Unary Expr in substrait (#8534)
waynexia Dec 15, 2023
0fcd077
Fix `compute_record_batch_statistics` wrong with `projection` (#8489)
Asura7969 Dec 16, 2023
1f4c14c
cleanup parquet flag (#8563)
jayzhan211 Dec 17, 2023
b59ddf6
Minor: move some invariants out of the loop (#8564)
haohuaijin Dec 17, 2023
0f83ffc
feat: implement Repartition plan in substrait (#8526)
waynexia Dec 17, 2023
2e16c75
Fix sort order aware file group parallelization (#8517)
alamb Dec 17, 2023
fc6cc48
feat: support largelist in array_slice (#8561)
Weijun-H Dec 17, 2023
b287cda
minor: fix to support scalars (#8559)
comphead Dec 18, 2023
a71a76a
refactor: `HashJoinStream` state machine (#8538)
korowa Dec 18, 2023
a1e959d
Remove ListingTable and FileScanConfig Unbounded (#8540) (#8573)
tustvold Dec 18, 2023
d65b51a
Update substrait requirement from 0.20.0 to 0.21.0 (#8574)
dependabot[bot] Dec 18, 2023
ceead1c
[minor]: Fix rank calculation bug when empty order by is seen (#8567)
mustafasrepo Dec 18, 2023
b5e94a6
Add `LiteralGuarantee` on columns to extract conditions required for …
alamb Dec 18, 2023
65b997b
[MINOR]: Parametrize sort-preservation tests to exercise all situatio…
mustafasrepo Dec 18, 2023
fc46b36
Minor: Add some comments to scalar_udf example (#8576)
alamb Dec 18, 2023
1935c58
Move Coercion for MakeArray to `coerce_arguments_for_signature` and i…
jayzhan211 Dec 18, 2023
d220bf4
support LargeList in array_positions (#8571)
Weijun-H Dec 18, 2023
d33ca4d
support LargeList in array_element (#8570)
Weijun-H Dec 18, 2023
9bc61b3
Increase test coverage for unbounded and bounded cases (#8581)
mustafasrepo Dec 19, 2023
f041e73
Port tests in `parquet.rs` to sqllogictest (#8560)
hiltontj Dec 19, 2023
b456cf7
Minor: avoid a copy in Expr::unalias (#8588)
alamb Dec 19, 2023
1bcaac4
Minor: support complex expr as the arg in the ApproxPercentileCont fu…
liukun4515 Dec 20, 2023
6f5230f
Bugfix: Add functional dependency check and aggregate try_new schema …
mustafasrepo Dec 20, 2023
8d72196
Remove GroupByOrderMode (#8593)
ozankabak Dec 20, 2023
b925b78
replace not-impl-err (#8589)
Weijun-H Dec 20, 2023
0e9c189
Substrait insubquery (#8363)
tgujar Dec 20, 2023
448e413
Minor: port last test from parquet.rs (#8587)
alamb Dec 20, 2023
778779f
Minor: consolidate map sqllogictest tests (#8550)
alamb Dec 20, 2023
98a5a4e
feat: support `LargeList` in `array_dims` (#8592)
Weijun-H Dec 20, 2023
bc013fc
Fix regression in regenerating protobuf source (#8603)
andygrove Dec 20, 2023
96c5b8a
Remove unbounded_input from FileSinkOptions (#8605)
devinjdangelo Dec 21, 2023
df806bd
Add `arrow_err!` macros, optional backtrace to ArrowError (#8586)
comphead Dec 21, 2023
fd121d3
Add examples of DataFrame::write* methods without S3 dependency (#8606)
devinjdangelo Dec 22, 2023
0ff5305
Implement logical plan serde for CopyTo (#8618)
andygrove Dec 22, 2023
55121d8
Fix InListExpr to return the correct number of rows (#8601)
alamb Dec 22, 2023
39e9f41
Remove ListingTable single_file option (#8604)
devinjdangelo Dec 22, 2023
ef34af8
support LargeList in array_remove (#8595)
Weijun-H Dec 22, 2023
0e62fa4
Rename `ParamValues::{LIST -> List,MAP -> Map}` (#8611)
kawadakk Dec 22, 2023
26a488d
Support binary temporal coercion for Date64 and Timestamp types
Asura7969 Dec 22, 2023
ba46434
Add new configuration item `listing_table_ignore_subdirectory` (#8565)
Asura7969 Dec 22, 2023
e467492
Optimize the parameter types of `ParamValues`'s methods (#8613)
kawadakk Dec 22, 2023
03c2ef4
Don't panic on zero placeholder in `ParamValues::get_placeholders_wit…
kawadakk Dec 22, 2023
df2e1e2
Fix #8507: Non-null sub-field on nullable struct-field has wrong null…
marvinlanhenke Dec 23, 2023
8524d58
Implement `contained` API in PruningPredicate (#8440)
alamb Dec 23, 2023
bf43bb2
Add partial serde support for ParquetWriterOptions (#8627)
andygrove Dec 23, 2023
7443f30
add arguments length check (#8622)
Weijun-H Dec 23, 2023
69e5382
Improve DataFrame functional tests (#8630)
alamb Dec 23, 2023
72af0ff
Improve regexp_match performance by avoiding cloning Regex (#8631)
viirya Dec 24, 2023
6b433a8
Minor: improve `listing_table_ignore_subdirectory` config documentati…
alamb Dec 24, 2023
d5704f7
Support Writing Arrow files (#8608)
devinjdangelo Dec 24, 2023
3698693
Filter pushdown into cross join (#8626)
mustafasrepo Dec 25, 2023
18c7566
[MINOR] Remove duplicate test utility and move one utility function f…
metesynnada Dec 25, 2023
ec8fd44
[MINOR]: Add new test for filter pushdown into cross join (#8648)
mustafasrepo Dec 25, 2023
e10d3e2
Rewrite bloom filters to use `contains` API (#8442)
alamb Dec 26, 2023
4e4d050
Split equivalence code into smaller modules. (#8649)
tushushu Dec 26, 2023
78832f1
Move parquet_schema.rs from sql to parquet tests (#8644)
alamb Dec 26, 2023
26a8000
Fix group by aliased expression in LogicalPLanBuilder::aggregate (#8629)
alamb Dec 26, 2023
58b0a2b
Refactor `array_union` and `array_intersect` functions to one general…
Weijun-H Dec 27, 2023
bb99d2a
Avoid extra clone in datafusion-proto::physical_plan (#8650)
ongchi Dec 27, 2023
28ca6d1
Minor: name some constant values in arrow writer, parquet writer (#8642)
alamb Dec 27, 2023
6403222
TreeNode Refactor Part 2 (#8653)
berkaysynnada Dec 27, 2023
1737d49
feat: support inlist in LiteralGurantee for pruning (#8654)
my-vegetable-has-exploded Dec 28, 2023
fba5cc0
Streaming CLI support (#8651)
berkaysynnada Dec 28, 2023
f39c040
Add serde support for CSV FileTypeWriterOptions (#8641)
andygrove Dec 28, 2023
b2cbc78
Add trait based ScalarUDF API (#8578)
alamb Dec 28, 2023
06ed3dd
Handle ordering of first last aggregation inside aggregator (#8662)
mustafasrepo Dec 28, 2023
8284371
feat: support 'LargeList' in `array_pop_front` and `array_pop_back` (…
Weijun-H Dec 28, 2023
673f0e1
chore: rename ceresdb to apache horaedb (#8674)
tanruixiang Dec 29, 2023
d515c68
clean code (#8671)
Weijun-H Dec 29, 2023
8ced56e
remove tz with modified offset from tests (#8677)
korowa Dec 29, 2023
b85a397
Make the BatchSerializer behind Arc to avoid unnecessary struct creat…
metesynnada Dec 29, 2023
7fc663c
Implement serde for CSV and Parquet FileSinkExec (#8646)
andygrove Dec 29, 2023
7f440e1
[pruning] Add shortcut when all units have been pruned (#8675)
Ted-Jiang Dec 30, 2023
bb98dfe
Change first/last implementation to prevent redundant comparisons whe…
mustafasrepo Dec 30, 2023
cc3042a
minor: remove unused conversion (#8684)
comphead Dec 30, 2023
00a679a
refactor: modified `JoinHashMap` build order for `HashJoinStream` (#8…
korowa Dec 30, 2023
545275b
Start setting up tpch planning benchmarks (#8665)
matthewmturner Dec 30, 2023
848f6c3
update doc (#8686)
devinjdangelo Dec 31, 2023
03bd9b4
Closes #8502: Parallel NDJSON file reading (#8659)
marvinlanhenke Dec 31, 2023
f0af5eb
init draft (#8625)
jayzhan211 Dec 31, 2023
bf3bd92
Cleanup TreeNode implementations (#8672)
viirya Jan 1, 2024
8ae7ddc
Update sqlparser requirement from 0.40.0 to 0.41.0 (#8647)
dependabot[bot] Jan 1, 2024
4dcfd7d
Update scalar functions doc for extract/datepart (#8682)
Jefffrey Jan 1, 2024
77c2180
Remove DescribeTableStmt in parser in favour of existing functionalit…
Jefffrey Jan 1, 2024
e82707e
feat: simplify null in list (#8691)
asimsedhain Jan 1, 2024
d2b3d1c
Rename `expr::window_function::WindowFunction` to `WindowFunctionDefi…
edmondop Jan 1, 2024
bf0a39a
Deprecate duplicate function `LogicalPlan::with_new_inputs` (#8707)
viirya Jan 2, 2024
f4233a9
Minor: refactor bloom filter tests to reduce duplication (#8435)
alamb Jan 2, 2024
82656af
clean up code (#8715)
Weijun-H Jan 2, 2024
94aff55
Update analyze.rs (#8717)
berkaysynnada Jan 2, 2024
d4b96a8
support LargeList in array_position (#8714)
Weijun-H Jan 2, 2024
96cede2
support LargeList in array_ndims (#8716)
Weijun-H Jan 2, 2024
c1fe3dd
feat: remove filters with null constants (#8700)
asimsedhain Jan 2, 2024
67baf10
support `LargeList` in `array_prepend` and `array_append` (#8679)
Weijun-H Jan 2, 2024
9a6cc88
Support for `extract(epoch from date)` for Date32 and Date64 (#8695)
Jefffrey Jan 2, 2024
6b1e9c6
Implement trait based API for defining WindowUDF (#8719)
guojidan Jan 3, 2024
1179a76
Minor: Introduce utils::hash for StructArray (#8552)
jayzhan211 Jan 3, 2024
93da699
[CI] Improve windows machine CI test time (#8730)
comphead Jan 3, 2024
ad4b7b7
fix guarantees in allways_true of PruningPredicate (#8732)
my-vegetable-has-exploded Jan 3, 2024
881d03f
[Minor] Avoid mem copy in generate window exprs (#8718)
Ted-Jiang Jan 3, 2024
ca260d9
support LargeList in array_repeat (#8725)
Weijun-H Jan 3, 2024
e6b9f52
ctrl+c termination (#8739)
berkaysynnada Jan 4, 2024
819d357
Add support for functional dependency for ROW_NUMBER window function.…
mustafasrepo Jan 4, 2024
e5036d0
Minor: reduce code duplication in PruningPredicate test (#8441)
alamb Jan 4, 2024
561d941
feat: native types in `DistinctCountAccumulator` for primitive types …
korowa Jan 5, 2024
05e3d45
Add test case for unnecessary hash when target is 1 (#8757)
mustafasrepo Jan 5, 2024
4173070
Minor: Improve `PruningPredicate` docstrings more (#8748)
alamb Jan 5, 2024
af20d2d
feat: support `LargeList` in `cardinality` (#8726)
Weijun-H Jan 5, 2024
93b4a4c
Add reproducer for #8738 (#8750)
alamb Jan 5, 2024
6e75297
Update checking columns in schema merge (#8765)
matthewmturner Jan 5, 2024
0208755
Minor: Add documentation about stream cancellation (#8747)
alamb Jan 5, 2024
29f23eb
Move `repartition_file_scans` out of `enable_round_robin` check in `…
viirya Jan 5, 2024
98f02ff
Clean internal implementation of WindowUDF (#8746)
guojidan Jan 5, 2024
b2e8848
feat: support `largelist` in `array_to_string` (#8729)
Weijun-H Jan 5, 2024
821db54
Error handling. (#8761)
metesynnada Jan 5, 2024
4e4059a
Convert Binary Operator `StringConcat` to Function for `array_concat`…
jayzhan211 Jan 5, 2024
7316274
Minor: Fix incorrect indices for hashing struct (#8775)
jayzhan211 Jan 6, 2024
4dd09f3
Minor: Improve library docs to mention TreeNode, ExprSimplifier, Prun…
alamb Jan 6, 2024
d6c891e
[MINOR] Add logo source files (#8762)
andygrove Jan 6, 2024
40c674b
Add Apache attribution to site footer (#8760)
alamb Jan 6, 2024
4289737
ci: speed up win64 test (#8728)
Jefffrey Jan 6, 2024
dd4263f
Add `schema_err!` error macros with optional backtrace (#8620)
comphead Jan 6, 2024
ff27d90
Fix regression by reverting Materialize dictionaries in group keys (#…
alamb Jan 8, 2024
cc42894
fix: struct field don't push down to TableScan (#8774)
haohuaijin Jan 8, 2024
746988a
Implement ScalarUDF in terms of ScalarUDFImpl trait (#8713)
alamb Jan 8, 2024
dd58c5a
Minor: Fix error messages in array expressions (#8781)
Weijun-H Jan 8, 2024
0e53c6d
Move tests from `expr.rs` to sqllogictests. Part1 (#8773)
comphead Jan 8, 2024
128b2c6
add config flag for recursive ctes
matthewgapp Jan 11, 2024
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -37,4 +37,4 @@ If there are user-facing changes then we may require documentation to be updated

<!--
If there are any breaking changes to public APIs, please add the `api change` label.
-->
-->
6 changes: 3 additions & 3 deletions .github/workflows/dev.yml
@@ -30,7 +30,7 @@ jobs:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Audit licenses
@@ -41,9 +41,9 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v3
- uses: actions/setup-node@v4
with:
node-version: "14"
node-version: "20"
- name: Prettier check
run: |
# if you encounter error, rerun the command below and commit the changes
2 changes: 1 addition & 1 deletion .github/workflows/dev_pr.yml
@@ -46,7 +46,7 @@ jobs:
github.event_name == 'pull_request_target' &&
(github.event.action == 'opened' ||
github.event.action == 'synchronize')
uses: actions/labeler@v4.3.0
uses: actions/labeler@v5.0.0
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
configuration-path: .github/workflows/dev_pr/labeler.yml
34 changes: 18 additions & 16 deletions .github/workflows/dev_pr/labeler.yml
@@ -16,35 +16,37 @@
# under the License.

development-process:
- dev/**.*
- .github/**.*
- ci/**.*
- .asf.yaml
- changed-files:
- any-glob-to-any-file: ['dev/**.*', '.github/**.*', 'ci/**.*', '.asf.yaml']

documentation:
- docs/**.*
- README.md
- ./**/README.md
- DEVELOPERS.md
- datafusion/docs/**.*
- changed-files:
- any-glob-to-any-file: ['docs/**.*', 'README.md', './**/README.md', 'DEVELOPERS.md', 'datafusion/docs/**.*']

sql:
- datafusion/sql/**/*
- changed-files:
- any-glob-to-any-file: ['datafusion/sql/**/*']

logical-expr:
- datafusion/expr/**/*
- changed-files:
- any-glob-to-any-file: ['datafusion/expr/**/*']

physical-expr:
- datafusion/physical-expr/**/*
- changed-files:
- any-glob-to-any-file: ['datafusion/physical-expr/**/*']

optimizer:
- datafusion/optimizer/**/*
- changed-files:
- any-glob-to-any-file: ['datafusion/optimizer/**/*']

core:
- datafusion/core/**/*
- changed-files:
- any-glob-to-any-file: ['datafusion/core/**/*']

substrait:
- datafusion/substrait/**/*
- changed-files:
- any-glob-to-any-file: ['datafusion/substrait/**/*']

sqllogictest:
- datafusion/sqllogictest/**/*
- changed-files:
- any-glob-to-any-file: ['datafusion/sqllogictest/**/*']
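The labeler v5 format above wraps each label's globs in a `changed-files` / `any-glob-to-any-file` rule: a label applies when at least one changed file matches at least one of the listed globs. The Rust sketch below models only that any-to-any logic. `glob_match` and `any_glob_to_any_file` are hypothetical names, and the matcher is a deliberately simplified stand-in (literals, `*`, and `**/` only), not the action's actual glob engine.

```rust
/// Simplified glob match (assumption: NOT the labeler action's real matcher).
/// - `**/` matches zero or more whole directories.
/// - `*` matches any run of characters except `/`
///   (so a `**` that is not followed by `/` behaves like `*`).
/// - every other byte must match literally.
fn glob_match(pat: &[u8], path: &[u8]) -> bool {
    if pat.starts_with(b"**/") {
        // Option 1: `**/` matches zero directories.
        if glob_match(&pat[3..], path) {
            return true;
        }
        // Option 2: consume one directory segment (up to and including '/') and retry.
        return match path.iter().position(|&b| b == b'/') {
            Some(slash) => glob_match(pat, &path[slash + 1..]),
            None => false,
        };
    }
    match pat.first().copied() {
        None => path.is_empty(),
        Some(b'*') => {
            // `*` may consume any prefix that does not cross a '/'.
            let max = path.iter().position(|&b| b == b'/').unwrap_or(path.len());
            (0..=max).any(|i| glob_match(&pat[1..], &path[i..]))
        }
        Some(c) => path.first() == Some(&c) && glob_match(&pat[1..], &path[1..]),
    }
}

/// `any-glob-to-any-file`: true if at least one changed file matches at least one glob.
fn any_glob_to_any_file(globs: &[&str], changed_files: &[&str]) -> bool {
    changed_files
        .iter()
        .any(|file| globs.iter().any(|g| glob_match(g.as_bytes(), file.as_bytes())))
}

fn main() {
    let sql = ["datafusion/sql/**/*"];
    let development_process = ["dev/**.*", ".github/**.*", "ci/**.*", ".asf.yaml"];
    let changed = ["datafusion/sql/src/planner.rs", "README.md"];

    println!("sql: {}", any_glob_to_any_file(&sql, &changed)); // true
    println!(
        "development-process: {}",
        any_glob_to_any_file(&development_process, &changed)
    ); // false
}
```

Keeping the rule as "any glob, any file" means one touched file is enough to attach a label; the v5 format also offers stricter variants, which this sketch does not model.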
2 changes: 1 addition & 1 deletion .github/workflows/docs.yaml
@@ -24,7 +24,7 @@ jobs:
path: asf-site

- name: Setup Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: "3.10"

130 changes: 45 additions & 85 deletions .github/workflows/rust.yml
@@ -47,21 +47,30 @@ jobs:
image: amd64/rust
steps:
- uses: actions/checkout@v4
- name: Cache Cargo
uses: actions/cache@v3
with:
# these represent dependencies downloaded by cargo
# and thus do not depend on the OS, arch nor rust version.
path: /github/home/.cargo
key: cargo-cache-
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
with:
rust-version: stable

- name: Cache Cargo
uses: actions/cache@v3
with:
path: |
~/.cargo/bin/
~/.cargo/registry/index/
~/.cargo/registry/cache/
~/.cargo/git/db/
./target/
./datafusion-cli/target/
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-benchmark-${{ hashFiles('datafusion/**/Cargo.toml', 'benchmarks/Cargo.toml', 'datafusion-cli/Cargo.toml') }}

- name: Check workspace without default features
run: cargo check --no-default-features -p datafusion

- name: Check datafusion-common without default features
run: cargo check --tests --no-default-features -p datafusion-common

- name: Check workspace in debug mode
run: cargo check
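The new cache step keys the cache on `hashFiles(...)` over the workspace `Cargo.toml` files, so the key changes whenever a manifest changes, and every job that computes the same key (the comment mentions `linux-build-lib`) can restore the same cache. The sketch below illustrates that idea only. `cache_key` is a hypothetical helper, and it uses std's `DefaultHasher` as a stand-in because GitHub's `hashFiles` computes a SHA-256 per matched file and then a SHA-256 over those hashes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: build a cache key from a fixed prefix plus a hash of
/// the manifest contents, mirroring the *idea* of
/// `cargo-cache-benchmark-${{ hashFiles(...) }}` (not its exact algorithm).
fn cache_key(prefix: &str, manifest_contents: &[&str]) -> String {
    let mut combined = DefaultHasher::new();
    for contents in manifest_contents {
        // Hash each file on its own first...
        let mut per_file = DefaultHasher::new();
        contents.hash(&mut per_file);
        // ...then fold the per-file hashes into one value for the whole set.
        per_file.finish().hash(&mut combined);
    }
    format!("{prefix}{:016x}", combined.finish())
}

fn main() {
    let prefix = "cargo-cache-benchmark-";
    let v1 = cache_key(prefix, &["[package]\nname = \"demo\"\nversion = \"0.1.0\"\n"]);
    let v2 = cache_key(prefix, &["[package]\nname = \"demo\"\nversion = \"0.2.0\"\n"]);
    // Same prefix, different suffix: editing a manifest yields a new key,
    // so a stale cache is not restored under the old name.
    println!("{v1}\n{v2}");
}
```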

@@ -84,18 +93,20 @@ jobs:
- uses: actions/checkout@v4
with:
submodules: true
- name: Cache Cargo
uses: actions/cache@v3
with:
path: /github/home/.cargo
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
with:
rust-version: stable
- name: Run tests (excluding doctests)
run: cargo test --lib --tests --bins --features avro,json,backtrace
env:
# do not produce debug symbols to keep memory usage down
# hardcoding other profile params to avoid profile override values
# More on Cargo profiles https://doc.rust-lang.org/cargo/reference/profiles.html?profile-settings#profile-settings
RUSTFLAGS: "-C debuginfo=0 -C opt-level=0 -C incremental=false -C codegen-units=256"
RUST_BACKTRACE: "1"
# avoid rust stack overflows on tpc-ds tests
RUST_MIN_STACK: "3000000"
- name: Verify Working Directory Clean
run: git diff --exit-code

@@ -109,12 +120,6 @@ jobs:
- uses: actions/checkout@v4
with:
submodules: true
- name: Cache Cargo
uses: actions/cache@v3
with:
path: /github/home/.cargo
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
with:
@@ -145,19 +150,7 @@ jobs:
# test datafusion-sql examples
cargo run --example sql
# test datafusion-examples
cargo run --example avro_sql --features=datafusion/avro
cargo run --example csv_sql
cargo run --example custom_datasource
cargo run --example dataframe
cargo run --example dataframe_in_memory
cargo run --example deserialize_to_struct
cargo run --example expr_api
cargo run --example parquet_sql
cargo run --example parquet_sql_multiple_files
cargo run --example memtable
cargo run --example rewrite_expr
cargo run --example simple_udf
cargo run --example simple_udaf
ci/scripts/rust_example.sh
- name: Verify Working Directory Clean
run: git diff --exit-code

@@ -211,12 +204,6 @@ jobs:
image: amd64/rust
steps:
- uses: actions/checkout@v4
- name: Cache Cargo
uses: actions/cache@v3
with:
path: /github/home/.cargo
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
with:
@@ -238,12 +225,6 @@ jobs:
- uses: actions/checkout@v4
with:
submodules: true
- name: Cache Cargo
uses: actions/cache@v3
with:
path: /github/home/.cargo
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
with:
@@ -259,7 +240,8 @@ jobs:
- name: Verify that benchmark queries return expected results
run: |
export TPCH_DATA=`realpath datafusion/sqllogictest/test_files/tpch/data`
cargo test serde_q --profile release-nonlto --features=ci -- --test-threads=1
# use release build for plan verificaton because debug build causes stack overflow
cargo test plan_q --package datafusion-benchmarks --profile release-nonlto --features=ci -- --test-threads=1
INCLUDE_TPCH=true cargo test --test sqllogictests
- name: Verify Working Directory Clean
run: git diff --exit-code
@@ -316,6 +298,7 @@ jobs:
# with a OS-dependent path.
- name: Setup Rust toolchain
run: |
rustup update stable
rustup toolchain install stable
rustup default stable
rustup component add rustfmt
@@ -327,10 +310,13 @@ jobs:
cd datafusion-cli
cargo test --lib --tests --bins --all-features
env:
# do not produce debug symbols to keep memory usage down
RUSTFLAGS: "-C debuginfo=0"
# Minimize producing debug symbols to keep memory usage down
# Set debuginfo=line-tables-only as debuginfo=0 causes immensely slow build
# See for more details: https://github.com/rust-lang/rust/issues/119560
RUSTFLAGS: "-C debuginfo=line-tables-only"
RUST_BACKTRACE: "1"

# avoid rust stack overflows on tpc-ds tests
RUST_MIN_STACK: "3000000"
macos:
name: cargo test (mac)
runs-on: macos-latest
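`RUST_MIN_STACK` sets the default stack size, in bytes, for threads spawned through `std::thread` (it does not change the main thread's stack), so `3000000` gives those threads about 3 MB instead of Rust's current 2 MiB default. The same effect can be requested for a single thread in code with `std::thread::Builder::stack_size`, as in this sketch; `depth_sum` and `run_with_stack` are hypothetical stand-ins for deeply recursive work such as walking a large TPC-DS plan tree.

```rust
use std::thread;

/// Hypothetical stand-in for deeply recursive code: recursion depth equals `n`.
fn depth_sum(n: u64) -> u64 {
    if n == 0 { 0 } else { n + depth_sum(n - 1) }
}

/// Run `depth_sum(n)` on a thread with an explicit stack size. An explicit
/// `stack_size` overrides the default that RUST_MIN_STACK would otherwise set.
fn run_with_stack(stack_bytes: usize, n: u64) -> u64 {
    thread::Builder::new()
        .name("deep-recursion".into())
        .stack_size(stack_bytes)
        .spawn(move || depth_sum(n))
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

fn main() {
    // Same size the workflow passes via RUST_MIN_STACK.
    let total = run_with_stack(3_000_000, 10_000);
    println!("{total}"); // 50005000
}
```

Setting the environment variable in CI avoids touching test code, at the cost of enlarging every spawned thread's stack rather than just the ones that need it.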
@@ -353,6 +339,7 @@ jobs:
# with a OS-dependent path.
- name: Setup Rust toolchain
run: |
rustup update stable
rustup toolchain install stable
rustup default stable
rustup component add rustfmt
@@ -364,8 +351,12 @@ jobs:
cargo test --lib --tests --bins --all-features
env:
# do not produce debug symbols to keep memory usage down
RUSTFLAGS: "-C debuginfo=0"
# hardcoding other profile params to avoid profile override values
# More on Cargo profiles https://doc.rust-lang.org/cargo/reference/profiles.html?profile-settings#profile-settings
RUSTFLAGS: "-C debuginfo=0 -C opt-level=0 -C incremental=false -C codegen-units=256"
RUST_BACKTRACE: "1"
# avoid rust stack overflows on tpc-ds tests
RUST_MIN_STACK: "3000000"

test-datafusion-pyarrow:
name: cargo test pyarrow (amd64)
@@ -377,13 +368,7 @@ jobs:
- uses: actions/checkout@v4
with:
submodules: true
- name: Cache Cargo
uses: actions/cache@v3
with:
path: /github/home/.cargo
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-
- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: "3.8"
- name: Install PyArrow
@@ -480,12 +465,6 @@ jobs:
- uses: actions/checkout@v4
with:
submodules: true
- name: Cache Cargo
uses: actions/cache@v3
with:
path: /github/home/.cargo
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
with:
@@ -506,12 +485,6 @@ jobs:
- uses: actions/checkout@v4
with:
submodules: true
- name: Cache Cargo
uses: actions/cache@v3
with:
path: /github/home/.cargo
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
with:
@@ -531,12 +504,6 @@ jobs:
- uses: actions/checkout@v4
with:
submodules: true
- name: Cache Cargo
uses: actions/cache@v3
with:
path: /github/home/.cargo
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
with:
@@ -546,12 +513,11 @@ jobs:

- name: Check Cargo.toml formatting
run: |
# if you encounter error, try rerun the command below, finally run 'git diff' to
# check which Cargo.toml introduces formatting violation
# if you encounter an error, try running 'cargo tomlfmt -p path/to/Cargo.toml' to fix the formatting automatically.
# If the error still persists, you need to manually edit the Cargo.toml file, which introduces formatting violation.
#
# ignore ./Cargo.toml because putting workspaces in multi-line lists make it easy to read
ci/scripts/rust_toml_fmt.sh
git diff --exit-code

config-docs-check:
name: check configs.md is up-to-date
@@ -563,19 +529,13 @@ jobs:
- uses: actions/checkout@v4
with:
submodules: true
- name: Cache Cargo
uses: actions/cache@v3
with:
path: /github/home/.cargo
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
with:
rust-version: stable
- uses: actions/setup-node@v3
- uses: actions/setup-node@v4
with:
node-version: "14"
node-version: "20"
- name: Check if configs.md has been modified
run: |
# If you encounter an error, run './dev/update_config_docs.sh' and commit