Enable recursive CTE support by default #9554
Labels: enhancement (New feature or request)
Comments
This was referenced Mar 11, 2024
it's a bit slow.
duckdb 0.10 :
hyper 0.0.18161:
DataFusion CLI v36.0.0, with round-robin repartitioning and batch coalescing disabled:
❯ set datafusion.optimizer.enable_round_robin_repartition=false;
0 rows in set. Query took 0.002 seconds.
❯ set datafusion.execution.coalesce_batches=false;
0 rows in set. Query took 0.002 seconds.
❯ with recursive t(a) as
(select 1 as a
union all
select 1+a from t where a<10000)
select count(*) from t;
+----------+
| COUNT(*) |
+----------+
| 10000 |
+----------+
1 row in set. Query took 0.194 seconds.
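For a sense of where the time might go, here is a minimal Rust sketch of the usual iterative way a recursive CTE is evaluated, applied to the query above. It illustrates the general technique and is not DataFusion's actual operator; `run_recursive_cte` is a hypothetical name. Each pass feeds only the previous pass's rows back into the recursive term, so this query needs 10000 passes that each yield at most one row, which suggests per-iteration overhead matters more than data volume here.

```rust
/// Hypothetical helper (not DataFusion code) that evaluates
///   WITH RECURSIVE t(a) AS (SELECT 1 UNION ALL SELECT 1 + a FROM t WHERE a < limit)
/// iteratively. Returns (total rows produced, number of recursive passes).
fn run_recursive_cte(limit: i64) -> (usize, usize) {
    // Static term: SELECT 1 AS a
    let mut working: Vec<i64> = vec![1];
    let mut total_rows = working.len();
    let mut iterations = 0;

    // Re-run the recursive term until a pass produces no rows.
    while !working.is_empty() {
        // Recursive term: SELECT 1 + a FROM t WHERE a < limit,
        // where t holds only the rows from the previous pass.
        let next: Vec<i64> = working
            .iter()
            .filter(|a| **a < limit)
            .map(|a| a + 1)
            .collect();
        total_rows += next.len();
        working = next;
        iterations += 1;
    }
    (total_rows, iterations)
}

fn main() {
    let (rows, iterations) = run_recursive_cte(10_000);
    println!("rows = {}, iterations = {}", rows, iterations);
}
```

If the 0.194 seconds were spread evenly over those passes, that would be roughly 19 microseconds per pass.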
Looks like there is a draft PR here addressing some of the performance issues / improvements: matthewgapp#2
take
I am interested in implementing this memory limit feature.
Thank you @jonahgao 🙏
Is your feature request related to a problem or challenge?
As part of #462, @matthewgapp implemented support for recursive Common Table Expressions (aka Recursive CTEs)
Here is an example of such a query:
https://github.com/apache/arrow-datafusion/blob/4cd3c433004a7a6825643d6b3911db720efe5f76/datafusion/sqllogictest/test_files/cte.slt#L44-L68
At the moment, to use recursive CTEs you must enable a config option:
Describe the solution you'd like
I would like recursive CTEs to be enabled by default (and thus usable without a config option).
The only reason I know of that they are NOT enabled by default is that they might buffer an unbounded amount of data (and thus exceed the total memory available to DataFusion).
Describe alternatives you've considered
I think the basic idea would be to
1. Track the memory buffered by RecursiveQueryExec with a MemoryReservation
2. Add a test to memory_limit.rs showing the limit being hit: https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/memory_limit.rs
The main PR that added this feature was #8840
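As a rough illustration of the memory-tracking idea above, here is a self-contained Rust sketch. The `Reservation` type and `run_unbounded` function are hypothetical stand-ins written for this example, not DataFusion's MemoryReservation API or the RecursiveQueryExec implementation. It shows a recursive query with no termination condition failing with an error once its buffered rows would exceed a byte limit, rather than growing without bound.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a memory reservation with a fixed byte limit.
struct Reservation {
    limit: usize,
    used: usize,
}

impl Reservation {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Account for `bytes` more buffered data, or fail without exceeding the limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let wanted = self.used.saturating_add(bytes);
        if wanted > self.limit {
            return Err(format!(
                "resources exhausted: wanted {} bytes, limit is {} bytes",
                wanted, self.limit
            ));
        }
        self.used = wanted;
        Ok(())
    }
}

/// Simulates `SELECT 1 UNION ALL SELECT 1 + a FROM t` (no WHERE clause, so it
/// never terminates on its own), buffering every row. Returns the number of
/// rows buffered before the reservation refused to grow, plus the error text.
fn run_unbounded(limit_bytes: usize) -> (usize, String) {
    let mut reservation = Reservation::new(limit_bytes);
    let mut buffered: Vec<i64> = Vec::new();
    let mut working: Vec<i64> = vec![1];

    loop {
        // Reserve memory for the new rows before buffering them.
        if let Err(e) = reservation.try_grow(working.len() * size_of::<i64>()) {
            return (buffered.len(), e);
        }
        buffered.extend_from_slice(&working);
        // Recursive term with no filter: every pass produces another row.
        working = working.iter().map(|a| a + 1).collect();
    }
}

fn main() {
    let (rows, err) = run_unbounded(80);
    println!("buffered {} rows before failing: {}", rows, err);
}
```

A test along the lines suggested for memory_limit.rs would assert the same thing at the query level: with a small memory limit configured, the non-terminating query returns a resources-exhausted error instead of running until the process runs out of memory.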
Additional context
No response