You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem or challenge?
Currently DataFusion operators communicate via a narrow API i.e. forwarding SendableRecordBatchStreams. In some instances, in particular the ExecutionPlans operating on unbounded streams need to snapshot their state and co-ordinate with source operators. It'd be a powerful primitive to add a StateBackend concept to the RuntimeEnv where users could then write operators to store adhoc durable state into a backend such as rocksdb.
Realise this may not be useful for many use cases but RuntimeEnv does seem to have ability to plug in an object store registry as well as a catalog manager. This would be a crucial unlock to make stateful stream processing application with DataFusion.
If the current, API contains such a pathway already, would love to get pointers in the right direction.
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered:
We actually do something like this in our use of DF for stream processing. Since it could remain unused/irrelevant for cases other than stream processing, it may be an overfit to add something like this to upstream DF. Let's see if there are other use cases for something like this. If we discover more use cases, maybe it could make sense to add something like this.
makes sense @ozankabak , StateBackend trait seems like it could be of use to more operators than just streaming was what we were thinking, especially the operators that spill to disk. Is there a way to make the state backend a part of the SessionState without needing this change or use an existing feature?
Even aside from the specifics of StateBackend, it seems like RuntimeEnv could benefit from more flexibility.
Perhaps we could add extension functionality via a trait to the RuntimeEnv that would allow users to register custom structs that could then be downcast by any custom operator that need them?
Is your feature request related to a problem or challenge?
Currently DataFusion operators communicate via a narrow API i.e. forwarding
SendableRecordBatchStreams
. In some instances, in particular the ExecutionPlans operating on unbounded streams need to snapshot their state and co-ordinate with source operators. It'd be a powerful primitive to add aStateBackend
concept to theRuntimeEnv
where users could then write operators to store adhoc durable state into a backend such as rocksdb.Realise this may not be useful for many use cases but RuntimeEnv does seem to have ability to plug in an object store registry as well as a catalog manager. This would be a crucial unlock to make stateful stream processing application with DataFusion.
If the current, API contains such a pathway already, would love to get pointers in the right direction.
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: