Design an API that helps us support multiple DataFrame libraries (polars, Spark, pandas, etc.) and convert them to the chosen processing engine's native DataFrame API.
The initial objective was to support multiple DataFrame APIs with a single backend executing the query. From a UX point of view this doesn't make sense: a user who already processes their data with a different execution engine would be forced to use two backends just to use our library. Additionally, to support conversion between, e.g., polars and DataFusion, both would need to implement the Substrait format, which is not the case. Given a recent conversation in the polars project (see issue 7404), it does not seem to be planned for the near future, and neither Apache Spark (stuck PR) nor Apache Flink supports this format yet.
One alternative is to support multiple backends along with the frontend: if the user uses polars, we would express and compute the metrics in polars, and likewise for each query engine or tool. This would require more work, but it would provide a better UX for the end user and reduce the complexity of the implementation.
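To make the multi-backend idea concrete, here is a minimal sketch of how dispatch could look for a single metric. Everything in it is an assumption for illustration, not the project's actual API: the names `register_backend` and `row_count` are hypothetical, and the backend is picked from the top-level module of the DataFrame's type so that no engine has to be imported up front.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: top-level module name of the DataFrame type
# (e.g. "pandas", "polars") -> function computing the metric natively.
_ROW_COUNT_BACKENDS: Dict[str, Callable[[Any], int]] = {}


def register_backend(module_name: str):
    """Register a native implementation of the metric for one engine."""
    def decorator(fn: Callable[[Any], int]) -> Callable[[Any], int]:
        _ROW_COUNT_BACKENDS[module_name] = fn
        return fn
    return decorator


@register_backend("pandas")
def _pandas_row_count(df: Any) -> int:
    # Uses only the pandas API; no conversion to another engine.
    return len(df.index)


@register_backend("polars")
def _polars_row_count(df: Any) -> int:
    # Uses only the polars API.
    return df.height


def row_count(df: Any) -> int:
    """Compute the metric with the engine the user's DataFrame belongs to."""
    module_name = type(df).__module__.split(".")[0]
    try:
        backend = _ROW_COUNT_BACKENDS[module_name]
    except KeyError:
        raise TypeError(f"unsupported DataFrame type: {type(df)!r}") from None
    return backend(df)
```

With this shape, a polars user never triggers a pandas code path, and adding an engine means registering one more function per metric rather than writing a conversion layer.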