Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enh]: Add support for database tables, views and queries #1289

Closed
MarcSkovMadsen opened this issue Oct 30, 2024 · 5 comments
Closed

[Enh]: Add support for database tables, views and queries #1289

MarcSkovMadsen opened this issue Oct 30, 2024 · 5 comments

Comments

@MarcSkovMadsen
Copy link

MarcSkovMadsen commented Oct 30, 2024

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

https://panel.holoviz.org/index.html

Please describe the purpose of the new feature or describe the problem to solve.

I'm a contributor to and user of Panel and HoloViz ecosystem. We plan to extend Panel dataframe support from Pandas Dataframe to Dataframe in general. Similarly what other data app frameworks are or will be doing.

To add support we will probably add a TabularData parameter to param: https://param.holoviz.org/user_guide/Parameter_Types.html.

image

We want start experimenting with this by adding the support to the package https://github.com/panel-extensions/panel-graphic-walker. I hope to do it via panel-extensions/panel-graphic-walker#22. Looking at what the library we wrap support we can see that a database Connector is also a tabular dataset. See https://github.com/Kanaries/pygwalker/tree/main/pygwalker/data_parsers and https://github.com/Kanaries/pygwalker/blob/d0d9acda42d4c11e5741ebb46b639afdd96d00e3/pygwalker/data_parsers/database_parser.py#L44. For them it consist of a sqlalchemy connectionstring and a SELECT query. To work with all databases (including azure synapse) it should be a sqlalchemy engine instead of a connectionstring.

Suggest a solution if possible.

Support a Database Query:

class DBQuery():
    engine
    sql: str

If you have tried alternatives, please describe them below.

Maybe Ibis support will provide this? But it is a wrapper with overhead. And many more people are familiar with SQLAlchemy than Ibis. Why not support SQLAlchemy queries directly?

Additional information that may help us understand your needs.

No response

@MarcoGorelli
Copy link
Member

hi @MarcSkovMadsen - thanks for the request, and good to hear from you!

For them it consist of a sqlalchemy connectionstring and a SELECT query. To work with all databases (including azure synapse) it should be a sqlalchemy engine instead of a connectionstring.

What operations would we need to support? Just selecting columns by name (via SELECT)? If so, then this is definitely in-scope, and we can add it to the interchange-only level of support https://narwhals-dev.github.io/narwhals/extending/#interchange-only-support

happy to chat more about this if you like

@MarcSkovMadsen
Copy link
Author

MarcSkovMadsen commented Nov 3, 2024

It was great talking to you at PyData Paris too!

I want to share some thoughts on why I think adding database query support as a general tabular data source would be a valuable direction for Narwhals.

Narwhals as an Abstraction for Tabular Data

I see Narwhals as an abstraction layer for tabular data, enabling Python developers to:

  • Support larger communities (and potentially broader markets for commercial applications)
  • Simplify user experience for these communities
  • Provide performant, scalable solutions by offloading computations to underlying data engines whenever possible

Many libraries implement similar abstractions, like data backends in HoloViews. These abstractions allow libraries to focus on data interfaces rather than data loading or transforming details, freeing users to work with data seamlessly regardless of format or source.

image

In the data visualization and app ecosystem, a clear trend is emerging: frameworks are increasingly abstracting over media types. For example, Streamlit allows users to provide a simple Path to media, automatically handling file reading. Similarly, Narwhals could support file-like objects and URLs as data sources, allowing users to seamlessly integrate various formats without additional setup.

Gradio exemplifies this approach by organizing components based on media type, allowing users to input for example JSON as a str, dict, list, or callable. By adding support for file-like objects and URLs, Narwhals could extend similar flexibility to tabular data.

With Narwhals providing a unified backend for tabular data, frameworks could adopt it to improve and simplify their tabular data handling. Others could build similar backend abstractions for different media types (narwhals-json, narwhals-image, narwhals-audio, narwhals-video, etc.).

Goals and Vision

My primary goal is to support larger communities and make data handling easier. I would like to do this for:

I understand that interchange support could help achieve this. This would be a solid starting point, but ideally, I'd like the applications I work on to be as performant and scalable as possible. For a database-backed tabular data source, this means offloading computations directly to the database.

For example:

Adding database query support would help Narwhals offer a seamless, efficient experience across many platforms and use cases.


To be more specific I would have to analyze the above. But interchange is a good starting point.

@jonmmease
Copy link

@MarcSkovMadsen, do I understand correctly that you're proposing that Narwhals directly take on the Ibis use case?

@MarcSkovMadsen
Copy link
Author

MarcSkovMadsen commented Nov 10, 2024

@MarcSkovMadsen, do I understand correctly that you're proposing that Narwhals directly take on the Ibis use case?

Its a good question. I hope not. I can't really see through all of this.

What I believe I can see is that the current goals of Narwhals is not enough for someone trying to make it easy for for users to utilize tabular data in their projects. I believe we want to enable users to use DataFrames, SELECT queries in databases, local parquet files, remote parquet files and probably more. I.e. it means that library maintainers probably still need to combine narwhals with other libraries (ibis?) and also create support for other sources them selves.

Maybe the problem is already solved. Maybe I just need better how to integrate narwhals and ibis? Maybe improved docs explaining to library developers that support for tabular data from databases goes via Ibis and that narwhals has a long term goal of making that as performant as possible. The downside of ibis is that then you suddenly have to include both narwhals and ibis as dependencies.

@MarcSkovMadsen
Copy link
Author

MarcSkovMadsen commented Nov 10, 2024

Having studied narwhals, ibis and the problem I'm trying to solve a bit more (for example) I think what I'm asking for is probably very much aligned with the road map:

  • Add extra docs and tutorials to make the project more accessible and easy to get started with.
  • Define a lazy-only layer of support which can include DuckDB, Ibis, and PySpark.

Documenting better how to use tabular data sources that are not DataFrames in origin via Ibis would help me a lot. Including making it clear that its a design decision not to implement your own deep support for databases

Defining a lazy-only layer of support for Ibis would help push computations to the underlying backend

Then I can hope and ask library builders like pygwalker to support Narwhals and Ibis instead of building their own custom Connector and DatabaseDataParser.

Then there is a detail of me wanting to enable users to just give file paths and urls instead of dataframes as input. Because that is where I see data app frameworks going. But that is a detail.

I think I will close this one. Thanks so much for the discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants