[Enh]: Add support for database tables, views and queries #1289

MarcSkovMadsen · 2024-10-30T17:37:02Z

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

https://panel.holoviz.org/index.html

Please describe the purpose of the new feature or describe the problem to solve.

I'm a contributor to and user of Panel and HoloViz ecosystem. We plan to extend Panel dataframe support from Pandas Dataframe to Dataframe in general. Similarly what other data app frameworks are or will be doing.

To add support we will probably add a TabularData parameter to param: https://param.holoviz.org/user_guide/Parameter_Types.html.

We want start experimenting with this by adding the support to the package https://github.com/panel-extensions/panel-graphic-walker. I hope to do it via panel-extensions/panel-graphic-walker#22. Looking at what the library we wrap support we can see that a database Connector is also a tabular dataset. See https://github.com/Kanaries/pygwalker/tree/main/pygwalker/data_parsers and https://github.com/Kanaries/pygwalker/blob/d0d9acda42d4c11e5741ebb46b639afdd96d00e3/pygwalker/data_parsers/database_parser.py#L44. For them it consist of a sqlalchemy connectionstring and a SELECT query. To work with all databases (including azure synapse) it should be a sqlalchemy engine instead of a connectionstring.

Suggest a solution if possible.

Support a Database Query:

class DBQuery():
    engine
    sql: str

If you have tried alternatives, please describe them below.

Maybe Ibis support will provide this? But it is a wrapper with overhead. And many more people are familiar with SQLAlchemy than Ibis. Why not support SQLAlchemy queries directly?

Additional information that may help us understand your needs.

No response

The text was updated successfully, but these errors were encountered:

MarcoGorelli · 2024-11-02T19:35:55Z

hi @MarcSkovMadsen - thanks for the request, and good to hear from you!

For them it consist of a sqlalchemy connectionstring and a SELECT query. To work with all databases (including azure synapse) it should be a sqlalchemy engine instead of a connectionstring.

What operations would we need to support? Just selecting columns by name (via SELECT)? If so, then this is definitely in-scope, and we can add it to the interchange-only level of support https://narwhals-dev.github.io/narwhals/extending/#interchange-only-support

happy to chat more about this if you like

MarcSkovMadsen · 2024-11-03T04:03:22Z

It was great talking to you at PyData Paris too!

I want to share some thoughts on why I think adding database query support as a general tabular data source would be a valuable direction for Narwhals.

Narwhals as an Abstraction for Tabular Data

I see Narwhals as an abstraction layer for tabular data, enabling Python developers to:

Support larger communities (and potentially broader markets for commercial applications)
Simplify user experience for these communities
Provide performant, scalable solutions by offloading computations to underlying data engines whenever possible

Many libraries implement similar abstractions, like data backends in HoloViews. These abstractions allow libraries to focus on data interfaces rather than data loading or transforming details, freeing users to work with data seamlessly regardless of format or source.

In the data visualization and app ecosystem, a clear trend is emerging: frameworks are increasingly abstracting over media types. For example, Streamlit allows users to provide a simple Path to media, automatically handling file reading. Similarly, Narwhals could support file-like objects and URLs as data sources, allowing users to seamlessly integrate various formats without additional setup.

Gradio exemplifies this approach by organizing components based on media type, allowing users to input for example JSON as a str, dict, list, or callable. By adding support for file-like objects and URLs, Narwhals could extend similar flexibility to tabular data.

With Narwhals providing a unified backend for tabular data, frameworks could adopt it to improve and simplify their tabular data handling. Others could build similar backend abstractions for different media types (narwhals-json, narwhals-image, narwhals-audio, narwhals-video, etc.).

Goals and Vision

My primary goal is to support larger communities and make data handling easier. I would like to do this for:

All tabular components and extensions in Panel including Panel Tabulator and panel-graphic-walker.
HoloViews and hvPlot (I would not be the one doing this. But I hope others would pick it up).

I understand that interchange support could help achieve this. This would be a solid starting point, but ideally, I'd like the applications I work on to be as performant and scalable as possible. For a database-backed tabular data source, this means offloading computations directly to the database.

For example:

To replace the HoloViews backend, we’d need to implement most methods from the HoloViews Interface.
To support GraphicWalker, necessary computations for any tabular data source are described in the Graphic Walker Computations guide and implemented in PygWalker data_parsers.
For general tabular data visualization, other frameworks like VegaFusion, Perspective, Mosaic, and Quack outline necessary computations. Ideally, these frameworks could rely on Narwhals to support diverse tabular data backends directly.

Adding database query support would help Narwhals offer a seamless, efficient experience across many platforms and use cases.

To be more specific I would have to analyze the above. But interchange is a good starting point.

jonmmease · 2024-11-10T00:32:20Z

@MarcSkovMadsen, do I understand correctly that you're proposing that Narwhals directly take on the Ibis use case?

MarcSkovMadsen · 2024-11-10T02:16:52Z

@MarcSkovMadsen, do I understand correctly that you're proposing that Narwhals directly take on the Ibis use case?

Its a good question. I hope not. I can't really see through all of this.

What I believe I can see is that the current goals of Narwhals is not enough for someone trying to make it easy for for users to utilize tabular data in their projects. I believe we want to enable users to use DataFrames, SELECT queries in databases, local parquet files, remote parquet files and probably more. I.e. it means that library maintainers probably still need to combine narwhals with other libraries (ibis?) and also create support for other sources them selves.

Maybe the problem is already solved. Maybe I just need better how to integrate narwhals and ibis? Maybe improved docs explaining to library developers that support for tabular data from databases goes via Ibis and that narwhals has a long term goal of making that as performant as possible. The downside of ibis is that then you suddenly have to include both narwhals and ibis as dependencies.

MarcSkovMadsen · 2024-11-10T07:02:17Z

Having studied narwhals, ibis and the problem I'm trying to solve a bit more (for example) I think what I'm asking for is probably very much aligned with the road map:

Add extra docs and tutorials to make the project more accessible and easy to get started with.
Define a lazy-only layer of support which can include DuckDB, Ibis, and PySpark.

Documenting better how to use tabular data sources that are not DataFrames in origin via Ibis would help me a lot. Including making it clear that its a design decision not to implement your own deep support for databases

Defining a lazy-only layer of support for Ibis would help push computations to the underlying backend

Then I can hope and ask library builders like pygwalker to support Narwhals and Ibis instead of building their own custom Connector and DatabaseDataParser.

Then there is a detail of me wanting to enable users to just give file paths and urls instead of dataframes as input. Because that is where I see data app frameworks going. But that is a detail.

I think I will close this one. Thanks so much for the discussion.

MarcSkovMadsen closed this as completed Nov 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enh]: Add support for database tables, views and queries #1289

[Enh]: Add support for database tables, views and queries #1289

MarcSkovMadsen commented Oct 30, 2024 •

edited

Loading

MarcoGorelli commented Nov 2, 2024

MarcSkovMadsen commented Nov 3, 2024 •

edited

Loading

jonmmease commented Nov 10, 2024

MarcSkovMadsen commented Nov 10, 2024 •

edited

Loading

MarcSkovMadsen commented Nov 10, 2024 •

edited

Loading

[Enh]: Add support for database tables, views and queries #1289

[Enh]: Add support for database tables, views and queries #1289

Comments

MarcSkovMadsen commented Oct 30, 2024 • edited Loading

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

Please describe the purpose of the new feature or describe the problem to solve.

Suggest a solution if possible.

If you have tried alternatives, please describe them below.

Additional information that may help us understand your needs.

MarcoGorelli commented Nov 2, 2024

MarcSkovMadsen commented Nov 3, 2024 • edited Loading

Narwhals as an Abstraction for Tabular Data

Goals and Vision

jonmmease commented Nov 10, 2024

MarcSkovMadsen commented Nov 10, 2024 • edited Loading

MarcSkovMadsen commented Nov 10, 2024 • edited Loading

MarcSkovMadsen commented Oct 30, 2024 •

edited

Loading

MarcSkovMadsen commented Nov 3, 2024 •

edited

Loading

MarcSkovMadsen commented Nov 10, 2024 •

edited

Loading

MarcSkovMadsen commented Nov 10, 2024 •

edited

Loading