Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(snowflake): run ibis code inside of stored procedure #5877

Closed
1 task done
malthe opened this issue Mar 30, 2023 · 14 comments · Fixed by #8971
Closed
1 task done

feat(snowflake): run ibis code inside of stored procedure #5877

malthe opened this issue Mar 30, 2023 · 14 comments · Fixed by #8971
Labels
feature Features or general enhancements snowflake The Snowflake backend

Comments

@malthe
Copy link
Contributor

malthe commented Mar 30, 2023

Is your feature request related to a problem?

Writing stored procedures for Snowflake using Python depends on the Snowpark API, centering around a session object rather than a database connection.

Describe the solution you'd like

To use Ibis for data wrangling within a Snowflake stored procedure, a new backend is required. It would presumably use Session.sql as the interface between SQLAlchemy-based expressions and Snowpark API (which has its own query builder).

What version of ibis are you running?

5.0.0

What backend(s) are you using, if any?

Snowflake

Code of Conduct

  • I agree to follow this project's Code of Conduct
@malthe malthe added the feature Features or general enhancements label Mar 30, 2023
@cpcloud
Copy link
Member

cpcloud commented Mar 30, 2023

Hey @malthe 👋🏻!

Thanks for the issue.

Is your desire to run ibis code inside of a Python procedure, all tied together from an ibis session?

Can you express in pseudocode the kind workflow you want?

It sounds like it's something like this, but please correct me if I'm wrong:

>>> # assume we can do something like this with the current snowflake backend
>>> con = ibis.snowflake.connect(...)
>>> stored_procedure = """
... import ibis
... # do ibis things? 🙃
... """
>>> # not sure if ☝🏻 is a string or some other kind of object
>>> func = con.register(stored_procedure) # not sure about this API at all
>>> t = con.tables.my_table
>>> res = func(t.some_column)  # ?

@cpcloud
Copy link
Member

cpcloud commented Mar 30, 2023

@malthe Would it make sense to chat on our gitter channel? https://gitter.im/ibis-dev/Lobby

It might be faster/easier to get to understanding there.

@malthe
Copy link
Contributor Author

malthe commented Mar 30, 2023

Using these Snowpark API-enabled stored procedures (created for example using the sproc decorator), we can run self-contained analytical models which pull data on their own accord using the session object.

The workflow could be that you create a stored procedure using just the regular Snowflake Python connector library – perhaps in a CI/CD pipeline – and use Ibis from within that stored procedure (i.e. Python function):

@sproc(packages=["ibis-framework", "snowflake-snowpark-python"])
def my_proc(session: "snowflake.snowpark.Session"):
    # Using "connect" here feels weird, but nonetheless ...
    backend = ibis.snowpark.connect(session)
    ...
    # Use `backend` as you would any other backend object.

@cpcloud
Copy link
Member

cpcloud commented Mar 31, 2023

Ok, so here's the summary of the use case and some issues that need input from
someone from the Snowpark team.

To clarify, @malthe this isn't directed at you in any way.

The issues I'm discussing below are all about collaborating with the Snowpark team!

Use case

The goal is to be able to author ibis expressions inside of a stored procedure defined in a Snowpark session.

There are many challenges here that we need help from snowflake to tackle.

What the Ibis team would need from Snowflake

  1. Ibis supports Python 3.8-3.11; we would like it if the various snowflake libraries did too
  2. Support nested data deserialization using arrow; getting result sets to the client has sub-optimal performance
  3. Support more than one version of arrow
  4. Collaboration to quickly resolve issues that come up during development and testing

Happy Path

There are still some engineering questions on the ibis side assuming the above can be done

  1. Should we reuse the pyspark compiler against a snowpark backend or generate
    SQL and hand that over to snowpark?
  2. Should we keep the snowflake backend around, or just cut over to snowpark? I'd prefer not to have two Snowflake backends.

Less Happy Path

I really would like to avoid rebuilding any of what snowpark has done, but if there's no way for us to collaborate effectively on making end users happy we'll need to think about how to solve this problem

  1. Can we support stored procs without snowpark?
  2. If so, we'll probably still need a way to construct a snowflake backend instance from a snowpark session, is that possible/sane?

Again, I would 100% like to avoid any of that.

I hope we can start collaborating on this! It's super exciting!!

@cpcloud
Copy link
Member

cpcloud commented Mar 31, 2023

We would also be happy to hear about any concrete asks for ibis from the Snowpark team!

@sfc-gh-achandrasekaran
Copy link

sfc-gh-achandrasekaran commented Mar 31, 2023

@cpcloud thanks for letting us know. I want to understand what you are trying to do. The gist is that ibis is trying to add Snowflake as a backend and you are considering various options, correct?

a) Use SQL directly and use the snowflake python connector to send this SQL over to execute over on Snowflake.
b) Use dataframe semantics and therefore, use Snowpark to do the dataframe math for you.

Both of these approaches seem fine to us so if there are issues that are blocking you from making progress, please file them in the respective repo. We do actively monitor these issues and will prioritize them along with our internal roadmap.

@cpcloud
Copy link
Member

cpcloud commented Apr 1, 2023

@cpcloud thanks for letting us know. I want to understand what you are trying to do. The gist is that ibis is trying to add Snowflake as a backend and you are considering various options, correct?

No, we already have a very functional high operation coverage Snowflake backend based on snowflake-sqlalchemy.

We've got a Streamlit app in our documentation that shows coverage, and our Snowflake backend is one of the highest coverage backends.

a) Use SQL directly and use the snowflake python connector to send this SQL over to execute over on Snowflake. b) Use dataframe semantics and therefore, use Snowpark to do the dataframe math for you.

Both of these approaches seem fine to us so if there are issues that are blocking you from making progress, please file them in the respective repo. We do actively monitor these issues and will prioritize them along with our internal roadmap.

As said at the beginning of my long comment above:

Use case

The goal is to be able to author ibis expressions inside of a stored procedure
defined in a Snowpark session.

It seems like we can probably do all of this without Snowpark at all, and just use the underlying connection to construct an ibis backend.

@cpcloud cpcloud changed the title feat: Snowpark backend feat(snowflake): run ibis code inside of stored procedure Apr 4, 2023
@cpcloud cpcloud added this to the 6.0 milestone Apr 4, 2023
@cpcloud
Copy link
Member

cpcloud commented Apr 6, 2023

@sfc-gh-achandrasekaran IIUC, for Snowflake customers to be able to run Ibis code inside their UDFs and stored procedures, ibis-framework needs to be added to the PACKAGES configuration value, but it looks like ibis-framework isn't available upstream currently. I'm happy to help if the Snowflake team needs any assistance making it available.

@cpcloud
Copy link
Member

cpcloud commented Jun 9, 2023

@malthe Quick update on this.

After #6156, you'll have the ability to create an ibis backend from a snowpark session. We test the entire ibis backend test suite against a backend created this way, so you can be confident that there's no obvious differences between connecting with a URL/parameters and connecting from a snowpark session.

Unfortunately this is only half the story for using ibis inside of a snowpark stored procedure. There are issues that need to be addressed on the snowflake/snowpark side of things.

  1. At the very least, snowflake-sqlalchemy needs to exist in the snowflake conda channel. Snowflake uses implicit namespace packages for their Python libraries, and this isn't compatible with their snowcli packaging mechanism. End users would still need to package up all the rest of ibis's required dependencies that don't exist on the snowflake conda channel.
  2. Ideally, ibis-framework (and its dependencies) would simply exist on their conda channel. This would still require snowflake-sqlalchemy to exist, but would not place the burden of packaging and uploading third party dependencies using on end users.

@cpcloud
Copy link
Member

cpcloud commented Jun 12, 2023

@malthe A related question: does your desired workflow involve mixing snowpark session code with ibis code?

One tricky bit with #6156 is that the bind parameter syntax is different between snowpark and snowflake-sqlalchemy (I don't know why exactly) and there isn't an obvious sane workaround to avoid mutating the snowpark connection to make it work with ibis.

This means that without any further changes to #6156 you can't use the existing snowpark session for anything that involves bind parameters after you call ibis.snowflake.from_snowpark.

I think I can find a suitable workaround, but I also don't want to spend a bunch of time on that if it's not a problem that you expect to have.

@cpcloud cpcloud added the snowflake The Snowflake backend label Jun 12, 2023
@malthe
Copy link
Contributor Author

malthe commented Jul 1, 2023

Just a quick update saying that I have opened up a so-called idea on the Snowflake Ideas website. It's not possible to link directly to the suggestion to include snowflake-sqlalchemy, but at least now people can search it up and upvote it.

@cpcloud
Copy link
Member

cpcloud commented Aug 1, 2023

Update on this: Unfortunately, it was too difficult to get connections from snowflake to play nicely as pure DBAPI implementations for use with snowflake-sqlalchemy.

However, we're likely to move away from snowflake-alchemy soon, which may make it easier to implement this feature.

@cpcloud
Copy link
Member

cpcloud commented Apr 15, 2024

UPDATE: 9.0 will ship with a ibis.snowflake.from_snowpark(session) API. We run the entire Ibis test suite against a backend created this way, so it should be possible to write Ibis code inside of a Snowpark stored procedure.

I'll investigate adding a test to verify the explicit behavior of running Ibis code inside of a stored proc.

@cpcloud
Copy link
Member

cpcloud commented Apr 15, 2024

I've got a test showing use of Ibis inside of a Snowpark stored procedure here: #8971.

kszucs pushed a commit that referenced this issue Apr 16, 2024
…red procedure (#8971)

Add a test demonstrating running Ibis inside of a Snowpark stored
procedure.

Closes #5877.
@github-project-automation github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements snowflake The Snowflake backend
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants