-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(snowflake): run ibis code inside of stored procedure #5877
Comments
Hey @malthe 👋🏻! Thanks for the issue. Is your desire to run ibis code inside of a Python procedure, all tied together from an ibis session? Can you express in pseudocode the kind workflow you want? It sounds like it's something like this, but please correct me if I'm wrong: >>> # assume we can do something like this with the current snowflake backend
>>> con = ibis.snowflake.connect(...)
>>> stored_procedure = """
... import ibis
... # do ibis things? 🙃
... """
>>> # not sure if ☝🏻 is a string or some other kind of object
>>> func = con.register(stored_procedure) # not sure about this API at all
>>> t = con.tables.my_table
>>> res = func(t.some_column) # ? |
@malthe Would it make sense to chat on our gitter channel? https://gitter.im/ibis-dev/Lobby It might be faster/easier to get to understanding there. |
Using these Snowpark API-enabled stored procedures (created for example using the sproc decorator), we can run self-contained analytical models which pull data on their own accord using the session object. The workflow could be that you create a stored procedure using just the regular Snowflake Python connector library – perhaps in a CI/CD pipeline – and use Ibis from within that stored procedure (i.e. Python function): @sproc(packages=["ibis-framework", "snowflake-snowpark-python"])
def my_proc(session: "snowflake.snowpark.Session"):
# Using "connect" here feels weird, but nonetheless ...
backend = ibis.snowpark.connect(session)
...
# Use `backend` as you would any other backend object. |
Ok, so here's the summary of the use case and some issues that need input from To clarify, @malthe this isn't directed at you in any way. The issues I'm discussing below are all about collaborating with the Snowpark team! Use caseThe goal is to be able to author ibis expressions inside of a stored procedure defined in a Snowpark session. There are many challenges here that we need help from snowflake to tackle. What the Ibis team would need from Snowflake
Happy PathThere are still some engineering questions on the ibis side assuming the above can be done
Less Happy PathI really would like to avoid rebuilding any of what snowpark has done, but if there's no way for us to collaborate effectively on making end users happy we'll need to think about how to solve this problem
Again, I would 100% like to avoid any of that. I hope we can start collaborating on this! It's super exciting!! |
We would also be happy to hear about any concrete asks for ibis from the Snowpark team! |
@cpcloud thanks for letting us know. I want to understand what you are trying to do. The gist is that ibis is trying to add Snowflake as a backend and you are considering various options, correct? a) Use SQL directly and use the snowflake python connector to send this SQL over to execute over on Snowflake. Both of these approaches seem fine to us so if there are issues that are blocking you from making progress, please file them in the respective repo. We do actively monitor these issues and will prioritize them along with our internal roadmap. |
No, we already have a very functional high operation coverage Snowflake backend based on We've got a Streamlit app in our documentation that shows coverage, and our Snowflake backend is one of the highest coverage backends.
As said at the beginning of my long comment above:
It seems like we can probably do all of this without Snowpark at all, and just use the underlying connection to construct an ibis backend. |
@sfc-gh-achandrasekaran IIUC, for Snowflake customers to be able to run Ibis code inside their UDFs and stored procedures, |
@malthe Quick update on this. After #6156, you'll have the ability to create an ibis backend from a snowpark session. We test the entire ibis backend test suite against a backend created this way, so you can be confident that there's no obvious differences between connecting with a URL/parameters and connecting from a snowpark session. Unfortunately this is only half the story for using ibis inside of a snowpark stored procedure. There are issues that need to be addressed on the snowflake/snowpark side of things.
|
@malthe A related question: does your desired workflow involve mixing snowpark session code with ibis code? One tricky bit with #6156 is that the bind parameter syntax is different between snowpark and snowflake-sqlalchemy (I don't know why exactly) and there isn't an obvious sane workaround to avoid mutating the snowpark connection to make it work with ibis. This means that without any further changes to #6156 you can't use the existing snowpark session for anything that involves bind parameters after you call I think I can find a suitable workaround, but I also don't want to spend a bunch of time on that if it's not a problem that you expect to have. |
Just a quick update saying that I have opened up a so-called idea on the Snowflake Ideas website. It's not possible to link directly to the suggestion to include snowflake-sqlalchemy, but at least now people can search it up and upvote it. |
Update on this: Unfortunately, it was too difficult to get connections from snowflake to play nicely as pure DBAPI implementations for use with snowflake-sqlalchemy. However, we're likely to move away from snowflake-alchemy soon, which may make it easier to implement this feature. |
UPDATE: 9.0 will ship with a I'll investigate adding a test to verify the explicit behavior of running Ibis code inside of a stored proc. |
I've got a test showing use of Ibis inside of a Snowpark stored procedure here: #8971. |
Is your feature request related to a problem?
Writing stored procedures for Snowflake using Python depends on the Snowpark API, centering around a session object rather than a database connection.
Describe the solution you'd like
To use Ibis for data wrangling within a Snowflake stored procedure, a new backend is required. It would presumably use
Session.sql
as the interface between SQLAlchemy-based expressions and Snowpark API (which has its own query builder).What version of ibis are you running?
5.0.0
What backend(s) are you using, if any?
Snowflake
Code of Conduct
The text was updated successfully, but these errors were encountered: