Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trino and Presto hooks do not properly execute statements other than SELECT #26774

Closed
2 tasks done
alexandermalyga opened this issue Sep 29, 2022 · 3 comments · Fixed by #27168
Closed
2 tasks done

Trino and Presto hooks do not properly execute statements other than SELECT #26774

alexandermalyga opened this issue Sep 29, 2022 · 3 comments · Fixed by #27168

Comments

@alexandermalyga
Copy link
Contributor

alexandermalyga commented Sep 29, 2022

Apache Airflow Provider(s)

presto, trino

Versions of Apache Airflow Providers

apache-airflow-providers-trino==4.0.1
apache-airflow-providers-presto==4.0.1

Apache Airflow version

2.4.0

Operating System

macOS 12.6

Deployment

Docker-Compose

Deployment details

No response

What happened

When using the TrinoHook (PrestoHook also applies), only the get_records() and get_first() methods work as expected, the run() and insert_rows() do not.
The SQL statements sent by the problematic methods reach the database (visible in logs and UI), but they don't get executed.

The issue is caused by the hook not making the required subsequent requests to the Trino HTTP endpoints after the first request. More info here:

If the JSON document returned by the POST to /v1/statement does not contain a nextUri link, the query has completed, either successfully or unsuccessfully, and no additional requests need to be made. If the nextUri link is present in the document, there are more query results to be fetched. The client should loop executing a GET request to the nextUri returned in the QueryResults response object until nextUri is absent from the response.

SQL statements made by methods like get_records() do get executed because internally they call fetchone() or fetchmany() on the cursor, which do make the subsequent requests.

What you think should happen instead

The Hook is able to execute SQL statements other than SELECT.

How to reproduce

Connect to a Trino or Presto instance and execute any SQL statement (INSERT or CREATE TABLE) using TrinoHook.run(), the statements will reach the API but they won't get executed.

Then, provide a dummy handler function like this:
TrinoHook.run(..., handler=lambda cur: cur.description)

The description property internally iterates over the cursor requests, causing the statement getting executed.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@alexandermalyga alexandermalyga added area:providers kind:bug This is a clearly a bug labels Sep 29, 2022
@alexandermalyga
Copy link
Contributor Author

Not sure about the approach here. cur.fetchall() could be added at the end of run() and insert_rows() but that would require copy-pasting DbApiHook's implementation and adding one line, which is not ideal. Any thoughts @uranusjr @potiuk @eladkal ?

@potiuk
Copy link
Member

potiuk commented Sep 29, 2022

Not sure about final solution, but right now we have "common.sql" package that allows much faster iteration on common code used by all sql providers, and it should be possible to add a new feature to the hook (callable post-run??) to the DBApiHook, bumping common.sql to 1.3 and make trino/presto depend on common.sql >=1.3.

That was the main reason for having the common.sql separated to be able to iterate more quickly there and release new features/fixes faster to our users while avoiding relying on bumping Airflow core.

@alexandermalyga
Copy link
Contributor Author

alexandermalyga commented Oct 20, 2022

This was recently fixed on Trino's side: trinodb/trino-python-client#220. I'll open a PR to bump the client's version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants