Support Standard SQL in BigQuery Sensor #13750
Thanks for opening your first issue here! Be sure to follow the issue template! |
Can you provide an example of the actual usage? Which sensors are you talking about? |
Isn't SqlSensor compatible with BigQuery? |
@eladkal The standard SqlSensor is not compatible with BigQuery because, by default, it uses legacy SQL, which does not support many Standard SQL features (such as querying partitioned tables). There is no direct parameter like "use_legacy_sql" = False that could override this setting and force Standard SQL (BigQueryOperator has this functionality). |
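One concrete symptom of the dialect mismatch: the two dialects spell fully qualified table names differently, so a query written with Standard SQL backtick references typically fails to parse when the hook runs it as legacy SQL. A tiny illustrative helper (the function names are hypothetical, not part of Airflow or the BigQuery client) that builds the reference for each dialect:

```python
def table_reference(project: str, dataset: str, table: str, use_legacy_sql: bool) -> str:
    """Build a fully qualified BigQuery table reference for the chosen dialect."""
    if use_legacy_sql:
        # Legacy SQL: square brackets, with a colon between project and dataset.
        return f"[{project}:{dataset}.{table}]"
    # Standard SQL: backticks, with dots throughout.
    return f"`{project}.{dataset}.{table}`"


def count_query(project: str, dataset: str, table: str, use_legacy_sql: bool) -> str:
    """Build a simple row-count query against the table in the chosen dialect."""
    ref = table_reference(project, dataset, table, use_legacy_sql)
    return f"SELECT COUNT(*) FROM {ref}"


print(count_query("my-project", "my_dataset", "my_table", use_legacy_sql=True))
# SELECT COUNT(*) FROM [my-project:my_dataset.my_table]
print(count_query("my-project", "my_dataset", "my_table", use_legacy_sql=False))
# SELECT COUNT(*) FROM `my-project.my_dataset.my_table`
```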
@mik-laj Yeah, you have to do something like this:
|
I struggled with this a while ago and ended up creating a custom sensor that uses the BigQuery hook directly, so I can specify the parameters, very similar to the code that @omarismail94 shared. The "easy" option would be to change the default value of the BigQueryHook for …; if a change like that isn't viable, perhaps we could look into extending the behaviour of the Connection.get_hook() method, which SqlSensor calls to get the hook? Right now it just returns a hook with the default params; perhaps it could be more flexible and let the SqlSensor instantiation specify custom parameters, which are then passed on to the returned hook, e.g.:
my_sensor = SqlSensor(
    ...
    hook_parameters=dict(use_legacy_sql=False),
    ...
)
|
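The pass-through idea can be sketched without Airflow. This is a minimal sketch under stated assumptions: `FakeBigQueryHook`, `get_hook` and `SqlSensorSketch` are hypothetical stand-ins for the real hook, `Connection.get_hook()` and `SqlSensor`, and only the forwarding of `hook_parameters` is modelled.

```python
from typing import Any, Dict, Optional


class FakeBigQueryHook:
    """Hypothetical stand-in for a hook whose default dialect is legacy SQL."""

    def __init__(self, gcp_conn_id: str = "google_default", use_legacy_sql: bool = True) -> None:
        self.gcp_conn_id = gcp_conn_id
        self.use_legacy_sql = use_legacy_sql


def get_hook(conn_id: str, **hook_kwargs: Any) -> FakeBigQueryHook:
    """Hypothetical stand-in for Connection.get_hook() that forwards extra kwargs."""
    return FakeBigQueryHook(gcp_conn_id=conn_id, **hook_kwargs)


class SqlSensorSketch:
    """Hypothetical sensor that passes hook_parameters through to its hook."""

    def __init__(self, *, conn_id: str, sql: str,
                 hook_parameters: Optional[Dict[str, Any]] = None) -> None:
        self.conn_id = conn_id
        self.sql = sql
        # Copy the dict so later changes to the caller's dict do not leak in.
        self.hook_parameters = dict(hook_parameters or {})

    def _get_hook(self) -> FakeBigQueryHook:
        # Only user-supplied parameters are forwarded; everything else keeps
        # the hook's own defaults.
        return get_hook(self.conn_id, **self.hook_parameters)


legacy = SqlSensorSketch(conn_id="google_default", sql="SELECT 1")._get_hook()
standard = SqlSensorSketch(
    conn_id="google_default",
    sql="SELECT 1",
    hook_parameters={"use_legacy_sql": False},
)._get_hook()
print(legacy.use_legacy_sql, standard.use_legacy_sql)  # True False
```

Defaulting `hook_parameters` to `None` keeps existing DAGs unchanged: an empty dict forwards nothing, so the hook keeps its legacy-SQL default unless a user opts in.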
In case someone comes around looking for a solution for this, I solved it with this quick fix:
from airflow.models import Variable
from airflow.sensors.sql import SqlSensor


class BigQuerySqlSensor(SqlSensor):
    """Overrides use_legacy_sql when using SqlSensor with a BigQuery connection."""

    def _get_hook(self):
        hook = super()._get_hook()
        # Force Standard SQL instead of the hook's legacy-SQL default.
        hook.use_legacy_sql = False
        # Dataset location, read from an Airflow Variable named "location".
        hook.location = Variable.get('location')
        return hook


sense_data = BigQuerySqlSensor(
    task_id="sense_data",
    conn_id="google_default",
    sql="select count(*) > 0 from `my_dataset.my_view`",
)
Apart from this quick fix, I see the point of defaulting to the "legacy" dialect to keep backwards compatibility. I see two solutions for this issue:
Both of them can be implemented. Any thoughts on which would be preferable? |
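The subclass quick fix can be reproduced standalone to see why it works. In this sketch `SqlSensorStub`, `FakeBigQueryHook` and `BigQuerySqlSensorStub` are hypothetical stubs, not Airflow classes, and `location` is a constructor argument rather than an Airflow Variable purely so the code runs without Airflow. Like the quick fix, it assumes the hook reads `use_legacy_sql` and `location` when the query runs, not only in its constructor.

```python
from typing import Optional


class FakeBigQueryHook:
    """Hypothetical hook stub: legacy SQL by default, no location set."""

    def __init__(self, use_legacy_sql: bool = True, location: Optional[str] = None) -> None:
        self.use_legacy_sql = use_legacy_sql
        self.location = location


class SqlSensorStub:
    """Hypothetical stand-in for SqlSensor: builds a hook with library defaults."""

    def __init__(self, conn_id: str, sql: str) -> None:
        self.conn_id = conn_id
        self.sql = sql

    def _get_hook(self) -> FakeBigQueryHook:
        return FakeBigQueryHook()


class BigQuerySqlSensorStub(SqlSensorStub):
    """Subclass that patches the hook's defaults after the parent creates it."""

    def __init__(self, conn_id: str, sql: str, location: Optional[str] = None) -> None:
        super().__init__(conn_id, sql)
        self.location = location

    def _get_hook(self) -> FakeBigQueryHook:
        hook = super()._get_hook()
        hook.use_legacy_sql = False  # force Standard SQL
        hook.location = self.location  # dataset location, e.g. "EU"
        return hook


base_hook = SqlSensorStub("google_default", "SELECT 1")._get_hook()
bq_hook = BigQuerySqlSensorStub("google_default", "SELECT 1", location="EU")._get_hook()
print(base_hook.use_legacy_sql, bq_hook.use_legacy_sql, bq_hook.location)  # True False EU
```

The trade-off is that every hook setting needs its own subclass, which is part of why a generic pass-through of hook arguments was proposed.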
We should probably support passing arguments from a sensor to the underlying hook. Something like this?
sense_data = SqlSensor(
    task_id="sense_data",
    conn_id="google_default",
    hook_kwargs={"use_legacy_sql": False},
    sql="select count(*) > 0 from `my_dataset.my_view`",
)
|
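However the hook is configured, the sensor's outcome still hinges on the query result: `select count(*) > 0 ...` returns a single boolean cell. The following simplified sketch shows how a SQL sensor can turn query results into a poke decision, using the `success`, `failure` and `fail_on_empty` options from the SqlSensor signature. It is not Airflow's implementation: `poke_decision` and `SensorFailure` are hypothetical names, and the default case uses plain Python truthiness, whereas Airflow's own rule may treat some values (such as the string `'0'`) differently depending on the version.

```python
from typing import Any, Callable, Optional, Sequence


class SensorFailure(Exception):
    """Hypothetical stand-in for AirflowException."""


def poke_decision(
    records: Sequence[Sequence[Any]],
    success: Optional[Callable[[Any], bool]] = None,
    failure: Optional[Callable[[Any], bool]] = None,
    fail_on_empty: bool = False,
) -> bool:
    """Decide a single poke from query results (simplified)."""
    if not records:
        if fail_on_empty:
            raise SensorFailure("No rows returned")
        return False  # no rows yet: keep poking
    first_cell = records[0][0]  # only the first cell of the first row matters
    if failure is not None and failure(first_cell):
        raise SensorFailure(f"Failure criteria met for {first_cell!r}")
    if success is not None:
        return bool(success(first_cell))
    return bool(first_cell)


print(poke_decision([(True,)]))   # True
print(poke_decision([(False,)]))  # False
print(poke_decision([]))          # False
```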
I too quite like the hook_kwargs option, as it would solve other DB hook default issues too, not just the BigQuery one. |
Ok, that's another option. It was suggested in #17315 with an example piece of code:
# airflow/sensors/sql.py
from typing import Optional

from airflow.hooks.base import BaseHook
from airflow.sensors.base import BaseSensorOperator


class SqlSensor(BaseSensorOperator):
    def __init__(
        self, *, conn_id, sql, hook_kwargs: Optional[dict] = None, parameters=None,
        success=None, failure=None, fail_on_empty=False, **kwargs
    ):
        self.conn_id = conn_id
        # init all the params...
        self.hook_kwargs = hook_kwargs or {}  # None means "use the hook's defaults"
        super().__init__(**kwargs)

    def _get_hook(self):
        conn = BaseHook.get_connection(self.conn_id)
        # ...
        return conn.get_hook(**self.hook_kwargs)
And in the connection:
# airflow/models/connection.py
class Connection(Base, LoggingMixin):
    # ...
    def get_hook(self, **kwargs):
        """Return hook based on conn_type."""
        # locate hook class ...
        # Extra kwargs are forwarded to the hook's constructor.
        return hook_class(**{conn_id_param: self.conn_id}, **kwargs)
Then we can use the code @uranusjr provided:
sense_data = SqlSensor(
    task_id="sense_data",
    conn_id="google_default",
    hook_kwargs={"use_legacy_sql": False},
    sql="select count(*) > 0 from `my_dataset.my_view`",
)
|
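The connection-side change is what makes `hook_kwargs` useful for any database, not only BigQuery: `get_hook()` looks up the hook class for the connection type, fills in the connection id, and forwards everything else untouched. A runnable sketch with hypothetical stubs (`ConnectionSketch`, `HOOK_REGISTRY` and the two fake hooks are illustrative only; the real `Connection.get_hook()` resolves hook classes differently):

```python
from typing import Any, Dict, Optional, Tuple


class FakeBigQueryHook:
    """Hypothetical BigQuery-like hook: legacy SQL by default."""

    def __init__(self, gcp_conn_id: str = "google_default", use_legacy_sql: bool = True) -> None:
        self.gcp_conn_id = gcp_conn_id
        self.use_legacy_sql = use_legacy_sql


class FakePostgresHook:
    """Hypothetical Postgres-like hook with its own default to override."""

    def __init__(self, postgres_conn_id: str = "postgres_default", schema: Optional[str] = None) -> None:
        self.postgres_conn_id = postgres_conn_id
        self.schema = schema


# conn_type -> (hook class, name of that hook's connection-id parameter)
HOOK_REGISTRY: Dict[str, Tuple[type, str]] = {
    "google_cloud_platform": (FakeBigQueryHook, "gcp_conn_id"),
    "postgres": (FakePostgresHook, "postgres_conn_id"),
}


class ConnectionSketch:
    """Hypothetical connection record that builds hooks with extra kwargs."""

    def __init__(self, conn_id: str, conn_type: str) -> None:
        self.conn_id = conn_id
        self.conn_type = conn_type

    def get_hook(self, **kwargs: Any) -> Any:
        hook_class, conn_id_param = HOOK_REGISTRY[self.conn_type]
        # The connection id always comes from the connection itself.
        if conn_id_param in kwargs:
            raise TypeError(f"{conn_id_param!r} is set by the connection and cannot be overridden")
        return hook_class(**{conn_id_param: self.conn_id}, **kwargs)


bq = ConnectionSketch("google_default", "google_cloud_platform").get_hook(use_legacy_sql=False)
pg = ConnectionSketch("my_pg", "postgres").get_hook(schema="analytics")
print(bq.gcp_conn_id, bq.use_legacy_sql)  # google_default False
print(pg.postgres_conn_id, pg.schema)  # my_pg analytics
```

Without the explicit check, Python would still reject a duplicated connection-id argument with a "got multiple values for keyword argument" `TypeError`; the guard only makes the message clearer.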
Anyone fancy a pull request? Sounds not too complicated to me! |
WIP: I have something, but I'm struggling with the tests. I'll open it so we can review it "collectively".
Description
A SQL sensor that uses Standard SQL, since the default one uses legacy SQL.
Use case / motivation
Currently (correct me if I am wrong!), the SQL sensor only supports legacy SQL. If I want to poke a BQ table, I do not think I can do that using Standard SQL right now.
Are you willing to submit a PR?
If community approves of this idea, sure!