Adding ClickHouse Provider #25714
from airflow.utils.context import Context


class ClickHouseOperator(BaseOperator):
ClickHouseQueryOperator?
Also, could it maybe be based on https://pypi.org/project/apache-airflow-providers-common-sql/ ?
It's similar.
What is similar?
@pateash can you elaborate on this point?
I think ClickHouseHook should inherit from DbApiHook and ClickHouseOperator should inherit from BaseSQLOperator.
Is there a reason why we shouldn't do it?
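A rough sketch of the inheritance pattern being suggested, written against a simplified stand-in rather than Airflow's real DbApiHook so that it runs without Airflow or a ClickHouse server. The names StandInDbApiHook and SketchClickHouseHook are hypothetical, and sqlite3 is used only as a placeholder DB-API connection. The point is that the subclass supplies get_conn() while the generic query helper lives in the base class.

```python
import sqlite3


class StandInDbApiHook:
    """Simplified stand-in for a DB-API based hook (NOT Airflow's DbApiHook)."""

    def get_conn(self):
        # Subclasses are expected to return a DB-API 2.0 connection.
        raise NotImplementedError

    def get_records(self, sql, parameters=()):
        # Generic helper that works for any DB-API connection.
        conn = self.get_conn()
        try:
            cursor = conn.cursor()
            cursor.execute(sql, parameters)
            return cursor.fetchall()
        finally:
            conn.close()


class SketchClickHouseHook(StandInDbApiHook):
    """Hypothetical subclass: only connection creation is database-specific."""

    def get_conn(self):
        # Assumption: sqlite3 stands in for the ClickHouse DB-API connection
        # that a real hook would presumably build from the Airflow connection.
        return sqlite3.connect(":memory:")


records = SketchClickHouseHook().get_records("SELECT 1 + 1")
print(records)
```

With this shape, anything written against the base class's helpers (a generic SQL sensor, for example) could use the subclass without database-specific code.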
Hi @eladkal, @raphaelauv,
Sorry for the late response, I was OOO.
The reason I didn't use DbApiHook is that it's intended to be used with databases (mostly transactional ones that support SQLAlchemy),
and since ClickHouse is a distributed OLAP database, I really think it would be better not to expose methods like insert_rows(), which uses SQLAlchemy.
Rather, if someone wants this functionality, they should use the underlying library (clickhouse-driver) and implement their own operator using the hook and connection object.
# if database is provided use it or use from schema
if self.database:
    connection_kwargs.update(database=self.database)
elif conn.schema:
    connection_kwargs.update(database=conn.schema)
I think this is a source of confusion. Maybe we should customize the connection like we do with other providers?
Could you please provide more information?
Here, if someone wants to override the default schema (provided by the connection), they can pass the database argument.
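To make the precedence concrete, here is a minimal standalone sketch of the same logic. The function name resolve_database is hypothetical and exists only for illustration. An explicit database argument wins, the connection's schema is the fallback, and nothing is set when both are empty.

```python
from typing import Optional


def resolve_database(database: Optional[str], conn_schema: Optional[str]) -> dict:
    """Mirror the if/elif from the diff: explicit argument first, then schema."""
    connection_kwargs = {}
    if database:
        connection_kwargs.update(database=database)
    elif conn_schema:
        connection_kwargs.update(database=conn_schema)
    return connection_kwargs


explicit = resolve_database("analytics", "default")  # argument overrides schema
fallback = resolve_database(None, "default")         # schema used as fallback
neither = resolve_database(None, None)               # no database key at all
print(explicit, fallback, neither)
```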
result = hook.query(sql=self.sql, params=self.params)
if self.result_processor:
    self.result_processor(result)
Where is result_processor implemented?
result_processor is a callback that the user can pass in to process the result, similar to the ArangoDB provider.
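A small sketch of how such a callback might be wired up. FakeHook and SketchQueryOperator are hypothetical stand-ins (no Airflow or ClickHouse involved), and the rows returned are made up for illustration. Only the "call result_processor if one was given" step is taken from the diff.

```python
from typing import Any, Callable, List, Optional, Tuple

Rows = List[Tuple[Any, ...]]


class FakeHook:
    """Hypothetical hook that returns canned rows instead of querying a server."""

    def query(self, sql: str, params: Optional[dict] = None) -> Rows:
        return [(1, "a"), (2, "b")]


class SketchQueryOperator:
    """Hypothetical operator showing the optional-callback pattern."""

    def __init__(
        self,
        sql: str,
        params: Optional[dict] = None,
        result_processor: Optional[Callable[[Rows], Any]] = None,
    ) -> None:
        self.sql = sql
        self.params = params
        self.result_processor = result_processor

    def execute(self) -> None:
        hook = FakeHook()
        result = hook.query(sql=self.sql, params=self.params)
        # The callback is user-supplied; it is skipped when not provided.
        if self.result_processor:
            self.result_processor(result)


collected: List[int] = []
operator = SketchQueryOperator(
    sql="SELECT id, name FROM t",
    result_processor=lambda rows: collected.extend(row[0] for row in rows),
)
operator.execute()
print(collected)
```

The operator itself never needs to know what the callback does, which is why there is no implementation of it inside the provider.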
from airflow.sensors.sql import SqlSensor


class ClickHouseSensor(SqlSensor):
I asked this elsewhere, but if ClickHouseHook inherits from DbApiHook then we won't need this custom sensor, because SqlSensor will support ClickHouse natively: https://github.com/pateash/airflow/blob/f7e2ffe42bb7e957e4e16d0cb65e9541c87bd72b/airflow/providers/common/sql/sensors/sql.py#L80
Hi @eladkal,
Sorry for the late response, I was OOO.
The reason I didn't use DbApiHook is that it's intended to be used with databases (mostly transactional ones that support SQLAlchemy),
and since ClickHouse is an OLAP database, I really think it would be better not to expose methods like insert_rows(), which uses SQLAlchemy.
Rather, if someone wants this functionality, they should use the underlying library (clickhouse-driver) and implement their own operator using the hook and connection object.
# Conflicts: # CONTRIBUTING.rst # INSTALL
3e43b1c to edb3f77
Hi @pateash, I'm interested in having Airflow support ClickHouse, but I don't want to have such a standard implementation.
cursor.executemany('INSERT INTO test (x) VALUES', [[200]])
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
closes: #10893
Description
Adding a ClickHouse provider based on its Python SDK: https://clickhouse-driver.readthedocs.io/en/latest/
Users can create their own custom operators by leveraging the ClickHouseHook directly,
or by building their operator on ClickHouseOperator and providing a result_processor callback.
The sensor can be implemented with SQL.