Support interactions with Airflow #836
Comments
With Airflow 2 launching, you might want to build this on the new Airflow API. The plan is to get Airflow 2 out in early December.
@ryw are there any updates regarding the integration with Airflow? Airflow 2 was released a couple of weeks ago and seems to be pretty stable 🙂
@pualien how would you see the AF2 integration working? Could you run me through a scenario? We are currently thinking about the design, and the more use cases we have the better!
@michel-tricot as you mentioned before, and to be more specific: the best option would be to have Airbyte operators and sensors in Airflow, so that Airflow can trigger a connector execution and check for the end of the sync.
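To make the "check the end of the sync" half of that request concrete, here is a minimal sketch of the polling loop a sensor could run. It is only an illustration: `wait_for_sync`, the injected `get_status` callable, and the status strings are assumptions for this sketch, not names taken from Airbyte or Airflow.

```python
import time
from typing import Callable

# Status strings are assumptions for this sketch, not confirmed Airbyte values.
TERMINAL_OK = {"succeeded"}
TERMINAL_FAILED = {"failed", "cancelled"}


def wait_for_sync(
    get_status: Callable[[int], str],
    job_id: int,
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(job_id) until the job reaches a terminal state."""
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status in TERMINAL_OK:
            return status
        if status in TERMINAL_FAILED:
            raise RuntimeError(f"Airbyte job {job_id} ended with status {status!r}")
        if waited >= timeout:
            raise TimeoutError(f"Airbyte job {job_id} still {status!r} after {timeout}s")
        # Not finished yet: wait before asking again.
        sleep(poll_interval)
        waited += poll_interval
```

Passing `get_status` and `sleep` in as arguments keeps the loop independent of any particular HTTP client, so it can be exercised with a fake status source before a real API call is wired in.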
Could Airbyte jobs or syncs created through the Airbyte UI also be translated to Airflow DAGs dynamically?
https://www.astronomer.io/blog/airflow-dbt-1 another blog by the astronomer. The second part is very interesting about how they build dbt tasks within Airflow DAG using a
With the Airbyte API it is now possible to build an Airflow Operator and the connector. Can I help with that?
Hi @marcosmarxm that would be amazing!! How do you envision it working?
@michel-tricot I imagine that this integration should take place in stages. I made a draft and the code is very primitive: The use case I considered in this draft is (in the code I used the Money/JSON destination example from the getting started):
I created in Airflow an operator called
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.providers.airbyte import AirbyteTriggerSyncOperator  # in the future! now is plugin.operators :p

# Daily DAG with a single task that triggers the Airbyte sync
with DAG(dag_id='trigger_airbyte_connection',
         default_args={'owner': 'airflow'},
         schedule_interval='@daily',
         start_date=days_ago(2)) as dag:
    money_json = AirbyteTriggerSyncOperator(
        task_id='sync_money_json',
        airbyte_conn_id='airbyte_local',
        source_name='Money',
        dest_name='JSON destination'
    )

As I mentioned, it's a draft. However, it's quite simple and is already "working".
Looking at the Airflow documentation, it would be interesting to create a Hook in addition to the Operator and put all the API access methods in it. If more Operators are added in the future, they could reuse the methods from the Hook.
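A rough sketch of that Hook/Operator split, written as plain Python classes so it runs without Airflow installed. Everything here is hypothetical: the class names, the injected `post` transport, the endpoint paths, and the response shape are assumptions made for illustration, not the draft's actual code or a confirmed Airbyte API contract.

```python
from typing import Any, Callable, Dict

# Transport: takes an endpoint path and a JSON-like payload, returns the decoded response.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class AirbyteHookSketch:
    """Hypothetical hook: the one place that knows how to talk to the Airbyte API."""

    def __init__(self, post: Transport) -> None:
        self._post = post

    def submit_sync(self, connection_id: str) -> int:
        # Endpoint path and response shape are assumptions for this sketch.
        response = self._post("connections/sync", {"connectionId": connection_id})
        return response["job"]["id"]

    def get_job_status(self, job_id: int) -> str:
        response = self._post("jobs/get", {"id": job_id})
        return response["job"]["status"]


class TriggerSyncOperatorSketch:
    """Hypothetical operator: starts a sync by delegating to the hook."""

    def __init__(self, hook: AirbyteHookSketch, connection_id: str) -> None:
        self.hook = hook
        self.connection_id = connection_id

    def execute(self) -> int:
        return self.hook.submit_sync(self.connection_id)


class JobStatusSensorSketch:
    """Hypothetical sensor: reuses the same hook instead of its own API code."""

    def __init__(self, hook: AirbyteHookSketch, job_id: int) -> None:
        self.hook = hook
        self.job_id = job_id

    def poke(self) -> bool:
        # "succeeded" is an assumed status string.
        return self.hook.get_job_status(self.job_id) == "succeeded"
```

Neither the operator nor the sensor builds a request itself, so a later operator only needs a new method on the hook rather than its own copy of the API access code.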
Looks great! Out of curiosity, why did you decide to configure the SyncOperator with the source name and destination name, instead of the connection id? Also, where do you envision the operator to live? Do you think it should be a separate project or would it make sense to have it in the monorepo?
Using Source and Destination as parameters

airflow_task = AirbyteTriggerSyncOperator(
    task_id='sync_money_json',
    airbyte_conn_id='airbyte_conn_example',
    source_name='Money',
    dest_name='JSON destination'
)

Using ConnectionId directly

airflow_task = AirbyteTriggerSyncOperator(
    task_id='sync_money_json',
    airbyte_conn_id='airbyte_conn_example',
    connection_id='5ae6b09b-fdec-41af-aaf7-7d94cfc33ef6',
)

I used the name of the PROS: CONS: The
Sources (and destinations) can have duplicate names, so it's probably safer to refer to just the Developers could still use a descriptive
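The duplicate-name concern can be illustrated with a small sketch. The record layout (`connection_id`, `source_name`, `dest_name` keys), the `resolve_connection_id` helper, and the all-zero ID are made up for this example; only the first ID comes from the snippet earlier in the thread.

```python
from typing import Dict, List


def resolve_connection_id(
    connections: List[Dict[str, str]], source_name: str, dest_name: str
) -> str:
    """Map a (source name, destination name) pair to exactly one connection id."""
    matches = [
        c["connection_id"]
        for c in connections
        if c["source_name"] == source_name and c["dest_name"] == dest_name
    ]
    if len(matches) != 1:
        # Zero matches or several matches: the names do not identify one connection.
        raise LookupError(
            f"{len(matches)} connections match {source_name!r} -> {dest_name!r}"
        )
    return matches[0]


# Two connections that share the same source and destination names.
connections = [
    {
        "connection_id": "5ae6b09b-fdec-41af-aaf7-7d94cfc33ef6",
        "source_name": "Money",
        "dest_name": "JSON destination",
    },
    {
        "connection_id": "00000000-0000-0000-0000-000000000000",  # made-up id
        "source_name": "Money",
        "dest_name": "JSON destination",
    },
]
```

With this data, `resolve_connection_id(connections, "Money", "JSON destination")` raises `LookupError`, whereas a task configured with the connection id has nothing to resolve.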
@marcosmarxm can we close this?
I think so. I'll create another issue to expand the first Operator release to also support SSH connections.
Tell us about the problem you're trying to solve
I would like Airflow to be able to trigger syncs on Airbyte, and Airbyte to be able to trigger Airflow runs.