Missing example DAGs/system tests for Google services #8280
Comments
I would be happy to give it a try. I've only worked with Airflow and AWS so far, but I'm sure I should be able to come up with at least some examples.
@SlowSnowFox I'm glad you want to work on it. Have you encountered any difficulties? Can I help you?
Hey, sorry for the late reply. I'll be able to make a pull request for some of them on the weekend. Is it OK if I just package all the examples I've made in one pull request and list the specific tests in the commit message?
Hey.
Happy to try the gcs_to_bigquery and bigquery_to_gcs examples if they are not being worked on yet.
Go ahead. Get started. I look forward to your example. I would be happy if you also added a system test, because it will allow us to check the example more easily.
@mik-laj great! Correct me if I'm wrong, but it seems there is already a gcs_to_bigquery example over here: https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/example_dags/example_gcs_to_bq.py
Fantastic. It is now enough to change the file name and add a system test. airflow/providers/google/cloud/example_dags/example_gcs_to_bq.py This is the main requirement for now.
There's an example DAG and system tests for |
@ephraimbuddy Would you like to split this file? Each module should have a separate sample DAG and a separate system test. This is important for code maintenance. When everything adheres to one principle, it is easier to make changes automatically.
OK. I'll separate them into different files. Thanks for the explanation.
I saw that many PRs have been merged. Is this issue already closed then?
@irvifa Some examples are still missing. I updated the first post.
There is already an example of how to run an export from CloudSQL to GCS, described in https://airflow.readthedocs.io/en/stable/_modules/airflow/contrib/example_dags/example_gcp_sql.html. However, according to apache#8280, the test itself is not available yet.
We have a lot of DAG examples, but sometimes these DAGs don't contain all operators. More information in the first post.
Happy to work on the documentation/example DAGs for the Data Loss Prevention operator :)
@rachael-ds I assigned you to this ticket 🐈
@mik-laj I am a newbie here and I am trying to add an mssql_to_gcs example DAG. GCP is also new to me; I have created an account and enabled a service account. I have to add/edit the connection details in the GCP connection to check whether my example DAG is working. I have already created an example DAG with MSSQL. When configuring connections for GCP, I had to add the keyfile path, keyfile secret name, and keyfile JSON, but I am not sure how to get these values. Are there any docs that would help me understand this part? Thanks.
Here are the relevant GCP instructions @Bowrna https://cloud.google.com/docs/authentication/production
@potiuk I have followed the instructions and enabled the service account. I have the JSON key after enabling the service account. But I am not sure what values I have to give for the keyfile path, keyfile secret name, and keyfile JSON in the Google Cloud connections form. I only have the JSON generated by enabling the service account.
In Breeze you can put the files in the "files" dir and they will be visible inside as "/files/*", and then in the connection you should specify the path to that file :). I think you can specify either JSON or "Keyfile + Secret" - you do not have to specify all three. I think this page has a good explanation of what is in the key. You can also, as an exercise, look at the unit tests of GcpBaseHook - it should have tests for all the different authentication options and should show you which combinations are valid.
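To make the "you do not have to specify all three" point concrete, here is a minimal sketch of building a connection URI with only one credential source. The function name google_conn_uri is hypothetical, and the "extra__google_cloud_platform__" field names are an assumption based on how the Google provider named its connection extras around the time of this thread; newer provider versions may use different names, and the keyfile secret name option is left out. Check the GcpBaseHook unit tests mentioned above for the combinations that are actually valid.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumption: extras carry this prefix, as in older Google provider versions.
PREFIX = "extra__google_cloud_platform__"


def google_conn_uri(
    key_path: Optional[str] = None,
    keyfile_dict: Optional[dict] = None,
    project: Optional[str] = None,
) -> str:
    """Build a Google Cloud connection URI from at most one credential source."""
    # This sketch treats the key file path and the inline key JSON as alternatives.
    if key_path is not None and keyfile_dict is not None:
        raise ValueError("Provide key_path or keyfile_dict, not both")

    extras = {}
    if key_path is not None:
        # Path to the service account JSON, e.g. a file placed under /files in Breeze.
        extras[PREFIX + "key_path"] = key_path
    elif keyfile_dict is not None:
        # Inline contents of the service account JSON, serialized to a string.
        extras[PREFIX + "keyfile_dict"] = json.dumps(keyfile_dict)
    if project is not None:
        extras[PREFIX + "project"] = project

    uri = "google-cloud-platform://"
    if extras:
        # urlencode percent-encodes the values so they survive inside a URI.
        uri += "?" + urlencode(extras)
    return uri
```

The resulting string could then be exported as an AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT environment variable instead of filling in the connection form, assuming the default connection id is used.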
Thanks @potiuk. Checking the unit tests would be a great way to understand this configuration.
Closed in favor of AIP 47 |
Description
Hello,
We have a rule that every GCP operator should have an example DAG and a system test. This is true in many cases, but there are minor exceptions.
https://github.com/apache/airflow/blob/master/tests/always/test_project_structure.py#L155-L162
We also lack examples for individual operators.
https://github.com/apache/airflow/blob/master/tests/always/test_project_structure.py#L164-L235
airflow.providers.google.cloud.operators.tasks.CloudTasksQueueDeleteOperator (Add more operators to example DAG for Cloud Tasks #13235)
airflow.providers.google.cloud.operators.tasks.CloudTasksQueueResumeOperator (Add more operators to example DAG for Cloud Tasks #13235)
airflow.providers.google.cloud.operators.tasks.CloudTasksQueuePauseOperator (Add more operators to example DAG for Cloud Tasks #13235)
airflow.providers.google.cloud.operators.tasks.CloudTasksQueuePurgeOperator (Add more operators to example DAG for Cloud Tasks #13235)
airflow.providers.google.cloud.operators.tasks.CloudTasksTaskGetOperator (Add more operators to example DAG for Cloud Tasks #13235)
airflow.providers.google.cloud.operators.tasks.CloudTasksTasksListOperator (Add more operators to example DAG for Cloud Tasks #13235)
airflow.providers.google.cloud.operators.tasks.CloudTasksTaskDeleteOperator (Add more operators to example DAG for Cloud Tasks #13235)
airflow.providers.google.cloud.operators.tasks.CloudTasksQueueGetOperator (Add more operators to example DAG for Cloud Tasks #13235)
airflow.providers.google.cloud.operators.tasks.CloudTasksQueueUpdateOperator (Add more operators to example DAG for Cloud Tasks #13235)
airflow.providers.google.cloud.operators.tasks.CloudTasksQueuesListOperator (Add more operators to example DAG for Cloud Tasks #13235)
airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateInlineWorkflowTemplateOperator
airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateWorkflowTemplateOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPGetStoredInfoTypeOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPReidentifyContentOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPCreateDeidentifyTemplateOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPCreateDLPJobOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPUpdateDeidentifyTemplateOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPDeidentifyContentOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPGetDLPJobTriggerOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPListDeidentifyTemplatesOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPGetDeidentifyTemplateOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPListInspectTemplatesOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPListStoredInfoTypesOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPUpdateInspectTemplateOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPDeleteDLPJobOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPListJobTriggersOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPCancelDLPJobOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPGetDLPJobOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPGetInspectTemplateOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPListInfoTypesOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPDeleteDeidentifyTemplateOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPListDLPJobsOperator
airflow.providers.google.cloud.operators.dlp.CloudDLPRedactImageOperator
airflow.providers.google.cloud.operators.datastore.CloudDatastoreDeleteOperationOperator
airflow.providers.google.cloud.operators.datastore.CloudDatastoreGetOperationOperator
airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceSensor
airflow.providers.google.cloud.sensors.gcs.GCSObjectUpdateSensor
airflow.providers.google.cloud.sensors.gcs.GCSObjectsWtihPrefixExistenceSensor
airflow.providers.google.cloud.sensors.gcs.GCSUploadSessionCompleteSensor
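A list like the one above can be produced by checking which operator class names never appear in any example DAG file. The following is only a minimal sketch of that idea, not the actual code in test_project_structure.py; the function name operators_missing_examples and the example_*.py file pattern are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Iterable, Set


def operators_missing_examples(operator_paths: Iterable[str], example_dir: Path) -> Set[str]:
    """Return the dotted operator paths whose class name appears in no example DAG."""
    # Read every example DAG once and search the combined source per operator.
    example_source = "\n".join(
        path.read_text(encoding="utf-8")
        for path in sorted(example_dir.glob("example_*.py"))
    )
    missing = set()
    for operator_path in operator_paths:
        # The class name is the last component of the dotted path.
        class_name = operator_path.rsplit(".", 1)[-1]
        # Word boundaries avoid matching a longer class name that shares a prefix.
        if not re.search(rf"\b{re.escape(class_name)}\b", example_source):
            missing.add(operator_path)
    return missing
```

This is a plain text search, so a class name that only appears in a comment would still count as covered; a stricter check could parse the files with the ast module instead.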
If you decide to work on this ticket, you don't have to do all the work yourself. A PR can cover just a single operator, and that's OK.
These example DAGs are key to ensuring high-quality integration.
If you haven't used GCP yet, you will get $300 in credit after creating an account, which will let you get to know these services better.
Working on this task will give you a better understanding of GCP services and teach you the testing methods that the community requires. If anyone is interested in this task, I am willing to provide all the necessary tips and information.
Are you wondering how to start contributing to this project? Start by reading our contributor guide.
Related Issues
N/A