[Feature] Update data platform blue print with Dataflow Flex template #1105
Great addition!
I would mention those benefits in the demo README as a suggested way to handle the data pipeline lifecycle.
…d-foundation-fabric into feature/dp-df-templates
@lcaggio Comments have been addressed, updates from master are merged, and tests are passing. Looking forward to hearing your feedback. Thanks!
Thanks @aymanfarhat for the work on this PR! Just a few comments on the demo gcloud commands, to explicitly impersonate the SA created, and we are good to go!
Thanks for the great addition to the blueprint!
This PR implements the creation of a Dataflow Flex template along with its Cloud Build configuration. It extends the same use case as the original example: the implementation of the Dataflow pipeline template lives in the demo folder along with its build requirements.
The Terraform code is also updated to reflect the infrastructure elements needed to support the DF template build, such as new GCS buckets for the template and staging, and an Artifact Registry resource. In addition, the outputs are updated to reflect the changes and to suggest a build command that launches the Cloud Build pipeline for the Dataflow pipeline.
The Airflow pipelines are also updated to integrate with the new Dataflow Flex template, using the relevant operator in this case (DataflowStartFlexTemplateOperator). In addition, I made some small refactorings to the Python Airflow code and updated its code style to conform to the Google Python style guide via yapf. The Python Dataflow code follows the same code style.
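As a rough illustration of the kind of build command the outputs suggest, here is a minimal Python sketch that assembles a `gcloud builds submit` invocation with service account impersonation. The function name, the substitution keys, and all project, region, and account values are hypothetical placeholders, not the ones used in this PR.

```python
import shlex


def cloud_build_command(project_id, region, service_account, substitutions,
                        config="cloudbuild.yaml", source="."):
    """Return the argv list for a `gcloud builds submit` call.

    All argument values are illustrative; the real substitution keys
    depend on the cloudbuild configuration in the demo folder.
    """
    # Cloud Build expects substitutions as a comma-separated KEY=VALUE list.
    subs = ",".join(f"{key}={value}"
                    for key, value in sorted(substitutions.items()))
    return [
        "gcloud", "builds", "submit", source,
        f"--config={config}",
        f"--project={project_id}",
        f"--region={region}",
        f"--substitutions={subs}",
        # Run the command as the given service account instead of the caller.
        f"--impersonate-service-account={service_account}",
    ]


# Hypothetical values, for illustration only.
argv = cloud_build_command(
    project_id="my-project",
    region="europe-west1",
    service_account="builder@my-project.iam.gserviceaccount.com",
    substitutions={"_TEMPLATE_PATH": "gs://my-bucket/csv2bq.json",
                   "_IMAGE_TAG": "latest"},
)
print(shlex.join(argv))
```

Building the command as a list and only joining it for display keeps quoting safe if it is later passed to `subprocess.run`.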
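To make the Airflow integration more concrete, here is a minimal sketch of the request body that `DataflowStartFlexTemplateOperator` takes through its `body` argument. The helper name, bucket paths, parameter names, and service account are assumptions for illustration and are not taken from the DAGs in this PR.

```python
def flex_template_body(job_name, template_gcs_path, parameters,
                       temp_location, service_account_email=None):
    """Build a Flex template launch request body as a plain dict.

    The keys follow the Dataflow `launchParameter` structure; the values
    passed in below are hypothetical.
    """
    environment = {"tempLocation": temp_location}
    if service_account_email:
        environment["serviceAccountEmail"] = service_account_email
    return {
        "launchParameter": {
            "jobName": job_name,
            # GCS path of the template spec file produced by the build.
            "containerSpecGcsPath": template_gcs_path,
            # Pipeline parameters are passed as string key/value pairs.
            "parameters": {k: str(v) for k, v in parameters.items()},
            "environment": environment,
        }
    }


# Hypothetical values, for illustration only.
body = flex_template_body(
    job_name="csv2bq-customers",
    template_gcs_path="gs://my-templates/csv2bq.json",
    parameters={"csv_file": "gs://my-landing/customers.csv",
                "output_table": "my-project:dataset.customers"},
    temp_location="gs://my-staging/tmp",
    service_account_email="dataflow@my-project.iam.gserviceaccount.com",
)

# Inside a DAG, the dict would be handed to the operator roughly like this:
# DataflowStartFlexTemplateOperator(
#     task_id="dataflow_customers", project_id="my-project",
#     location="europe-west1", body=body)
```

Keeping the body in a small helper makes it straightforward to reuse the same structure across several tasks that differ only in their parameters.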
Looking forward to hearing your feedback. Thanks! :)