[Feature] Update data platform blue print with Dataflow Flex template #1105

Merged on Feb 6, 2023 (34 commits)

aymanfarhat
Member

This PR implements a Dataflow Flex template along with its Cloud Build configuration. It extends the same use case as the original example: the Dataflow pipeline template implementation lives in the demos folder, along with its build requirements.

The Terraform code is also updated with the infrastructure needed to support the Dataflow template build: new GCS buckets for the template and for staging, and an Artifact Registry resource. In addition, the outputs are updated to reflect these changes and to suggest a build command that launches the Cloud Build pipeline for the Dataflow template.
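
As a rough illustration of the kind of build command the Terraform outputs could suggest, the sketch below assembles a `gcloud builds submit` invocation in Python. The config file name, the substitution keys (`_TEMPLATE_IMAGE`, `_TEMPLATE_PATH`), the project, the region, and the service account are all hypothetical placeholders, not the values used in this PR.

```python
import shlex


def build_submit_command(source_dir, config, project, region,
                         service_account, substitutions):
    """Return a `gcloud builds submit` command as a list of arguments."""
    # Cloud Build substitutions are passed as a comma-separated KEY=VALUE list.
    subs = ",".join(f"{key}={value}"
                    for key, value in sorted(substitutions.items()))
    return [
        "gcloud", "builds", "submit", source_dir,
        f"--config={config}",
        f"--project={project}",
        f"--region={region}",
        f"--substitutions={subs}",
        # Run the command as a dedicated service account instead of the
        # caller's own identity.
        f"--impersonate-service-account={service_account}",
    ]


# Hypothetical values, for illustration only.
cmd = build_submit_command(
    source_dir=".",
    config="cloudbuild.yaml",
    project="my-project",
    region="europe-west1",
    service_account="builder@my-project.iam.gserviceaccount.com",
    substitutions={
        "_TEMPLATE_PATH": "gs://my-templates/pipeline.json",
        "_TEMPLATE_IMAGE": "europe-west1-docker.pkg.dev/my-project/repo/pipeline:latest",
    },
)
print(shlex.join(cmd))
```

Building the command as an argument list keeps quoting explicit; `shlex.join` is used only to print a copy-pasteable shell line.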

The Airflow pipelines are also updated to integrate with the new Dataflow Flex template, using the relevant operator (DataflowStartFlexTemplateOperator). In addition, I made some improvements to the Python Airflow code: a few small refactorings, plus updating the code style to conform to the Google Python style guide via yapf. The Python Dataflow code follows the same code style.
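
For readers unfamiliar with the operator, here is a minimal sketch of how a Flex template launch could be wired into a DAG. It is not the code from this PR: the helper names, bucket paths, pipeline parameters, and service account are assumptions made for illustration. The request body follows the shape of the Dataflow Flex template launch request (`launchParameter` with `jobName`, `containerSpecGcsPath`, `parameters`, and `environment`).

```python
def flex_template_body(job_name, template_gcs_path, parameters,
                       temp_location, service_account):
    """Build the request body for launching a Dataflow Flex template."""
    return {
        "launchParameter": {
            "jobName": job_name,
            # GCS path of the template spec file produced by the build.
            "containerSpecGcsPath": template_gcs_path,
            # Pipeline-specific options, passed through as strings.
            "parameters": dict(parameters),
            "environment": {
                "tempLocation": temp_location,
                "serviceAccountEmail": service_account,
            },
        }
    }


def flex_template_task(task_id, project_id, location, body):
    """Create the Airflow task; call this inside a DAG definition."""
    # Imported lazily so the body builder works without Airflow installed.
    from airflow.providers.google.cloud.operators.dataflow import (
        DataflowStartFlexTemplateOperator,
    )
    return DataflowStartFlexTemplateOperator(
        task_id=task_id,
        project_id=project_id,
        location=location,
        body=body,
    )


# Hypothetical values, for illustration only.
body = flex_template_body(
    job_name="csv-to-bq-example",
    template_gcs_path="gs://my-templates/pipeline.json",
    parameters={"input": "gs://my-landing/data.csv",
                "output": "my-project:dataset.table"},
    temp_location="gs://my-staging/tmp",
    service_account="dataflow@my-project.iam.gserviceaccount.com",
)
```

Keeping the body in a plain function makes it easy to check in a unit test, separately from the DAG itself.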

Looking forward to hearing your feedback. Thanks! :)

@lcaggio
Collaborator

lcaggio commented Jan 23, 2023

Great addition!

  • This works as an example where users can simply update the pipeline logic to process their own data, adding the import/transformation logic they need.

  • The manual steps described in the example can be automated and integrated with a data pipeline CI/CD.

I would mention those benefits in the demo README as a suggested way to handle the data pipeline lifecycle.

@aymanfarhat
Member Author

@lcaggio Comments have been addressed, merged updates with master, tests passing. Looking forward to hearing your feedback. Thanks!

@lcaggio
Collaborator

lcaggio commented Jan 30, 2023

Thanks @aymanfarhat for the work on this PR! Just a few comments on the demo gcloud commands, so that they explicitly impersonate the SA that was created, and we are good to go!

Collaborator

@lcaggio lcaggio left a comment

Thanks for the great addition to the blueprint!

@ludoo ludoo merged commit 02d8d83 into GoogleCloudPlatform:master Feb 6, 2023
3 participants