forked from flyteorg/flyte
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Add bigquery example Signed-off-by: Kevin Su <[email protected]> * Add makefile Signed-off-by: Kevin Su <[email protected]> * Updated example Signed-off-by: Kevin Su <[email protected]> * Updated comment Signed-off-by: Kevin Su <[email protected]> * Updated requirement Signed-off-by: Kevin Su <[email protected]> * Updated dependency Signed-off-by: Kevin Su <[email protected]> * use annotated Signed-off-by: Kevin Su <[email protected]>
- Loading branch information
Showing
10 changed files
with
354 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
include ../common/common.mk | ||
include ../common/parent.mk |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
PREFIX=bigquery | ||
include ../../../common/common.mk | ||
include ../../../common/leaf.mk | ||
|
||
.PHONY: docker_build | ||
docker_build: ; | ||
|
||
.PHONY: docker_push | ||
docker_push: ; | ||
|
||
.PHONY: register | ||
register: ; | ||
|
||
.PHONY: serialize | ||
serialize: ; | ||
|
||
.PHONY: docker_push | ||
docker_push: ; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
BigQuery | ||
==== | ||
|
||
Flyte backend can be connected with BigQuery service. Once enabled it can allow you to query a BigQuery table. | ||
This section will provide how to use the BigQuery Plugin using flytekit python. | ||
|
||
Installation | ||
------------ | ||
|
||
To use the flytekit bigquery plugin simply run the following: | ||
|
||
.. prompt:: bash | ||
|
||
pip install flytekitplugins-bigquery | ||
|
||
No Need of a dockerfile | ||
------------------------ | ||
This plugin is purely a spec. Since SQL is completely portable there is no need to build a Docker container. | ||
|
||
|
||
Configuring the backend to get bigquery working | ||
--------------------------------------------- | ||
- BigQuery plugins are `enabled in flytepropeller's config <https://docs.flyte.org/en/latest/deployment/plugin_setup/gcp/bigquery.html#deployment-plugin-setup-gcp-bigquery>`_ |
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
""" | ||
BigQuery Query | ||
############ | ||
This example shows how to use a Flyte BigQueryTask to execute a query. | ||
""" | ||
try: | ||
from typing import Annotated | ||
except ImportError: | ||
from typing_extensions import Annotated | ||
|
||
import pandas as pd | ||
from flytekit import StructuredDataset, kwtypes, task, workflow | ||
from flytekitplugins.bigquery import BigQueryConfig, BigQueryTask | ||
|
||
# %% | ||
# This is the world's simplest query. Note that in order for registration to work properly, you'll need to give your | ||
# BigQuery task a name that's unique across your project/domain for your Flyte installation. | ||
bigquery_task_no_io = BigQueryTask( | ||
name="sql.bigquery.no_io", | ||
inputs={}, | ||
query_template="SELECT 1", | ||
task_config=BigQueryConfig(ProjectID="flyte", Location="us-west1-b"), | ||
) | ||
|
||
|
||
@workflow | ||
def no_io_wf(): | ||
return bigquery_task_no_io() | ||
|
||
|
||
# %% | ||
# Of course, in real world applications we are usually more interested in using BigQuery to query a dataset. | ||
# In this case we use crypto_dogecoin data which is public dataset in BigQuery. | ||
# `here <https://console.cloud.google.com/bigquery?project=bigquery-public-data&page=table&d=crypto_dogecoin&p=bigquery-public-data&t=transactions>`__ | ||
# | ||
# Let's look out how we can parameterize our query to filter results for a specific transaction version, provided as a user input | ||
# specifying a version. | ||
|
||
DogeCoinDataset = Annotated[ | ||
StructuredDataset, kwtypes(hash=str, size=int, block_number=int) | ||
] | ||
|
||
bigquery_task_templatized_query = BigQueryTask( | ||
name="sql.bigquery.w_io", | ||
# Define inputs as well as their types that can be used to customize the query. | ||
inputs=kwtypes(version=int), | ||
output_structured_dataset_type=DogeCoinDataset, | ||
task_config=BigQueryConfig(ProjectID="flyte", Location="us-west1-b"), | ||
query_template="SELECT * FROM `bigquery-public-data.crypto_dogecoin.transactions` WHERE @version = 1 LIMIT 10;", | ||
) | ||
|
||
|
||
# %% | ||
# StructuredDataset transformer can convert query result to pandas dataframe here. | ||
# We can also change "pandas.dataframe" to "pyarrow.Table", and convert result to Arrow table. | ||
@task | ||
def convert_bq_table_to_pandas_dataframe(sd: DogeCoinDataset) -> pd.DataFrame: | ||
return sd.open(pd.DataFrame).all() | ||
|
||
|
||
@workflow | ||
def full_bigquery_wf(version: int) -> pd.DataFrame: | ||
sd = bigquery_task_templatized_query(version=version) | ||
return convert_bq_table_to_pandas_dataframe(sd=sd) | ||
|
||
|
||
# %% | ||
# Check query result on bigquery console: ``https://console.cloud.google.com/bigquery`` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
-r ../../../common/requirements-common.in | ||
flytekitplugins-bigquery==0.30.1 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,217 @@ | ||
# | ||
# This file is autogenerated by pip-compile with python 3.9 | ||
# To update, run: | ||
# | ||
# pip-compile requirements.in | ||
# | ||
arrow==1.2.1 | ||
# via jinja2-time | ||
binaryornot==0.4.4 | ||
# via cookiecutter | ||
cachetools==5.0.0 | ||
# via google-auth | ||
certifi==2021.10.8 | ||
# via requests | ||
chardet==4.0.0 | ||
# via binaryornot | ||
charset-normalizer==2.0.6 | ||
# via requests | ||
checksumdir==1.2.0 | ||
# via flytekit | ||
click==7.1.2 | ||
# via | ||
# cookiecutter | ||
# flytekit | ||
cloudpickle==2.0.0 | ||
# via flytekit | ||
cookiecutter==1.7.3 | ||
# via flytekit | ||
croniter==1.0.15 | ||
# via flytekit | ||
cycler==0.10.0 | ||
# via matplotlib | ||
dataclasses-json==0.5.6 | ||
# via flytekit | ||
decorator==5.1.0 | ||
# via retry | ||
deprecated==1.2.13 | ||
# via flytekit | ||
diskcache==5.2.1 | ||
# via flytekit | ||
docker-image-py==0.1.12 | ||
# via flytekit | ||
docstring-parser==0.11 | ||
# via flytekit | ||
flyteidl==0.22.0 | ||
# via flytekit | ||
flytekit==0.30.1 | ||
# via | ||
# -r ../../../common/requirements-common.in | ||
# flytekitplugins-bigquery | ||
flytekitplugins-bigquery==0.30.1 | ||
# via -r requirements.in | ||
google-api-core[grpc]==2.5.0 | ||
# via | ||
# google-cloud-bigquery | ||
# google-cloud-core | ||
google-auth==2.6.0 | ||
# via | ||
# google-api-core | ||
# google-cloud-core | ||
google-cloud-bigquery==2.32.0 | ||
# via flytekitplugins-bigquery | ||
google-cloud-core==2.2.2 | ||
# via google-cloud-bigquery | ||
google-crc32c==1.3.0 | ||
# via google-resumable-media | ||
google-resumable-media==2.2.1 | ||
# via google-cloud-bigquery | ||
googleapis-common-protos==1.54.0 | ||
# via | ||
# google-api-core | ||
# grpcio-status | ||
grpcio==1.43.0 | ||
# via | ||
# flytekit | ||
# google-api-core | ||
# google-cloud-bigquery | ||
# grpcio-status | ||
grpcio-status==1.43.0 | ||
# via google-api-core | ||
idna==3.2 | ||
# via requests | ||
importlib-metadata==4.8.1 | ||
# via keyring | ||
jinja2==3.0.3 | ||
# via | ||
# cookiecutter | ||
# jinja2-time | ||
jinja2-time==0.2.0 | ||
# via cookiecutter | ||
keyring==23.2.1 | ||
# via flytekit | ||
kiwisolver==1.3.2 | ||
# via matplotlib | ||
markupsafe==2.0.1 | ||
# via jinja2 | ||
marshmallow==3.13.0 | ||
# via | ||
# dataclasses-json | ||
# marshmallow-enum | ||
# marshmallow-jsonschema | ||
marshmallow-enum==1.5.1 | ||
# via dataclasses-json | ||
marshmallow-jsonschema==0.12.0 | ||
# via flytekit | ||
matplotlib==3.4.3 | ||
# via -r ../../../common/requirements-common.in | ||
mypy-extensions==0.4.3 | ||
# via typing-inspect | ||
natsort==7.1.1 | ||
# via flytekit | ||
numpy==1.21.2 | ||
# via | ||
# matplotlib | ||
# pandas | ||
# pyarrow | ||
packaging==21.3 | ||
# via google-cloud-bigquery | ||
pandas==1.3.3 | ||
# via flytekit | ||
pillow==8.3.2 | ||
# via matplotlib | ||
poyo==0.5.0 | ||
# via cookiecutter | ||
proto-plus==1.20.0 | ||
# via google-cloud-bigquery | ||
protobuf==3.19.4 | ||
# via | ||
# flyteidl | ||
# flytekit | ||
# google-api-core | ||
# google-cloud-bigquery | ||
# googleapis-common-protos | ||
# grpcio-status | ||
# proto-plus | ||
py==1.10.0 | ||
# via retry | ||
pyarrow==6.0.1 | ||
# via flytekit | ||
pyasn1==0.4.8 | ||
# via | ||
# pyasn1-modules | ||
# rsa | ||
pyasn1-modules==0.2.8 | ||
# via google-auth | ||
pyparsing==2.4.7 | ||
# via | ||
# matplotlib | ||
# packaging | ||
python-dateutil==2.8.1 | ||
# via | ||
# arrow | ||
# croniter | ||
# flytekit | ||
# google-cloud-bigquery | ||
# matplotlib | ||
# pandas | ||
python-json-logger==2.0.2 | ||
# via flytekit | ||
python-slugify==5.0.2 | ||
# via cookiecutter | ||
pytimeparse==1.1.8 | ||
# via flytekit | ||
pytz==2018.4 | ||
# via | ||
# flytekit | ||
# pandas | ||
regex==2021.10.8 | ||
# via docker-image-py | ||
requests==2.26.0 | ||
# via | ||
# cookiecutter | ||
# flytekit | ||
# google-api-core | ||
# google-cloud-bigquery | ||
# responses | ||
responses==0.14.0 | ||
# via flytekit | ||
retry==0.9.2 | ||
# via flytekit | ||
rsa==4.8 | ||
# via google-auth | ||
six==1.16.0 | ||
# via | ||
# cookiecutter | ||
# cycler | ||
# google-auth | ||
# grpcio | ||
# python-dateutil | ||
# responses | ||
sortedcontainers==2.4.0 | ||
# via flytekit | ||
statsd==3.3.0 | ||
# via flytekit | ||
text-unidecode==1.3 | ||
# via python-slugify | ||
typing-extensions==3.10.0.2 | ||
# via | ||
# flytekit | ||
# typing-inspect | ||
typing-inspect==0.7.1 | ||
# via dataclasses-json | ||
urllib3==1.26.7 | ||
# via | ||
# flytekit | ||
# requests | ||
# responses | ||
wheel==0.37.0 | ||
# via | ||
# -r ../../../common/requirements-common.in | ||
# flytekit | ||
wrapt==1.13.1 | ||
# via | ||
# deprecated | ||
# flytekit | ||
zipp==3.6.0 | ||
# via importlib-metadata |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
[sdk] | ||
workflow_packages=bigquery | ||
python_venv=flytekit_venv | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
SERIALIZED_PB_OUTPUT_DIR := /tmp/output | ||
|
||
.PHONY: clean | ||
clean: | ||
rm -rf $(SERIALIZED_PB_OUTPUT_DIR)/* | ||
|
||
$(SERIALIZED_PB_OUTPUT_DIR): clean | ||
mkdir -p $(SERIALIZED_PB_OUTPUT_DIR) | ||
|
||
.PHONY: serialize | ||
serialize: $(SERIALIZED_PB_OUTPUT_DIR) | ||
pyflyte --config /root/sandbox.config serialize workflows -f $(SERIALIZED_PB_OUTPUT_DIR) | ||
|
||
.PHONY: fast_serialize | ||
fast_serialize: $(SERIALIZED_PB_OUTPUT_DIR) | ||
pyflyte --config /root/sandbox.config serialize fast workflows -f $(SERIALIZED_PB_OUTPUT_DIR) |