reorganize sqlite3 user guide example #300

Merged
merged 11 commits into from
Jul 20, 2021
6 changes: 2 additions & 4 deletions cookbook/docs/conf.py
@@ -238,9 +238,8 @@ def __call__(self, filename):
"../deployment/cluster",
# "../deployment/guides", # TODO: add content to this section
# "../control_plane", # TODO: add content to this section
# "../integrations/flytekit_plugins/sqllite3", # TODO: add content to this section
"../integrations/flytekit_plugins/sql",
"../integrations/flytekit_plugins/papermilltasks",
# "../integrations/flytekit_plugins/sqlalchemy", # TODO: add content to this section
"../integrations/flytekit_plugins/pandera",
"../integrations/flytekit_plugins/dolt",
"../integrations/kubernetes/pod",
@@ -267,9 +266,8 @@ def __call__(self, filename):
"auto/deployment/cluster",
# "auto/deployment/guides", # TODO: add content to this section
# "auto/control_plane", # TODO: add content to this section
# "auto/integrations/flytekit_plugins/sqllite3", # TODO: add content to this section
"auto/integrations/flytekit_plugins/sql",
"auto/integrations/flytekit_plugins/papermilltasks",
# "auto/integrations/flytekit_plugins/sqlalchemy", # TODO: add content to this section
"auto/integrations/flytekit_plugins/pandera",
"auto/integrations/flytekit_plugins/dolt",
"auto/integrations/kubernetes/pod",
21 changes: 18 additions & 3 deletions cookbook/docs/flytekit_plugins.rst
@@ -18,6 +18,15 @@ You can find the plugins maintained by the core flyte team `here <https://github
.. panels::
:header: text-center

.. link-button:: auto/integrations/flytekit_plugins/sql/index
:type: ref
:text: SQL
:classes: btn-block stretched-link
^^^^^^^^^^^^
Execute SQL queries as tasks.

---

.. link-button:: auto/integrations/flytekit_plugins/papermilltasks/index
:type: ref
:text: Papermill
@@ -34,16 +43,22 @@ You can find the plugins maintained by the core flyte team `here <https://github
^^^^^^^^^^^^
Validate pandas dataframes with ``pandera``.

---

.. link-button:: auto/integrations/flytekit_plugins/dolt/index
:type: ref
:text: Dolt
:classes: btn-block stretched-link
^^^^^^^^^^^^
Version your SQL database with ``dolt``.

.. TODO: add the following items to the TOC when the content is written.
.. - auto/integrations/flytekit_plugins/sqllite3/index
.. - auto/integrations/flytekit_plugins/sqlalchemy/index

.. toctree::
:maxdepth: -1
:caption: Contents
:hidden:

auto/integrations/flytekit_plugins/sql/index
auto/integrations/flytekit_plugins/papermilltasks/index
auto/integrations/flytekit_plugins/pandera/index
auto/integrations/flytekit_plugins/dolt/index
31 changes: 31 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/Dockerfile
@@ -0,0 +1,31 @@
FROM python:3.8-buster

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip3 install awscli

ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY sql/requirements.txt /root/.
RUN pip install -r /root/requirements.txt

# Copy the makefile targets to expose on the container. This makes it easier to register.
COPY in_container.mk /root/Makefile
COPY sql/sandbox.config /root

# Copy the actual code
COPY sql/ /root/sql/

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
3 changes: 3 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/Makefile
@@ -0,0 +1,3 @@
PREFIX=sql
include ../../../common/Makefile
include ../../../common/leaf.mk
7 changes: 7 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/README.rst
@@ -0,0 +1,7 @@
###
SQL
###

Flyte tasks are not restricted to running user-supplied containers, or even containers at all. This is
one of the most important design decisions in Flyte. Non-container tasks can have arbitrary execution targets,
such as a service that executes SQL queries (for example Snowflake or BigQuery) or a synchronous web API.
Empty file.
2 changes: 2 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/requirements.in
@@ -0,0 +1,2 @@
-r ../../../common/requirements-common.in
flytekitplugins-sqlalchemy>=0.20.1
146 changes: 146 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/requirements.txt
@@ -0,0 +1,146 @@
#
# This file is autogenerated by pip-compile with python 3.8
# To update, run:
#
# /Library/Developer/CommandLineTools/usr/bin/make requirements.txt
#
attrs==21.2.0
# via scantree
certifi==2021.5.30
# via requests
charset-normalizer==2.0.2
# via requests
click==7.1.2
# via flytekit
croniter==1.0.15
# via flytekit
cycler==0.10.0
# via matplotlib
dataclasses-json==0.5.4
# via flytekit
decorator==5.0.9
# via retry
deprecated==1.2.12
# via flytekit
dirhash==0.2.1
# via flytekit
docker-image-py==0.1.10
# via flytekit
flyteidl==0.19.13
# via flytekit
flytekit==0.20.1
# via
# -r ../../../common/requirements-common.in
# flytekitplugins-sqlalchemy
flytekitplugins-sqlalchemy==0.20.1
# via -r requirements.in
greenlet==1.1.0
# via sqlalchemy
grpcio==1.38.1
# via flytekit
idna==3.2
# via requests
importlib-metadata==4.6.1
# via keyring
keyring==23.0.1
# via flytekit
kiwisolver==1.3.1
# via matplotlib
marshmallow==3.12.2
# via
# dataclasses-json
# marshmallow-enum
# marshmallow-jsonschema
marshmallow-enum==1.5.1
# via dataclasses-json
marshmallow-jsonschema==0.12.0
# via flytekit
matplotlib==3.4.2
# via -r ../../../common/requirements-common.in
mypy-extensions==0.4.3
# via typing-inspect
natsort==7.1.1
# via flytekit
numpy==1.21.0
# via
# matplotlib
# pandas
# pyarrow
pandas==1.3.0
# via flytekit
pathspec==0.8.1
# via scantree
pillow==8.3.1
# via matplotlib
protobuf==3.17.3
# via
# flyteidl
# flytekit
py==1.10.0
# via retry
pyarrow==3.0.0
# via flytekit
pyparsing==2.4.7
# via matplotlib
python-dateutil==2.8.1
# via
# croniter
# flytekit
# matplotlib
# pandas
python-json-logger==2.0.1
# via flytekit
pytimeparse==1.1.8
# via flytekit
pytz==2018.4
# via
# flytekit
# pandas
regex==2021.7.6
# via docker-image-py
requests==2.26.0
# via
# flytekit
# responses
responses==0.13.3
# via flytekit
retry==0.9.2
# via flytekit
scantree==0.0.1
# via dirhash
six==1.16.0
# via
# cycler
# flytekit
# grpcio
# protobuf
# python-dateutil
# responses
# scantree
sortedcontainers==2.4.0
# via flytekit
sqlalchemy==1.4.21
# via flytekitplugins-sqlalchemy
statsd==3.3.0
# via flytekit
stringcase==1.2.0
# via dataclasses-json
typing-extensions==3.10.0.0
# via typing-inspect
typing-inspect==0.7.1
# via dataclasses-json
urllib3==1.26.6
# via
# flytekit
# requests
# responses
wheel==0.36.2
# via
# -r ../../../common/requirements-common.in
# flytekit
wrapt==1.12.1
# via
# deprecated
# flytekit
zipp==3.5.0
# via importlib-metadata
3 changes: 3 additions & 0 deletions cookbook/integrations/flytekit_plugins/sql/sandbox.config
@@ -0,0 +1,3 @@
[sdk]
workflow_packages=sql
python_venv=flytekit_venv
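
As a rough illustration of how this file is laid out, the sketch below parses the same three lines with Python's standard `configparser`. It only demonstrates the INI structure (an `[sdk]` section with two keys). It is not how flytekit itself loads the file, the helper names are hypothetical, and treating `workflow_packages` as a comma-separated list is an assumption made for the example.

```python
import configparser

# The same content as sandbox.config, inlined so the sketch is self-contained.
SANDBOX_CONFIG = """\
[sdk]
workflow_packages=sql
python_venv=flytekit_venv
"""


def read_sdk_section(text: str) -> dict:
    """Parse INI text and return the [sdk] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["sdk"])


def workflow_packages(text: str) -> list:
    """Split workflow_packages on commas (assumed list format for this sketch)."""
    raw = read_sdk_section(text)["workflow_packages"]
    return [name.strip() for name in raw.split(",") if name.strip()]


if __name__ == "__main__":
    print(read_sdk_section(SANDBOX_CONFIG))
    print(workflow_packages(SANDBOX_CONFIG))
```

Here `workflow_packages=sql` matches the `sql/` directory that the Dockerfile copies into `/root/sql/`, which is on `PYTHONPATH`.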
@@ -0,0 +1,45 @@
"""
SQLAlchemy
----------

SQLAlchemy is a Python SQL toolkit and Object Relational Mapper (ORM) that gives application developers the full power and flexibility of SQL.

Flyte provides an easy-to-use interface for connecting to SQL databases through SQLAlchemy.

The SQLAlchemy task runs in a pre-built container, so users do not need to build one.
"""

# %%
# Let's import the libraries.
import pandas
from flytekit import kwtypes, task, workflow
from flytekitplugins.sqlalchemy import SQLAlchemyConfig, SQLAlchemyTask


# %%
# We define a ``SQLAlchemyTask`` that fetches a limited number of records from a table, and a regular task that
# returns the length of the resulting DataFrame. Replace ``<table>`` and ``<uri>`` with your table name and database URI.
#
# .. note::
#
#    The output of ``SQLAlchemyTask`` is a :py:class:`~flytekit.types.schema.FlyteSchema` by default.
@task
def get_length(df: pandas.DataFrame) -> int:
    return len(df)


sql_task = SQLAlchemyTask(
    name="sqlalchemy_task",
    query_template="select * from <table> limit {{.inputs.limit}}",
    inputs=kwtypes(limit=int),
    task_config=SQLAlchemyConfig(uri="<uri>"),
)


@workflow
def my_wf(limit: int) -> int:
    return get_length(df=sql_task(limit=limit))


if __name__ == "__main__":
    print(f"Running {__file__} main...")
    print(my_wf(limit=3))
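
To make the `{{.inputs.limit}}` placeholder in `query_template` more concrete, here is a minimal sketch of the idea using only the standard library. It is a stand-in, not flytekit's implementation: `render_query` and `run_demo` are hypothetical helpers, the `demo` table is invented for the example, and an in-memory SQLite database replaces the `<uri>` and `<table>` placeholders used above.

```python
import re
import sqlite3

# Hypothetical stand-in for the templating step; not flytekit's implementation.
_PLACEHOLDER = re.compile(r"\{\{\s*\.inputs\.(\w+)\s*\}\}")


def render_query(query_template: str, **inputs) -> str:
    """Replace each {{.inputs.<name>}} placeholder with the matching input value."""

    def substitute(match):
        return str(inputs[match.group(1)])

    return _PLACEHOLDER.sub(substitute, query_template)


def run_demo(limit: int) -> int:
    """Render the query, run it on an in-memory SQLite table, and count the rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table demo (id integer primary key, name text)")
        conn.executemany(
            "insert into demo (name) values (?)",
            [(f"row-{i}",) for i in range(10)],
        )
        # Plain string substitution is acceptable here only because the input is an int.
        query = render_query("select * from demo limit {{.inputs.limit}}", limit=int(limit))
        return len(conn.execute(query).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_demo(limit=3))
```

The sketch mirrors the shape of `my_wf`: an integer input is substituted into the query, the query runs, and the number of returned rows is the result.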