feat: Custom Docker image for Bytewax batch materialization #3099

Merged 1 commit on Aug 18, 2022
17 changes: 16 additions & 1 deletion docs/reference/batch-materialization/bytewax.md
@@ -55,5 +55,20 @@ batch_engine:

 The `namespace` configuration directive specifies which Kubernetes [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) jobs, services and configuration maps will be created in.

-The `image` parameter specifies which container image to use when running the materialization job. To create a custom image based on this container, please see the [GitHub repository](https://github.com/bytewax/bytewax-feast) for this image.
+#### Building a custom Bytewax Docker image
+
+The `image` configuration directive specifies which container image to use when running the materialization job. To create a custom image based on this container, run the following command from the root of the Feast repository:
+
+``` shell
+DOCKER_BUILDKIT=1 docker build . -f ./sdk/python/feast/infra/materialization/contrib/bytewax/Dockerfile -t <image tag>
+```
+
+Once that image is built and pushed to a registry, it can be specified as part of the batch engine configuration:
+
+``` yaml
+batch_engine:
+  type: bytewax
+  namespace: bytewax
+  image: <image tag>
+```
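
The build-and-configure steps above can be sketched in Python. This is a minimal illustration and not part of the PR: `build_command` and `batch_engine_config` are hypothetical helper names, and `registry.example.com/feast-bytewax:dev` is a placeholder tag. The sketch only assembles the command line and the configuration mapping; it does not invoke Docker.

```python
import shlex
from typing import Any, Dict, List

# Dockerfile path taken from the documentation change above.
DOCKERFILE = "./sdk/python/feast/infra/materialization/contrib/bytewax/Dockerfile"


def build_command(image_tag: str) -> List[str]:
    """Return the argv for the documented `docker build` invocation (hypothetical helper)."""
    return ["docker", "build", ".", "-f", DOCKERFILE, "-t", image_tag]


def batch_engine_config(image_tag: str, namespace: str = "bytewax") -> Dict[str, Any]:
    """Return the `batch_engine` mapping as it would appear in the batch engine configuration."""
    return {
        "batch_engine": {
            "type": "bytewax",
            "namespace": namespace,
            "image": image_tag,
        }
    }


if __name__ == "__main__":
    tag = "registry.example.com/feast-bytewax:dev"  # placeholder tag, not a real image
    # DOCKER_BUILDKIT=1 must be set in the environment when the command is run.
    print("DOCKER_BUILDKIT=1 " + shlex.join(build_command(tag)))
    print(batch_engine_config(tag))
```

Keeping the tag in one variable makes it harder for the built image and the configured `image` value to drift apart.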

@@ -0,0 +1,29 @@
FROM python:3.9-slim-bullseye AS build

RUN apt-get update && \
    apt-get install --no-install-suggests --no-install-recommends --yes git

WORKDIR /bytewax

# Copy dataflow code
COPY sdk/python/feast/infra/materialization/contrib/bytewax/bytewax_materialization_dataflow.py /bytewax
COPY sdk/python/feast/infra/materialization/contrib/bytewax/dataflow.py /bytewax

# Copy entrypoint
COPY sdk/python/feast/infra/materialization/contrib/bytewax/entrypoint.sh /bytewax

# Copy necessary parts of the Feast codebase
COPY sdk/python sdk/python
COPY protos protos
COPY go go
COPY setup.py setup.py
COPY pyproject.toml pyproject.toml
COPY README.md README.md

# Install Feast with the AWS, GCP, and Bytewax extras.
# The bind mount is needed because setuptools_scm reads the .git directory
# to infer the version of Feast being installed.
# https://github.com/pypa/setuptools_scm#usage-from-docker
# This assumes the build context is the root of the repository.
RUN --mount=source=.git,target=.git,type=bind pip3 install --no-cache-dir -e '.[aws,gcp,bytewax]'

@@ -20,7 +20,7 @@
 )
 from feast.infra.offline_stores.offline_store import OfflineStore
 from feast.infra.online_stores.online_store import OnlineStore
-from feast.registry import BaseRegistry
+from feast.infra.registry.base_registry import BaseRegistry
 from feast.repo_config import FeastConfigBaseModel
 from feast.stream_feature_view import StreamFeatureView
 from feast.utils import _get_column_names
@@ -341,7 +341,7 @@ def _create_job_definition(self, job_id, namespace, pods, env):
 {
     "command": ["sh", "-c", "sh ./entrypoint.sh"],
     "env": job_env,
-    "image": "bytewax/bytewax-feast:latest",
+    "image": self.batch_engine_config.image,
     "imagePullPolicy": "Always",
     "name": "process",
     "ports": [
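
The effect of this hunk can be sketched as follows: the container image is read from the batch engine configuration instead of being a string literal. This is an illustrative sketch, not the Feast implementation. `EngineConfigSketch` and `container_spec` are hypothetical names, and using the previously hard-coded `bytewax/bytewax-feast:latest` as the default value is an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class EngineConfigSketch:
    """Hypothetical stand-in for the Bytewax batch engine configuration."""

    # Assumption: fall back to the image that was previously hard-coded.
    image: str = "bytewax/bytewax-feast:latest"


def container_spec(
    config: EngineConfigSketch, job_env: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Build the container entry of the job definition from the config."""
    return {
        "command": ["sh", "-c", "sh ./entrypoint.sh"],
        "env": job_env,
        "image": config.image,  # taken from configuration, not a literal
        "imagePullPolicy": "Always",
        "name": "process",
    }


if __name__ == "__main__":
    # A custom image overrides the default without touching the job template.
    custom = EngineConfigSketch(image="registry.example.com/feast-bytewax:dev")
    print(container_spec(custom, job_env=[])["image"])
```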
@@ -0,0 +1,22 @@
import yaml

from feast import FeatureStore, RepoConfig
from feast.infra.materialization.contrib.bytewax.bytewax_materialization_dataflow import (
    BytewaxMaterializationDataflow,
)

if __name__ == "__main__":
    with open("/var/feast/feature_store.yaml") as f:
        feast_config = yaml.safe_load(f)

    with open("/var/feast/bytewax_materialization_config.yaml") as b:
        bytewax_config = yaml.safe_load(b)

    config = RepoConfig(**feast_config)
    store = FeatureStore(config=config)

    job = BytewaxMaterializationDataflow(
        config,
        store.get_feature_view(bytewax_config["feature_view"]),
        bytewax_config["paths"],
    )
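
The script above reads two keys, `feature_view` and `paths`, from the materialization config. A minimal sketch of a check for that shape follows. `check_materialization_config` is a hypothetical helper that is not part of the PR, the sample values are invented, and treating `paths` as a list is an assumption. The helper works on an already parsed mapping, so it needs no YAML library.

```python
from typing import Any, Dict, List, Tuple


def check_materialization_config(config: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return (feature_view, paths), or raise ValueError if the shape is wrong."""
    missing = [key for key in ("feature_view", "paths") if key not in config]
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")
    paths = config["paths"]
    # Assumption: `paths` holds a list of file locations to materialize.
    if not isinstance(paths, list):
        raise ValueError("'paths' must be a list")
    return str(config["feature_view"]), [str(p) for p in paths]


if __name__ == "__main__":
    sample = {"feature_view": "example_view", "paths": ["part-0.parquet"]}
    print(check_materialization_config(sample))
```

Failing early on a malformed config gives a clearer error than a `KeyError` raised partway through job construction.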
@@ -0,0 +1,4 @@
#!/bin/sh

cd /bytewax
python dataflow.py