[dagster-wandb] Integration with Weights & Biases #10470

Merged
merged 24 commits into from
Feb 8, 2023
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,7 @@ celerybeat-schedule

# dotenv
.env
.envrc

# virtualenv
.venv
Expand Down
4 changes: 3 additions & 1 deletion docs/content/_apidocs.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,8 @@ Dagster also provides a growing set of optional add-on libraries to integrate wi

- [GitHub](/\_apidocs/libraries/dagster-github) (`dagster-github`) Provides a resource for issuing GitHub GraphQL queries and filing GitHub issues from Dagster jobs.

- [GraphQL](/\_apidocs/libraries/dagster-graphql) (`dagster-graphql`) Provides resources for interfacing with a Dagster deployment over GraphQL.

- [Kubernetes](/\_apidocs/libraries/dagster-k8s) (`dagster-k8s`) Provides components for deploying Dagster to Kubernetes.

- [Microsoft Teams](/\_apidocs/libraries/dagster-msteams) (`dagster-msteams`) Includes a simple integration with Microsoft Teams.
Expand Down Expand Up @@ -126,4 +128,4 @@ Dagster also provides a growing set of optional add-on libraries to integrate wi

- [Twilio](/\_apidocs/libraries/dagster-twilio) (`dagster-twilio`) Provides a resource for posting SMS messages from ops via Twilio.

- [GraphQL](/\_apidocs/libraries/dagster-graphql) (`dagster-graphql`) Provides resources for interfacing with a Dagster deployment over GraphQL.
- [Weights & Biases](/\_apidocs/libraries/dagster-wandb) (`dagster-wandb`) Provides an integration with Weights & Biases (W\&B).
4 changes: 4 additions & 0 deletions docs/content/_navigation.json
Original file line number Diff line number Diff line change
Expand Up @@ -978,6 +978,10 @@
{
"title": "Twilio (dagster-twilio)",
"path": "/_apidocs/libraries/dagster-twilio"
},
{
"title": "Weights & Biases (dagster-wandb)",
"path": "/_apidocs/libraries/dagster-wandb"
}
]
}
Expand Down
2 changes: 1 addition & 1 deletion docs/content/api/modules.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/content/api/searchindex.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/content/api/sections.json

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions docs/content/integrations.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -227,4 +227,8 @@ Explore integration libraries that are maintained by the Dagster community.
title="Noteable"
href="https://papermill-origami.readthedocs.io/en/latest/reference/noteable_dagstermill/assets/"
></ArticleListItem>
<ArticleListItem
title="Weights & Biases (W&B)"
href="/_apidocs/libraries/dagster-wandb"
></ArticleListItem>
</ArticleList>
Binary file modified docs/next/public/objects.inv
Binary file not shown.
2 changes: 2 additions & 0 deletions docs/sphinx/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@
"../../python_modules/libraries/dagster-duckdb",
"../../python_modules/libraries/dagster-duckdb-pandas",
"../../python_modules/libraries/dagster-duckdb-pyspark",
"../../python_modules/libraries/dagster-wandb",
### autodoc_dagster extension
"./_ext",
]
Expand Down Expand Up @@ -149,6 +150,7 @@
"sshtunnel",
"toposort",
"twilio",
"wandb",
]

autodoc_typehints = "none"
Expand Down
1 change: 1 addition & 0 deletions docs/sphinx/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -64,3 +64,4 @@
sections/api/apidocs/libraries/dagster-twilio
sections/api/apidocs/libraries/dagstermill
sections/api/apidocs/libraries/dagster-graphql
sections/api/apidocs/libraries/dagster-wandb
62 changes: 62 additions & 0 deletions docs/sphinx/sections/api/apidocs/libraries/dagster-wandb.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
################################
Weights & Biases (dagster-wandb)
################################

This library provides a Dagster integration with `Weights & Biases <https://wandb.ai/>`_.

Use Dagster and Weights & Biases (W&B) to orchestrate your MLOps pipelines and maintain ML assets.

----

The integration with W&B makes it easy within Dagster to:

* use and create `W&B Artifacts <https://docs.wandb.ai/guides/artifacts>`_.
* use and create Registered Models in the `W&B Model Registry <https://docs.wandb.ai/guides/models>`_.
* run training jobs on dedicated compute using `W&B Launch <https://docs.wandb.ai/guides/launch>`_.
* use the `wandb <https://github.com/wandb/wandb>`_ client in ops and assets.

************
Useful links
************

For a complete set of documentation, see `Dagster integration <https://docs.wandb.ai/guides/integrations/dagster>`_ on the W&B website.

For full-code examples, see `examples/with_wandb <https://github.com/dagster-io/dagster/tree/master/examples/with_wandb>`_ in the Dagster GitHub repo.

.. currentmodule:: dagster_wandb

********
Resource
********

.. autoconfigurable:: wandb_resource
  :annotation: ResourceDefinition

***********
I/O Manager
***********

.. autoconfigurable:: wandb_artifacts_io_manager
  :annotation: IOManager

Config
======

.. autoclass:: WandbArtifactConfiguration
  :members:

.. autoclass:: SerializationModule
  :members:

Errors
======

.. autoexception:: WandbArtifactsIOManagerError

***
Ops
***

.. autofunction:: run_launch_agent

.. autofunction:: run_launch_job
1 change: 1 addition & 0 deletions docs/tox.ini
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ deps =
-e ../python_modules/libraries/dagster-ssh
-e ../python_modules/libraries/dagster-duckdb
-e ../python_modules/libraries/dagster-dbt
-e ../python_modules/libraries/dagster-wandb

commands =
make --directory=sphinx clean
Expand Down
35 changes: 35 additions & 0 deletions examples/with_wandb/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
# Using Dagster with Weights & Biases Examples

This directory contains examples showcasing the integration with Weights & Biases (W&B).

For a complete set of documentation, see [Dagster integration](https://docs.wandb.ai/guides/integrations/dagster) on the W&B website.

## Getting started

Bootstrap your own Dagster project with this example:

```bash
dagster project from-example --name my-dagster-project --example with_wandb
```

To install this example and its Python dependencies, run:

```bash
pip install -e ".[dev]"
```

Once you've done this, you can run:

```bash
dagit
```

## Set up wandb

To communicate with the W&B servers you need an API Key.

1. [Log in](https://wandb.ai/login) to W&B. Note: if you are using W&B Server, ask your admin for the host name.
2. Collect your API key from the [authorize page](https://wandb.ai/authorize) or from your user settings.
3. Set an environment variable with that API key: `export WANDB_API_KEY=YOUR_KEY`.
4. Set an environment variable with your W&B entity: `export WANDB_ENTITY=john_doe`.
5. Set an environment variable with your W&B project: `export WANDB_PROJECT=my_project`.
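
The setup steps above can be checked before starting a run. The following is a minimal sketch, not part of this PR: the helper name `missing_wandb_vars` is hypothetical, and only the three variable names come from the README. It uses the standard library only and does not contact W&B.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the setup steps in the README.
REQUIRED_VARS = ("WANDB_API_KEY", "WANDB_ENTITY", "WANDB_PROJECT")


def missing_wandb_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required W&B variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_wandb_vars()` with no argument inspects the current process environment. An empty list means all three variables are set.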
3 changes: 3 additions & 0 deletions examples/with_wandb/pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
2 changes: 2 additions & 0 deletions examples/with_wandb/setup.cfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
[metadata]
name = with_wandb
16 changes: 16 additions & 0 deletions examples/with_wandb/setup.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
from setuptools import find_packages, setup

setup(
    name="with_wandb",
    packages=find_packages(exclude=["with_wandb_tests"]),
    install_requires=[
        "dagster",
        "dagster-wandb",
        "onnxruntime",
        "skl2onnx",
        "joblib",
        "torch",
        "torchvision",
    ],
    extras_require={"dev": ["dagit", "pytest"]},
)
27 changes: 27 additions & 0 deletions examples/with_wandb/tox.ini
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
[tox]
envlist = py{38, 37},pylint,mypy

[testenv]
usedevelop = true
setenv =
  VIRTUALENV_PIP=22.1.2
passenv = CI_* COVERALLS_REPO_TOKEN BUILDKITE
deps =
  -e ../../python_modules/dagster[mypy,test]
  -e ../../python_modules/dagit
  -e ../../python_modules/dagster-graphql
  -e ../../python_modules/libraries/dagster-wandb
  -e .
allowlist_externals =
  /bin/bash
commands =
  !windows: /bin/bash -c '! pip list --exclude-editable | grep -e dagster -e dagit'
  pytest -vv

[testenv:mypy]
commands =
  mypy --config=../../pyproject.toml --non-interactive --install-types {posargs} .

[testenv:pylint]
commands =
  pylint -j0 --rcfile=../../pyproject.toml {posargs} with_wandb with_wandb_tests
Empty file.
Empty file.
149 changes: 149 additions & 0 deletions examples/with_wandb/with_wandb/assets/advanced_example.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,149 @@
import wandb
from dagster import AssetIn, OpExecutionContext, asset
from dagster_wandb import WandbArtifactConfiguration

wandb_artifact_configuration: WandbArtifactConfiguration = {
    "description": "My **Markdown** description",
    "aliases": ["first_alias", "second_alias"],
    "add_dirs": [
        {
            "name": "My model directory",
            "local_path": "with_wandb/assets/example",
        }
    ],
    "add_files": [
        {
            "name": "my_training_script",
            "local_path": "with_wandb/assets/example/train.py",
        },
        {
            "is_tmp": True,
            "local_path": "with_wandb/assets/example/README.md",
        },
    ],
}

MY_ASSET = "my_advanced_artifact"
MY_TABLE = "my_table"


@asset(
    name=MY_ASSET,
    compute_kind="wandb",
    metadata={"wandb_artifact_configuration": wandb_artifact_configuration},
)
def write_advanced_artifact() -> wandb.wandb_sdk.wandb_artifacts.Artifact:
    """Example that writes an advanced Artifact.

    Here we use the full power of the integration with W&B Artifacts.

    We create a custom Artifact that contains a W&B Table. You could also return the Table directly,
    but for advanced scenarios you will want to create an Artifact directly.

    We use the integration to augment that Artifact. This includes:
    - Adding a Markdown description
    - Tagging the Artifact with two aliases
    - Attaching a folder and a file. That part is purposely contrived to
      show the capabilities of the integration.

    This is all done through the metadata on the asset.

    The properties you can pass to 'add_dirs', 'add_files', 'add_references' are the same as the
    homonymous methods in the SDK.

    Reference:
    - https://docs.wandb.ai/ref/python/artifact#add_dir
    - https://docs.wandb.ai/ref/python/artifact#add_file
    - https://docs.wandb.ai/ref/python/artifact#add_reference

    Returns:
        wandb.Artifact: The Artifact we augment with the integration
    """
    artifact = wandb.Artifact(MY_ASSET, "files")
    table = wandb.Table(columns=["a", "b", "c"], data=[[1, 2, 3]])
    artifact.add(table, MY_TABLE)
    return artifact


@asset(
    compute_kind="wandb",
    ins={
        "table": AssetIn(
            asset_key=MY_ASSET,
            metadata={
                "wandb_artifact_configuration": {
                    "get": MY_TABLE,
                }
            },
        )
    },
    output_required=False,
)
def get_table(context: OpExecutionContext, table: wandb.Table) -> None:
    """Example that reads a W&B Table contained in an Artifact.

    Args:
        context (OpExecutionContext): Dagster execution context
        table (wandb.Table): Table contained in our downloaded Artifact

    Here, we use the integration to read the W&B Table object created in the previous asset.

    The integration downloads the Artifact for us. We can simply annotate our asset and use
    the W&B Table object directly.
    """
    context.log.info(f"Result: {table.get_column('a')}")  # Result: [1]


@asset(
    compute_kind="wandb",
    ins={
        "path": AssetIn(
            asset_key=MY_ASSET,
            metadata={
                "wandb_artifact_configuration": {
                    "get_path": "my_training_script",
                }
            },
        )
    },
    output_required=False,
)
def get_path(context: OpExecutionContext, path: str) -> None:
    """Example that gets the local path of a file contained in an Artifact.

    Args:
        context (OpExecutionContext): Dagster execution context
        path (str): Path in the local filesystem of the downloaded file

    Here, we use the integration to collect the local path of the file added through 'add_files'
    in the metadata of the first asset.

    The integration downloads the file for us. We can use that file like any other file.
    """
    context.log.info(f"Result: {path}")


@asset(
    compute_kind="wandb",
    ins={
        "artifact": AssetIn(
            asset_key=MY_ASSET,
        )
    },
    output_required=False,
)
def get_artifact(
    context: OpExecutionContext, artifact: wandb.wandb_sdk.wandb_artifacts.Artifact
) -> None:
    """Example that gets the entire Artifact object.

    Args:
        context (OpExecutionContext): Dagster execution context
        artifact (wandb.wandb_sdk.wandb_artifacts.Artifact): Downloaded Artifact object

    Here, we use the integration to collect the entire W&B Artifact object created in the first
    asset.

    The integration downloads the entire Artifact for us.
    """
    context.log.info(f"Result: {artifact.name}")  # Result: my_advanced_artifact:v0
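
The docstring of `write_advanced_artifact` says that the entries under 'add_dirs' and 'add_files' take the same properties as the SDK methods of the same name. One way to picture that is keyword expansion of each entry into a method call. The sketch below only illustrates that idea and is not the integration's actual implementation: `RecordingArtifact` and `apply_configuration` are hypothetical names, and the stub records calls instead of touching W&B or the filesystem.

```python
from typing import Any, Dict, List, Optional, Tuple


class RecordingArtifact:
    """Hypothetical stand-in for wandb.Artifact that only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def add_dir(self, local_path: str, name: Optional[str] = None) -> None:
        self.calls.append(("add_dir", local_path, name))

    def add_file(
        self, local_path: str, name: Optional[str] = None, is_tmp: bool = False
    ) -> None:
        self.calls.append(("add_file", local_path, name, is_tmp))


def apply_configuration(
    artifact: RecordingArtifact, configuration: Dict[str, Any]
) -> RecordingArtifact:
    """Expand each configuration entry into the matching method call."""
    for entry in configuration.get("add_dirs", []):
        artifact.add_dir(**entry)  # keys become keyword arguments
    for entry in configuration.get("add_files", []):
        artifact.add_file(**entry)
    return artifact
```

Under this reading, an entry such as `{"is_tmp": True, "local_path": "..."}` becomes `add_file(local_path="...", is_tmp=True)`, which is why the allowed keys mirror the method parameters.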