dagster-io · clairelin135 · Oct 19, 2021 · Oct 18, 2021 · Oct 18, 2021 · Oct 18, 2021
@@ -57,8 +57,8 @@
         "title": "Advanced Tutorials",
         "children": [
           {
-            "title": "Configuring Solids",
-            "path": "/tutorial/intro-tutorial/configuring-solids"
+            "title": "Configuring Ops",
+            "path": "/tutorial/advanced-tutorial/configuring-ops"
           },
           {
             "title": "Dagster Types",

@@ -20,7 +20,7 @@ The tutorial is divided into several sections:
 
 These sections will introduce some advanced features and give you deeper insight into Dagster.
 
-- [**Configuring Solids**](/tutorial/intro-tutorial/configuring-solids) shows how to parameterize pipelines with configuration.
+- [**Configuring Solids**](/tutorial/advanced-tutorial/configuring-ops) shows how to parameterize pipelines with configuration.
 - [**Dagster Types**](/tutorial/advanced-tutorial/types) covers Dagster's type system.
 - [**Modes, Resources, & Presets**](/tutorial/advanced-tutorial/pipelines) demonstrates how to hold your business logic constant while interacting with external services in different environments.
 - [**Materializations**](/tutorial/advanced-tutorial/materializations) demonstrates a way to make Dagster aware of your persistent artifacts outside the system.

@@ -0,0 +1,170 @@
+---
+title: Configuring Ops | Dagster
+description: Ops can be parameterized with runtime configuration
+---
+
+# Configuring Ops
+
+<CodeReferenceLink filePath="examples/docs_snippets_crag/docs_snippets_crag/intro_tutorial/advanced/configuring_ops/" />
+
+In many situations, we'd like to be able to configure ops at run time. For example, we may want whomever is running the job to decide what dataset it operates on. Configuration is useful when you want the person or system that is launching a job to be able to make choices about what the job does, without needing to modify the job definition.
+
+We've seen an op whose behavior is the same every time it runs:
+
+```python file=/intro_tutorial/advanced/configuring_ops/download_csv.py startafter=start_download_cereals_marker endbefore=end_download_cereals_marker
+@op
+def download_cereals():
+    response = requests.get("https://docs.dagster.io/assets/cereal.csv")
+    lines = response.text.split("\n")
+    return [row for row in csv.DictReader(lines)]
+```
+
+If we want the URL to be determined by whomever is launching the job, we might write a more generic version:
+
+```python file=/intro_tutorial/advanced/configuring_ops/download_csv.py startafter=start_download_csv_marker endbefore=end_download_csv_marker
+@op
+def download_csv(context):
+    response = requests.get(context.op_config["url"])
+    lines = response.text.split("\n")
+    return [row for row in csv.DictReader(lines)]
+```
+
+Here, rather than hard-coding the value of the URL we want to download from, we use a config option, `url`.
+
+Let's rebuild a job we've seen before, but this time using our newly parameterized op.
+
+```python file=/intro_tutorial/advanced/configuring_ops/configurable_job.py startafter=start_job_marker endbefore=end_job_marker
+import csv
+
+import requests
+from dagster import job, op
+
+
+@op
+def download_csv(context):
+    response = requests.get(context.op_config["url"])
+    lines = response.text.split("\n")
+    return [row for row in csv.DictReader(lines)]
+
+
+@op
+def sort_by_calories(context, cereals):
+    sorted_cereals = sorted(cereals, key=lambda cereal: int(cereal["calories"]))
+
+    context.log.info(f'Most caloric cereal: {sorted_cereals[-1]["name"]}')
+
+
+@job
+def configurable_job():
+    sort_by_calories(download_csv())
+```
+
+<br />
+
+## Specifying Config for Job Execution
+
+We can specify config for a job execution regardless of how we execute the job — the Dagit UI, Python API, or the command line:
+
+### Dagit Config Editor
+
+Dagit provides a powerful, schema-aware, typeahead-enabled config editor to enable rapid experimentation with and debugging of parameterized job executions. As always, run:
+
+```bash
+dagit -f configurable_job.py
+```
+
+Select the "Playground" tab to get to the place for entering config and launching jobs. Here's the config we need to execute our job:
+
+```yaml file=/intro_tutorial/advanced/configuring_ops/run_config.yaml
+ops:
+  download_csv:
+    config:
+      url: "https://raw.githubusercontent.com/dagster-io/dagster/master/examples/docs_snippets_crag/docs_snippets_crag/intro_tutorial/cereal.csv"
+```
+
+Let's enter it into the config editor:
+
+<Image
+alt="config_filled_out.png"
+src="/images/tutorial/job_config_filled_out.png"
+width={2756}
+height={2098}
+/>
+
+### Config in Python API
+
+We previously encountered the <PyObject
+module="dagster" object="JobDefinition" method="execute_in_process" /> function. Job run config is specified by the second argument to this function, which must be a dict.
+
+This dict contains all of the user-provided configuration with which to execute a job. As such, it can have [a lot of sections](/\_apidocs/execution), but we'll only use one of them here: per-op configuration, which is specified under the key `ops`:
+
+```python file=/intro_tutorial/advanced/configuring_ops/configurable_job.py startafter=start_run_config_marker endbefore=end_run_config_marker
+run_config = {
+        "ops": {"download_csv": {"config": {"url": "https://docs.dagster.io/assets/cereal.csv"}}}
+    }
+```
+
+The `ops` dict is keyed by op name, and each op is configured by a dict that may itself have several sections. In this case, we are interested in the `config` section.
+
+Now you can pass this run config to <PyObject
+module="dagster" object="JobDefinition" method="execute_in_process" />:
+
+```python file=/intro_tutorial/advanced/configuring_ops/configurable_job.py startafter=start_execute_marker endbefore=end_execute_marker
+result = configurable_job.execute_in_process(run_config=run_config)
+```
+
+### YAML fragments and Dagster CLI
+
+When executing jobs with the Dagster CLI, we'll need to provide the run config in a file. We use YAML for the file-based representation of configuration, but the values are the same as before:
+
+```yaml file=/intro_tutorial/advanced/configuring_ops/run_config.yaml
+ops:
+  download_csv:
+    config:
+      url: "https://raw.githubusercontent.com/dagster-io/dagster/master/examples/docs_snippets_crag/docs_snippets_crag/intro_tutorial/cereal.csv"
+```
+
+We can pass config files in this format to the Dagster CLI tool with the `-c` flag.
+
+```bash
+dagster job execute -f configurable_job.py -c run_config.yaml
+```
+
+In practice, you might have different sections of your run config in different yaml files—if, for instance, some sections change more often (e.g. in test and prod) while other are more static. In this case, you can set multiple instances of the `-c` flag on CLI invocations, and the CLI tools will assemble the YAML fragments into a single run config.
+
+## Putting Schema on a Config
+
+The implementation of the `download_csv` op we wrote above expects the provided configuration will look a particular way. I.e. that it will include a key called "url" and an accompanying value that is a string. If someone running the job were to neglect to provide config, the op would fail with a `KeyError` when it tried to fetch the value for "url" from `context.op_config`. If the provided value were an integer and not a string, the op would also fail, because `requests.get` expects a string, not an integer.
+
+Dagster allows specifying a "config schema" on any configurable object to help catch these kinds of errors early. Here's a version of `download_csv` that includes a `config_schema`. With this version, if we try to pass config that doesn't match the expected schema, Dagster can tell us immediately.
+
+```python file=/intro_tutorial/advanced/configuring_ops/config_schema.py
+import csv
+
+import requests
+from dagster import op
+
+
+@op(config_schema={"url": str})
+def download_csv(context):
+    response = requests.get(context.op_config["url"])
+    lines = response.text.split("\n")
+    return [row for row in csv.DictReader(lines)]
+```
+
+The <PyObject module="dagster" object="ConfigSchema" /> documentation describes all the ways that you can provide config schema, including optional config fields, default values, and collections.
+
+In addition to catching errors early, config schemas are also useful for documenting jobs and learning how to use them. This is particularly useful when launching jobs from inside Dagit.
+
+Because of the config schema we've provided, Dagit knows that this job requires configuration in order to run without errors.
+
+Go back to the Playground and press <kbd>Ctrl</kbd> + <kbd>Space</kbd> in order to bring up the typeahead assistant.
+
+<Image
+alt="job_inputs_figure_two.png"
+src="/images/tutorial/job_inputs_figure_two.png"
+width={2756}
+height={2098}
+/>
+
+Here you can see all of the valid config fields for your job.
diff --git a/docs/next/public/images/tutorial/job_config_filled_out.png b/docs/next/public/images/tutorial/job_config_filled_out.png
diff --git a/docs/next/public/images/tutorial/job_inputs_figure_two.png b/docs/next/public/images/tutorial/job_inputs_figure_two.png
diff --git a/...l/advanced/configuring_solids/__init__.py → ...rial/advanced/configuring_ops/__init__.py b/...l/advanced/configuring_solids/__init__.py → ...rial/advanced/configuring_ops/__init__.py
diff --git a/...anced/configuring_solids/config_schema.py → ...advanced/configuring_ops/config_schema.py b/...anced/configuring_solids/config_schema.py → ...advanced/configuring_ops/config_schema.py
@@ -1,11 +1,11 @@
 import csv
 
 import requests
-from dagster import solid
+from dagster import op
 
 
-@solid(config_schema={"url": str})
+@op(config_schema={"url": str})
 def download_csv(context):
-    response = requests.get(context.solid_config["url"])
+    response = requests.get(context.op_config["url"])
     lines = response.text.split("\n")
     return [row for row in csv.DictReader(lines)]