Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation on YAML features #509

Merged
merged 1 commit into from
Nov 14, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ re-running queries that have already been executed.
transform_request
datasets
examples
yaml
command_line
contribute
about
Expand Down
46 changes: 46 additions & 0 deletions docs/yaml.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# YAML Features

The YAML file format (with our extensions) provides a powerful and compact way of representing ServiceX queries. Two features in particular can be leveraged to make definitions clear and avoid duplication: the use of definitional anchors/aliases, and the inclusion of other YAML files.

## Using anchors and aliases
The standard YAML language has the concepts of _anchors_ and _aliases_. An _anchor_ is a string prefixed by "&" which then refers to the contents of that particular YAML item, and which can be reused later by using an _alias_ which is the same string prefixed with "*" (the syntax is conceptually similar to that of making a pointer and dereferencing it in C/C++). The anchor must be defined before being used as an alias.

One can define an anchor once and reuse it as an alias many times. So, for example, a cut may be defined as `&DEF_mycut <cut definition>` and then referred to in queries as `*DEF_mycut`. By convention (although not strictly required) the anchors are grouped in a `Definitions` block at the start of the YAML file; this avoids clutter in the `General` and `Sample` blocks and allows reordering of those blocks without having to worry about the ordering of anchors and aliases.

A configuration file using this feature would look like

```yaml
Definitions:
- &DEF_query !PythonFunction |
def run_query(input_filenames=None):
return []

Sample:
- Name: mysample
Query: *DEF_query
...
```

## Including other YAML files
We have extended the standard YAML file format to allow the inclusion of the contents of another YAML file, with the `!include` syntax.

For example, we can have a main YAML file like this:
```yaml
Definitions:
!include definitions.yaml

Sample:
- Name: mysample
Query: *DEF_query
...
```
which includes a `definitions.yaml` file that looks like this:
```yaml
- &DEF_query !PythonFunction |
def run_query(input_filenames=None):
return []
```
By factoring the files like this and using anchors and aliases, the top-level file can be kept readable.

## A note on string handling
YAML tries to provide ways to "naturally" embed multiline strings in the configuration files. This can sometimes lead to somewhat unexpected results. We recommend the "block scalar" style, introducing a multiline string by starting with a pipe (|) and using a constant indentation for each line of the string that follows. You might find [this site](https://yaml-multiline.info/) useful to demonstrate the various potential modes.
Loading