Allow nested sub-workflows #4477

Closed (wanted to merge 1 commit)

hjoliver
Member

@hjoliver hjoliver commented Oct 26, 2021

Allow sub-workflow definitions under the ~/cylc-run directory.

This simple change addresses part of the discussion under #4453. It just changes (and rewords) an exception into a warning, so that cylc install does not balk at installing the sub-workflow at runtime when its flow.cylc is found under ~/cylc-run.

Context: a sub-workflow definition is just another source file for the main workflow. As such, it should be installed with the other main workflow source files into the main workflow run directory. Then at run time, the sub-workflow gets "instantiated" anew for each main workflow cycle point (via another cylc install) from the installed sub-workflow source.
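The two-stage install described above can be sketched with plain file copies. This only illustrates the resulting layout: it assumes `cp -R` as a stand-in for `cylc install` (the real command also writes `_cylc-install/` metadata and does its own file selection), the `<run dir>-sub/<cycle point>` naming follows the example later in this thread, and `install_main` / `instantiate_sub` are hypothetical names.

```shell
# Illustration only: `cp -R` stands in for `cylc install` here.

# Stage 1: install the main workflow source, including its sub/ directory,
# into a main workflow run directory.
install_main() {
    local src="$1" run_dir="$2"
    mkdir -p "$run_dir" || return 1
    cp -R "$src/." "$run_dir/"
}

# Stage 2: at run time, instantiate the sub-workflow for one cycle point
# from the *installed* copy of its source (not from ~/cylc-src), and print
# the path of the new sub-workflow run directory.
instantiate_sub() {
    local run_dir="$1" point="$2"
    local sub_run_dir="${run_dir}-sub/${point}"
    mkdir -p "$sub_run_dir" || return 1
    cp -R "$run_dir/sub/." "$sub_run_dir/" || return 1
    printf '%s\n' "$sub_run_dir"
}
```

For example, `install_main ~/cylc-src/demo /tmp/x/demo/run1` followed by `instantiate_sub /tmp/x/demo/run1 1` would leave a copy of `sub/flow.cylc` in `/tmp/x/demo/run1-sub/1`, taken from the installed `run1/sub` rather than from the original source tree.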

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.py and
    conda-environment.yml.
  • Appropriate tests are included (unit and/or functional).
  • Already covered by existing tests.
  • Does not need tests (why?).
  • Appropriate change log entry included.
  • No change log entry required (why? e.g. invisible to users).
  • (master branch) I have opened a documentation PR at cylc/cylc-doc/pull/XXXX.
  • (7.8.x branch) I have updated the documentation in this PR branch.
  • No documentation update required.

@hjoliver hjoliver self-assigned this Oct 26, 2021
@hjoliver hjoliver added this to the cylc-8.0rc1 milestone Oct 26, 2021
@hjoliver
Member Author

hjoliver commented Oct 26, 2021

Here's a working example on this branch.

Main source tree:

$ tree ~/cylc-src/demo    
/home/oliverh/cylc-src/demo
├── flow.cylc
└── sub
    └── flow.cylc

Main workflow definition:

$ cat flow.cylc      
[scheduling]
    cycling mode = integer
    initial cycle point = 1
    [[graph]]
        P1 = "foo => sub => bar"
[runtime]
    [[foo, bar]]
        script = true
    [[sub]]
        script = """
            SUBWF_SRCE="${CYLC_WORKFLOW_RUN_DIR}/sub"
            SUBWF_NAME="${CYLC_WORKFLOW_ID}-sub"
            SUBWF_RUNN="${CYLC_TASK_CYCLE_POINT}"
            cylc install -C "${SUBWF_SRCE}" --flow-name="${SUBWF_NAME}" --run-name="${SUBWF_RUNN}"
            cylc play --no-detach "${SUBWF_NAME}/${SUBWF_RUNN}"
        """

Sub-workflow definition:

$ cat sub/flow.cylc        
[scheduling]
    [[graph]]
        R1 = "sub1 => sub2 => sub3"
[runtime]
    [[sub1, sub2, sub3]]
        script = sleep 3

This example runs correctly, and the resulting run directory tree looks like this:

/home/oliverh/cylc-run/demo/
├── _cylc-install
│   └── source -> /home/oliverh/cylc-src/demo
├── run1
│   ├── log
│   │   ├── flow-config
│   │   ├── install
│   │   ├── job
│   │   │   ├── 1
│   │   │   │   ├── bar
│   │   │   │   │   ├── 01
│   │   │   │   │   └── NN -> 01
│   │   │   │   ├── foo
│   │   │   │   │   ├── 01
│   │   │   │   │   └── NN -> 01
│   │   │   │   └── sub
│   │   │   │       ├── 01
│   │   │   │       └── NN -> 01
│   │   │   └── 2
│   │   │       ├── bar
│   │   │       │   ├── 01
│   │   │       │   └── NN -> 01
│   │   │       ├── foo
│   │   │       │   ├── 01
│   │   │       │   └── NN -> 01
│   │   │       └── sub
│   │   │           ├── 01
│   │   │           └── NN -> 01
│   │   └── workflow
│   ├── share
│   ├── sub
│   └── work
│       ├── 1
│       └── 2
├── run1-sub  # <---- sub-workflow run directories for main workflow/run1
│   ├── 1  # <---- sub-workflow output for main cycle point 1
│   │   ├── log
│   │   │   ├── flow-config
│   │   │   ├── install
│   │   │   ├── job
│   │   │   │   └── 1
│   │   │   │       ├── sub1
│   │   │   │       │   ├── 01
│   │   │   │       │   └── NN -> 01
│   │   │   │       ├── sub2
│   │   │   │       │   ├── 01
│   │   │   │       │   └── NN -> 01
│   │   │   │       └── sub3
│   │   │   │           ├── 01
│   │   │   │           └── NN -> 01
│   │   │   └── workflow
│   │   ├── share
│   │   └── work
│   │       └── 1
│   ├── 2  # <---- sub-workflow output for main cycle point 2
│   │   ├── log
│   │   │   ├── flow-config
│   │   │   ├── install
│   │   │   ├── job
│   │   │   │   └── 1
│   │   │   │       ├── sub1
│   │   │   │       │   ├── 01
│   │   │   │       │   └── NN -> 01
│   │   │   │       ├── sub2
│   │   │   │       │   ├── 01
│   │   │   │       │   └── NN -> 01
│   │   │   │       └── sub3
│   │   │   │           ├── 01
│   │   │   │           └── NN -> 01
│   │   │   └── workflow
│   │   ├── share
│   │   └── work
│   │       └── 1
│   └── _cylc-install
│       └── source -> /home/oliverh/cylc-run/demo/run1/sub
└── runN -> run1

Note that the sub-workflow run directory, which should be associated with a particular installed instance of the main workflow (demo/run1 in this case), is ~/cylc-run/demo/run1-sub, not ~/cylc-run/demo/run1/sub, to avoid a nested run directory.
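A minimal sketch of the naming rule used in the example task script, assuming the default `~/cylc-run` location. `subwf_id`, `subwf_run_dir` and `is_nested` are hypothetical helpers, not Cylc commands. The last one shows why `demo/run1-sub` does not count as nested inside `demo/run1`: a bare string-prefix test would match, but a comparison of whole path components does not.

```shell
# Sub-workflow ID for one main-workflow cycle point, following the example
# task script: "<main workflow ID>-sub/<cycle point>".
subwf_id() {
    printf '%s-sub/%s\n' "$1" "$2"
}

# Run directory for that ID, assuming the default ~/cylc-run location.
subwf_run_dir() {
    printf '%s/cylc-run/%s\n' "$HOME" "$(subwf_id "$1" "$2")"
}

# Succeed if path $2 is equal to, or lies inside, directory $1.
# Whole path components are compared, so ".../run1-sub" is NOT inside
# ".../run1", whereas ".../run1/sub" is.
is_nested() {
    case "$2/" in
        "$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With `CYLC_WORKFLOW_ID=demo/run1` and cycle point `1`, `subwf_id demo/run1 1` prints `demo/run1-sub/1`, the same ID the task script passes to `cylc play`.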

cylc scan sees:

  • only the main source workflow definition under ~/cylc-src ✔️
  • both the main and sub-workflow run directories under ~/cylc-run ✔️

If this PR flies, we should document that this is how to run sub-workflows at Cylc 8, and what the limitations are.

@oliver-sanders
Member

This example runs correctly, and the resulting run directory tree looks like this:

/home/oliverh/cylc-run/demo/
├── _cylc-install
│   └── source -> /home/oliverh/cylc-src/demo
├── run1
...
├── run1-sub  # <---- sub-workflow run directories for main workflow/run1
...

@hjoliver unfortunately this breaks the cylc install model as flows from multiple sources are being placed under the dirs controlled by the same _cylc-install dir. This should be prohibited by cylc install (it becomes troublesome for source mapping and reinstallation).

There are two ways around this. The first is to go with this structure (outlined on the linked issue):

~/cylc-run
    my-flow/
        controller/
            _cylc-install/
            run1/
                flow.cylc
        sub-workflow/
            _cylc-install/
            2000/
                flow.cylc

This is nice and tidy; however, you have to remember to add /controller when you install, otherwise it will break at runtime. In the 8.x timeframe we could consider adding an installation configuration file to handle this.
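The failure mode described above (forgetting the `/controller` part at install time) could be caught early in the task script rather than partway through a run. The sketch below is hypothetical: `controller_subwf_name` derives the sub-workflow flow name from the main workflow ID under the layout shown, and refuses to continue if the main workflow was not installed under `<name>/controller/`.

```shell
# Hypothetical guard for the "controller" layout. Given the main workflow ID
# (e.g. my-flow/controller/run1), print the sub-workflow flow name
# (e.g. my-flow/sub-workflow), or fail if /controller/ is missing.
controller_subwf_name() {
    local main_id="$1"
    case "$main_id" in
        */controller/*)
            # Strip "/controller/..." to recover the top-level name.
            printf '%s/sub-workflow\n' "${main_id%%/controller/*}"
            ;;
        *)
            echo "ERROR: $main_id was not installed under <name>/controller/" >&2
            return 1
            ;;
    esac
}
```

In a task script this might be used as `SUBWF_FLOW_NAME="$(controller_subwf_name "$CYLC_WORKFLOW_ID")" || exit 1`, so a mis-installed workflow fails with a clear message before anything is installed.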

The second option is to install the sub-workflows under different names, which is fairly simple:

~/cylc-run/
    my-flow/
        _cylc-install/
        run1/
            flow.cylc
            sub/
                flow.cylc
    my-flow-sub/
        _cylc-install/
        2000/
            flow.cylc
        2001/
            flow.cylc

Here's an example of the latter I tested on this branch:

        script = """
            cylc install \
                -C "$CYLC_WORKFLOW_RUN_DIR/sub" \
                --flow-name "$CYLC_WORKFLOW_NAME-sub" \
                --run-name "$CYLC_TASK_CYCLE_POINT"
            cylc play \
                "$CYLC_WORKFLOW_NAME-sub/$CYLC_TASK_CYCLE_POINT" \
                --no-detach
        """

@hjoliver
Member Author

@hjoliver unfortunately this breaks the cylc install model as flows from multiple sources are being placed under the dirs controlled by the same _cylc-install dir. This should be prohibited by cylc install (it becomes troublesome for source mapping and reinstallation).

Can you be specific about what breaks exactly?

  • the example runs correctly
  • the source links are correct:
$ stat -c%N cylc-run/demo/_cylc-install/source   
‘cylc-run/demo/_cylc-install/source’ -> ‘/home/oliverh/cylc-src/demo’

$ stat -c%N cylc-run/demo/run2-sub/_cylc-install/source 
‘cylc-run/demo/run2-sub/_cylc-install/source’ -> ‘/home/oliverh/cylc-run/demo/run2/sub’
  • reinstall works:
$ cylc reinstall demo/run2
REINSTALLED demo/run2 from /home/oliverh/cylc-src/demo

$ cylc reinstall demo/run2-sub
WARNING - demo/run2-sub source found in /home/oliverh/cylc-run. This is OK for installed sub-workflow definitions.
REINSTALLED demo/run2-sub from /home/oliverh/cylc-run/demo/run2/sub

(That's correct, because the source for ~/cylc-run/demo/run2-sub is in ~/cylc-run/demo/run2)
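The link check above can be done for a whole `cylc-run` tree at once. A minimal sketch using only `find` and `readlink`; the function name `list_source_links` is hypothetical. It prints each `_cylc-install/source` link (relative to the root) with its target, i.e. the source each installation would be reinstalled from.

```shell
# List every _cylc-install/source symlink under a cylc-run tree, with its
# target. Defaults to ~/cylc-run if no root directory is given.
list_source_links() {
    local root="${1:-$HOME/cylc-run}" link
    find "$root" -path '*/_cylc-install/source' -type l | sort |
        while IFS= read -r link; do
            # Print the link path relative to the root, then its target.
            printf '%s -> %s\n' "${link#"$root"/}" "$(readlink "$link")"
        done
}
```

Run against the tree in this example, it should list both `demo/_cylc-install/source` (pointing into `~/cylc-src`) and `demo/run2-sub/_cylc-install/source` (pointing into `~/cylc-run/demo/run2/sub`).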

@hjoliver
Member Author

hjoliver commented Oct 26, 2021

Your suggested alternatives don't work as-is, because the sub-workflows of multiple installed main workflows clash, although they can of course be tweaked to fix that.

(And they still require the change on this branch - i.e. allow source files in the cylc-run dir.)

@hjoliver
Member Author

If we can settle on a good run-dir structure for managing sub-workflows we could provide basic built-in support for it (for install and play). Or at least document it.

Even though I haven't found anything that's actually broken in my original suggested structure, which tries to hide the sub-workflow run dir inside the main one, I concede it might be simpler to keep the two separate with no nesting.

Another idea: mirror the main workflow run-dir path exactly, but under a new sub-directory of cylc-run. This avoids nested installs, and the result is completely unambiguous.

    [[sub]]
        script = """
            SUBWF_FLOW_NAME="sub-workflows/${CYLC_WORKFLOW_ID}"
            SUBWF_RUN_NAME="subrun${CYLC_TASK_CYCLE_POINT}"
            cylc install \
                -C "${CYLC_WORKFLOW_RUN_DIR}/sub" \
                --flow-name "${SUBWF_FLOW_NAME}" \
                --run-name "${SUBWF_RUN_NAME}"
            cylc play --no-detach \
                "${SUBWF_FLOW_NAME}/${SUBWF_RUN_NAME}"
        """

Result, for two main-workflow runs:

/home/oliverh/cylc-run
├── demo
│   ├── _cylc-install
│   │   └── source -> /home/oliverh/cylc-src/demo
│   ├── run1
│   ├── run2
│   └── runN -> run2
└── sub-workflows
    └── demo
        ├── run1
        │   ├── _cylc-install
        │   │   └── source -> /home/oliverh/cylc-run/demo/run1/sub
        │   ├── subrun1
        │   └── subrun2
        └── run2
            ├── _cylc-install
            │   └── source -> /home/oliverh/cylc-run/demo/run2/sub
            ├── subrun1
            └── subrun2
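Because this layout mirrors the main workflow ID under a fixed top-level directory, the mapping can be reversed without any lookup. A minimal sketch, assuming the `sub-workflows/` prefix and `subrun` run-name prefix used in the script above; `parse_subwf_id` is a hypothetical helper.

```shell
# Recover "<main workflow ID> <cycle point>" from a sub-workflow ID such as
# sub-workflows/demo/run1/subrun2. Fails for IDs outside this scheme.
parse_subwf_id() {
    local id="$1"
    case "$id" in
        sub-workflows/*/subrun*) ;;
        *) echo "not a sub-workflow ID: $id" >&2; return 1 ;;
    esac
    local rest="${id#sub-workflows/}"   # e.g. demo/run1/subrun2
    local main_id="${rest%/*}"          # e.g. demo/run1
    local point="${rest##*/subrun}"     # e.g. 2
    printf '%s %s\n' "$main_id" "$point"
}
```

For example, `parse_subwf_id sub-workflows/demo/run2/subrun1` prints `demo/run2 1`.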

@oliver-sanders
Member

oliver-sanders commented Oct 27, 2021

Ah, I hadn't spotted that you had craftily nested a _cylc-install dir in run1-sub, à la:

$ tree -L 2 ~/cylc-run/hillary-sub
.
|-- _cylc-install
|   `-- source -> ~/cylc-src/hillary-sub
|-- run1
|   |-- flow.cylc
|   |-- log
|   |-- share
|   |-- sub
|   `-- work
|-- run1-sub  # looks like a run dir managed by its sibling `_cylc-install` but isn't
|   |-- 1
|   |-- 2
|   |-- 3
|   `-- _cylc-install  # can confirm reinstall for run1-sub will use this one (correctly)!
`-- runN -> run1

It's a little confusing and I'm surprised it works, but it does work, and rather nicely: both flows reinstall from the correct source!

@oliver-sanders
Member

If we can settle on a good run-dir structure for managing sub-workflows

Agreed, so some thoughts...

Ideally:

  • Drop cylc install for the installation of sub-workflows from installed source.
    • Sub workflows have already been installed from source.
      • For subsequent sub-workflow installs we can symlink rather than copy the files into the sub workflow run dirs.
      • Would require a quick script for listing top level items in the sub-workflow's src dir and creating symlinks in the run dir.
      • Would require a small change to the flow.cylc -> suite.rc back-compat mode detection.
    • Bypasses any install nesting issues.
    • Makes reinstall apply automatically to sub-workflows (otherwise you would have to reinstall the master then all sub workflows manually).
  • Stick the sub flows under the master flow's run dir.
    • Neater & makes it easier to manage them as a block, nicer for cylc clean and cylc stop.
  • One day add a "job context" for sub workflows.
    • Something which does all the cylc install; cylc play stuff for you.
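The "quick script" mentioned in the list above could look something like this. It is a hypothetical sketch (`link_subworkflow_src` is not part of Cylc) that symlinks each top-level item of an already-installed sub-workflow source directory into a new per-cycle directory instead of copying it. It assumes the per-cycle directory lies outside the source directory; a layout where cycle directories sit next to the source `flow.cylc` would also need to skip those directories when listing.

```shell
# Hypothetical helper: populate a per-cycle sub-workflow run directory with
# symlinks to the top-level items of an installed sub-workflow source dir.
# Pass an absolute src_dir so that the link targets are absolute too.
link_subworkflow_src() {
    local src_dir="$1" run_dir="$2" item
    # Refuse to create the run dir inside the source dir; otherwise the
    # listing below would pick up the run dir (and its siblings) as well.
    case "$run_dir/" in
        "$src_dir"/*)
            echo "ERROR: $run_dir is inside $src_dir" >&2
            return 1
            ;;
    esac
    mkdir -p "$run_dir" || return 1
    # Visible and hidden top-level items ("." and ".." excluded).
    for item in "$src_dir"/* "$src_dir"/.[!.]*; do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -e "$item" ] || [ -L "$item" ] || continue
        ln -s "$item" "$run_dir/${item##*/}" || return 1
    done
}
```

Linking rather than copying means a reinstall of the main workflow, which updates the installed sub-workflow source, is picked up by every per-cycle instance.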

Ideas:

1) Put the sub workflows in the run dir, link rather than copy files:
$ tree ~/cylc-run/idea-one
.
|-- _cylc-install
|   `-- source -> ~/cylc-src/whatever
`-- run1
    |-- flow.cylc
    |-- sub-a
    |   |-- 1
    |   |   `-- flow.cylc -> ../flow.cylc
    |   |-- 2
    |   |   `-- flow.cylc -> ../flow.cylc
    |   `-- flow.cylc
    `-- sub-b
        |-- 20000101T0000Z
        |   `-- flow.cylc -> ../flow.cylc
        |-- 20010101T0000Z
        |   `-- flow.cylc -> ../flow.cylc
        `-- flow.cylc

The IDs become:

  • idea-one/run1 - master flow
  • idea-one/run1/sub-a/1 - sub flow
  • ...

Example management cmds:

$ cylc stop idea-one/run1  # stop master workflow
$ cylc stop 'idea-one/run1/*'  # stop all sub-workflows (and master?)
$ cylc stop idea-one/run1/sub-a/1  # similar to `cylc kill idea-one/run1 sub-a/1`
$ cylc clean idea-one/run1

2) Put the sub workflows in the work dir, link rather than copy files:
$ tree ~/cylc-run/idea-two
.
|-- _cylc-install
|   `-- source -> ~/cylc-src/whatever
`-- run1
    |-- flow.cylc
    |-- sub-a
    |   `-- flow.cylc
    |-- sub-b
    |   `-- flow.cylc
    `-- work
        |-- 20000101T0000Z
        |   |-- sub-a
        |   |   `-- flow.cylc -> ../../../sub-a/flow.cylc
        |   `-- sub-b
        |       `-- flow.cylc -> ../../../sub-b/flow.cylc
        `-- 20010101T0000Z
            |-- sub-a
            |   `-- flow.cylc -> ../../../sub-a/flow.cylc
            `-- sub-b
                `-- flow.cylc -> ../../../sub-b/flow.cylc

Housekeeping is now the same for sub-workflows as it is for native tasks, which will make configuring cylc clean easier.
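One detail worth checking in this layout: a relative symlink target resolves against the directory that contains the link, so from `run1/work/<cycle>/<sub>/` the run directory `run1/` is three levels up. A minimal sketch that creates such a link (`link_cycle_flow` is a hypothetical helper):

```shell
# Hypothetical helper for layout 2: create
#   <run_dir>/work/<cycle>/<sub>/flow.cylc -> ../../../<sub>/flow.cylc
# The link lives in <run_dir>/work/<cycle>/<sub>/, so "../../.." is <run_dir>.
link_cycle_flow() {
    local run_dir="$1" cycle="$2" sub="$3"
    local dir="$run_dir/work/$cycle/$sub"
    mkdir -p "$dir" || return 1
    ln -s "../../../$sub/flow.cylc" "$dir/flow.cylc"
}
```

After `link_cycle_flow run1 20000101T0000Z sub-a`, reading `run1/work/20000101T0000Z/sub-a/flow.cylc` should return the contents of `run1/sub-a/flow.cylc`; with only `../../` the link would point at the non-existent `run1/work/sub-a/flow.cylc`.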

The IDs become:

  • idea-two/run1 - master flow
  • idea-two/run1/work/1/sub-a - sub flow
  • ...

Example management commands:

$ cylc clean idea-two/run1
$ cylc clean idea-two/run1/work/1/sub-a
$ cylc clean idea-two/run1 --rm 'work/{cycle - P1D}'  # remove all work stuff from yesterday's cycle
$ cylc stop idea-two/work/*  # stop all sub-workflows (confusing)

@hjoliver
Member Author

hjoliver commented Nov 5, 2021

Drop cylc install for the installation of sub-workflows from installed source.

Agreed. Install and reinstall work correctly in my example (given that a sub-workflow instance needs to be based on the installed main workflow source) but there's not much point in explicitly exposing that because we don't want to encourage users to edit sub-workflow sources (or anything else) in the main run directory.

Note: at the last meeting we agreed not to let this issue hold up 8.0rc1. If it's a lot of work to implement the above, we can start by recommending that users take the straightforward (but ugly) manual approach of managing sub-workflows as entirely separate workflows, which is how it works at Cylc 7 anyway.

@hjoliver hjoliver modified the milestones: cylc-8.0rc1, cylc-8.0.0 Nov 30, 2021
@oliver-sanders oliver-sanders marked this pull request as draft January 17, 2022 11:47
@oliver-sanders oliver-sanders modified the milestones: cylc-8.0.0, cylc-8.x Feb 14, 2022
@hjoliver hjoliver changed the title from "Allow sub-workflow definitions installed under cylc-run." to "Allow nested sub-workflows" Apr 8, 2022
@hjoliver hjoliver mentioned this pull request Apr 8, 2022
@hjoliver
Member Author

hjoliver commented Apr 8, 2022

Superseded by #4811

@hjoliver hjoliver closed this Apr 8, 2022
@hjoliver hjoliver deleted the allow-sub-workflows branch April 10, 2022 22:06
@hjoliver hjoliver modified the milestones: cylc-8.x, cylc-8.0rc3 Apr 11, 2022
@hjoliver hjoliver mentioned this pull request Apr 13, 2022
@hjoliver hjoliver mentioned this pull request May 4, 2022