Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cylc Install, nested source directory #4453

Closed
datamel opened this issue Oct 6, 2021 · 12 comments
Closed

Cylc Install, nested source directory #4453

datamel opened this issue Oct 6, 2021 · 12 comments
Assignees
Milestone

Comments

@datamel
Copy link
Contributor

datamel commented Oct 6, 2021

Currently, you can perform cylc install workflow1 on a source directory with the following structure...

   $ tree ~/cylc-src
   └── workflow1
       ├── workflow2
       │   └── flow.cylc 
       └── flow.cylc

Cylc install does check for nested run directories (by looking for .service directory) but this current scenario will install without error.

Cylc scan, will not currently pick up workflow2 (workflow1 does appear on scan).

Some thought needed into whether we wish to check and forbid a scenario like this (although may be undesirable for performance to nested scan), or perhaps clear documentation on best practice for sub-workflows and/or a tutorial would be an acceptable solution.

@datamel datamel added this to the cylc-8.x milestone Oct 6, 2021
@datamel datamel added the question Flag this as a question for the next Cylc project meeting. label Oct 6, 2021
@hjoliver
Copy link
Member

hjoliver commented Oct 11, 2021

That's actually a convenient source tree structure for sub-workflows.

If workflow1 has a task in it that runs workflow2, it makes sense to install the workflow2 definition into the workflow1 run directory, just like any other source files required by workflow1. Then at runtime,the sub-workflow-runner task would install and run a new copy of workflow2 each time (each cycle point), from the "source" version installed into the workflow1 run directory.

But we don't currently allow that:

$ cylc cat-log -f e workflow1 run-workflow2.1    
WorkflowFilesError: workflow2 installation failed. Source directory should not be in /home/oliverh/cylc-run
2021-10-12T10:35:35+13:00 CRITICAL - failed/ERR

Would there be any dire consequences to dropping this restriction? (cylc scan performance?)

@hjoliver
Copy link
Member

Note a sub-workflow should not be installed from cylc-src before each (sub-workflow) run. The sub-workflow definition should be fossilized at the same version as the installed main workflow, so it makes sense to install it like any other main workflow source file.

@hjoliver
Copy link
Member

(My comments above might be slightly off-piste, if you're only worried about cylc scan seeing these source dirs, or having cylc install balk at them?)

@oliver-sanders
Copy link
Member

Off-piste but

That's actually a convenient source tree structure for sub-workflows.

Kinda but it won't work because Cylc doesn't allow nested run dirs, the sub-flow structure should be:

workflow/
    controller/
        flow.cylc
    sub-workflow-1/
        flow.cylc
    sub-workflow-2/
        flow.cylc

@hjoliver
Copy link
Member

hjoliver commented Oct 20, 2021

Kinda but it won't work because Cylc doesn't allow nested run dirs

Yes, but what we need (for sub-workflows) is a nested source dir tree in the main workflow run directory.

For a sub-workflow that (e.g.) gets spawned in every main workflow cycle point, we need to create a new sub-workflow run directory for each sub-workflow instance. And the source for those instances should be stored (like other main workflow source files) in the main workflow run directory.

the sub-flow structure should be:

That's not ideal. The sub-workflow source needs to get installed to the main-workflow run dir when we do cylc install <main-workflow>

@oliver-sanders
Copy link
Member

Hijacking this as a sub-suite issue (as that seems to be what it boils down to)

FYI: I was outlining the "run dir" in my previous post.

A structure like this would make sense:

# ~/cylc-run/
my-workflow/
    controller/
        flow.cylc
        sub-suite/  # this is an installed src copy - it can't/shouldn't be run directly and will not appear in scan results
            flow.cylc
    sub-suite/
        2000/
            flow.cylc
        2001/
            flow.cylc

The resulting workflow IDs work nicely with the UID:

$ cylc stop 'my-workflow/*'
$ cylc stop 'my-workflow/controller' --kill
$ cylc stop my-workflow/sub-suite/*'
$ cylc stop my-workflow/sub-suite/2000'

The sub-suites would be launched like so:

# my-workflow/controller/flow.cylc
[runtime]
    [[sub-suite]]
        script = """
            # potential for a nicer interface later
            cylc install \
                -C $CYLC_WORKFLOW_RUN_DIR/sub-suite \
                --flow-name my-workflow/sub-suite \
                --run-name $CYLC_TASK_CYCLE_POINT
            cylc play my-workflow/sub-suite/$CYLC_TASK_CYCLE_POINT --no-detach
        """

Problems:

  • When installing from source you must remember to add the /controller bit manually.
  • cylc install will refuse to install the sub-suites due to nested _cylc-install directories.
  • What about run names/numbs?

Will come back to this post 8.0.0, thoughts for now:

  • Is this even the correct way to do sub-suites?
    • For multi-dimensional cycling use cases sub-graphs would be better than sub-suites.
    • For remaining use case composition might by a better way to go.
    • I.E. compose the sub-suite into the parent workflow then have the parent dice it up into sub-suites which it then manages.
    • This allows the parent to see the tasks in the sub-workflow.
    • This allows the graph to be dynamically spliced and diced into sub-suites based on local configuration.
    • Related to configurable task batching.

@hjoliver
Copy link
Member

hjoliver commented Oct 20, 2021

Is this even the correct way to do sub-suites?

Well, I was only talking about what's currently possible. Sub-suites can still be useful now, even though we expect to have better ways of managing the same problems in the future. (Spawn-on-demand has made the "multi-dimensional cycling" case a lot more efficient than it was though, with a parameterized sub-cycle, so less of a need for sub-suites there I guess).

@hjoliver
Copy link
Member

hjoliver commented Oct 26, 2021

Will come back to this post 8.0.0, thoughts for now:

We need workable sub-suites for 8.0, because some users rely on them at Cylc 7. I'll post a simple tweak to allow that.

For the original topic of this Issue:

  • nested source trees are OK
  • its fine for cylc scan to ignore the sub-workflows, as they should be considered part of the parent workflow definition

@hjoliver
Copy link
Member

@datamel and @oliver-sanders - if you agree, we should close this issue and post another one for planning better ways of handling sub-workflows (which may include 8.x improvements on the current model as well the more distant Cylc 9 plans).

@oliver-sanders
Copy link
Member

oliver-sanders commented Oct 27, 2021

We need workable sub-suites for 8.0, because some users rely on them at Cylc 7

Aah, we don't have any users at our end (we have advised against it), guessing you've got some use cases at your end in which case this bumps up the priorities a bit.

We may have a conflict here with #4442 - #4396

@hjoliver
Copy link
Member

guessing you've got some use cases at your end in which case this bumps up the priorities a bit.

My bad (or good 😁 ) - I quite like sub-suites as an alternative to parameterized sub-cycling. Much more efficient, under Cylc 7 spawn-on-submit scheduling. I think someone on the forum may have gone that way recently too. So long as the disadvantages are explained (namely, housekeeping, and monitoring and management of multiple workflows).

@hjoliver hjoliver removed the question Flag this as a question for the next Cylc project meeting. label May 24, 2022
@hjoliver hjoliver self-assigned this May 24, 2022
@hjoliver hjoliver modified the milestones: cylc-8.x, cylc-8.0rc3 May 24, 2022
@hjoliver
Copy link
Member

Closed by #4861 ...

... The nested source structure in the description above is the natural way to store sub-workflow source directories, and #4861 makes this work in Cylc 8 by allowing source directories under cylc-run (for on-the-fly installation of sub-workflow instances).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants