Update CWL Toil to support SoftwareRequirement resolution and BioContainers.

This work depends on galaxy-lib (available on PyPI) and currently unmerged cwltool enhancements from common-workflow-language/cwltool#456.

This commit enables in cwltoil all the same options that were recently added to cwltool in common-workflow-language/cwltool#214, including ``--beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two are now documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta).

Here I will quickly review a couple of the available options using test examples from cwltool's ``tests`` directory. I will assume cwltoil has been set up as configured in this branch and that galaxy-lib is installed in the same environment (for instance with ``pip install galaxy-lib``). Now let's grab the examples from cwltool:

```
git clone https://github.com/common-workflow-language/cwltool.git
cd cwltool
```

From here we can quickly demonstrate installing and resolving SoftwareRequirements with Conda using the ``tests/seqtk_seq.cwl`` tool. This tool doesn't define a DockerRequirement, but it does declare the following SoftwareRequirement under ``hints``:

```
hints:
  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93
```
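
As a rough illustration of how such a hint maps onto a package manager, here is a small Python sketch that turns the ``hints`` block into conda-style ``name=version`` specs. ``to_conda_specs`` is a hypothetical helper, not part of cwltool or galaxy-lib; it assumes the map form of ``hints`` shown above and simply picks the first listed version.

```python
from typing import Any, Dict, List


def to_conda_specs(hints: Dict[str, Any]) -> List[str]:
    """Turn a SoftwareRequirement hint into conda-style package specs.

    Hypothetical helper for illustration only. Assumes the first listed
    version is the one to request; packages without a version are left bare.
    """
    specs = []
    requirement = hints.get("SoftwareRequirement", {})
    for pkg in requirement.get("packages", []):
        versions = pkg.get("version") or []
        if versions:
            specs.append(f"{pkg['package']}={versions[0]}")
        else:
            specs.append(pkg["package"])
    return specs


# The hints block above, as the dict a YAML loader would produce.
seqtk_hints = {
    "SoftwareRequirement": {
        "packages": [{"package": "seqtk", "version": ["r93"]}]
    }
}

print(to_conda_specs(seqtk_hints))  # ['seqtk=r93']
```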

We can try this tool with cwltoil. By default we probably don't have the seqtk binary on our ``PATH``, so the tool fails when run with the following command:

```
cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```
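
If you want to confirm up front that the failure is just a missing binary, a quick check along these lines works. This is a minimal sketch; ``missing_binaries`` is a hypothetical helper built on the standard library's ``shutil.which``.

```python
import shutil
from typing import Iterable, List


def missing_binaries(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that cannot be found on PATH."""
    # shutil.which returns None when no executable with that name is on PATH.
    return [name for name in names if shutil.which(name) is None]


missing = missing_binaries(["seqtk"])
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("All required binaries found.")
```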

This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment, and to use it as needed, by passing the ``--beta-conda-dependencies`` flag:

```
cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```
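
Conceptually, the Conda shortcut installs each requirement into its own environment prefix and puts that prefix's ``bin`` directory ahead of everything else on ``PATH`` when the tool runs. The sketch below only builds the command line and environment rather than running anything; the helper names, the prefix path, and the channel order are assumptions for illustration, and galaxy-lib's actual resolver is more involved.

```python
import os
from typing import Dict, List, Sequence

# Assumed channel order, for illustration only; galaxy-lib ships its own defaults.
ASSUMED_CHANNELS = ("conda-forge", "bioconda", "defaults")


def conda_create_argv(prefix: str, specs: Sequence[str],
                      channels: Sequence[str] = ASSUMED_CHANNELS) -> List[str]:
    """Build a `conda create` command that installs `specs` into its own prefix."""
    argv = ["conda", "create", "--yes", "--prefix", prefix]
    for channel in channels:
        argv += ["--channel", channel]
    return argv + list(specs)


def env_with_prefix(prefix: str, base_env: Dict[str, str]) -> Dict[str, str]:
    """Copy `base_env`, putting the environment's bin/ directory first on PATH."""
    env = dict(base_env)
    parts = [os.path.join(prefix, "bin")]
    if base_env.get("PATH"):  # avoid a trailing separator when PATH is empty
        parts.append(base_env["PATH"])
    env["PATH"] = os.pathsep.join(parts)
    return env


argv = conda_create_argv("/tmp/cwl-deps/seqtk-r93", ["seqtk=r93"])
print(" ".join(argv))
```

Keeping one prefix per requirement is what makes the environment "isolated": nothing is installed into the user's base environment, and the tool only sees the extra binaries through its own ``PATH``.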

The tool should now be successful.

The Conda support can be tweaked extensively, but the defaults target the best-practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available, including Software Modules, Homebrew, and simple scripts called "Galaxy packages". All of these options can be specified and configured in a YAML file passed to cwltoil with the ``--beta-dependency-resolvers-configuration`` option, instead of the simple ``--beta-conda-dependencies`` shortcut. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers.
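
A reasonable mental model for such a configuration (an assumption here, not a description of galaxy-lib's internals) is an ordered list of resolvers: each one gets a chance to satisfy a requirement, and the first that can do so wins. The sketch below models that with two fake resolvers standing in for a modules system and Conda; ``resolve``, ``fake_modules``, and ``fake_conda`` are hypothetical names.

```python
from typing import Callable, Optional, Sequence

# Hypothetical resolver signature: (package, version) -> setup hint, or None.
Resolver = Callable[[str, Optional[str]], Optional[str]]


def resolve(package: str, version: Optional[str],
            resolvers: Sequence[Resolver]) -> Optional[str]:
    """Return the first non-None answer, trying resolvers in configured order."""
    for resolver in resolvers:
        answer = resolver(package, version)
        if answer is not None:
            return answer
    return None


def fake_modules(package: str, version: Optional[str]) -> Optional[str]:
    # Pretend only this one module is installed on the cluster.
    if (package, version) == ("bwa", "0.7.15"):
        return f"module load {package}/{version}"
    return None


def fake_conda(package: str, version: Optional[str]) -> Optional[str]:
    # Pretend Conda can always provide the package.
    spec = f"{package}={version}" if version else package
    return f"conda environment for {spec}"


chain = [fake_modules, fake_conda]
print(resolve("bwa", "0.7.15", chain))  # module load bwa/0.7.15
print(resolve("seqtk", "r93", chain))   # conda environment for seqtk=r93
```

Putting site-specific resolvers (modules) ahead of a general fallback (Conda) lets administrators reuse what is already installed while still covering everything else.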

In addition to options that configure tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) provides a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io; there are currently over 2300 such containers.
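
Biocontainers images on quay.io are tagged by package version plus Bioconda build string, in the form ``quay.io/biocontainers/<package>:<version>--<build>``. The sketch below assumes that naming scheme; the helper names are hypothetical, and the samtools version and build used in the example are illustrative values, not a claim that this exact image exists.

```python
from typing import Tuple

REGISTRY = "quay.io/biocontainers"  # assumed registry/namespace


def biocontainer_image(package: str, version: str, build: str) -> str:
    """Image reference for one Bioconda package build (assumed naming scheme)."""
    return f"{REGISTRY}/{package}:{version}--{build}"


def split_tag(tag: str) -> Tuple[str, str]:
    """Split a '<version>--<build>' tag back into its two parts."""
    version, sep, build = tag.rpartition("--")
    if not sep or not version or not build:
        raise ValueError(f"not a <version>--<build> tag: {tag!r}")
    return version, build


image = biocontainer_image("samtools", "1.3.1", "0")
print(image)  # quay.io/biocontainers/samtools:1.3.1--0
```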

Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically, or to build one for local use (required, for instance, for tools with multiple software requirements, i.e. "fat tools").

```
cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```
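
The choice being made here can be summarized as a small decision function: an explicitly configured default container wins (as in ``find_default_container`` in the diff below), otherwise a single-package tool gets a pulled Biocontainers image and a multi-package tool needs a locally built one. This is a simplified, hypothetical model; ``plan_container`` and ``ContainerPlan`` are not cwltool names, and the real lookup is delegated to ``get_container_from_software_requirements``.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Requirement = Tuple[str, str]  # (package, version)


@dataclass
class ContainerPlan:
    action: str                  # "use", "pull", or "build"
    image: Optional[str] = None  # set only for "use"
    packages: List[Requirement] = field(default_factory=list)


def plan_container(requirements: Sequence[Requirement],
                   default_container: Optional[str] = None,
                   use_biocontainers: bool = False) -> Optional[ContainerPlan]:
    # An explicit default container always wins, as in find_default_container.
    if default_container:
        return ContainerPlan("use", image=default_container)
    if not use_biocontainers or not requirements:
        return None
    if len(requirements) == 1:
        # One package: assume a matching single-package image can be pulled.
        return ContainerPlan("pull", packages=list(requirements))
    # Several packages ("fat tools"): assume an image must be built locally.
    return ContainerPlan("build", packages=sorted(requirements))


print(plan_container([("seqtk", "r93")], use_biocontainers=True))
```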

These containers contain the same binaries that the package would use locally (outside of Docker). This technique therefore allows cross-platform reproducibility and remixability across cwltool, cwltoil, Galaxy, and the command line, both inside and outside of containers.
jmchilton committed Jul 8, 2017
1 parent 7b1f22e commit 265a1f5
Showing 2 changed files with 38 additions and 4 deletions.
setup.py: 5 changes (3 additions, 2 deletions)
@@ -50,8 +50,9 @@ def runSetup():
                 'gcs_oauth2_boto_plugin==1.9',
                 botoRequirement],
             'cwl': [
-                'cwltool==1.0.20170413194156',
-                'schema-salad==2.5.20170328195758',
+                'cwltool==1.0.20170707200431',
+                'schema-salad==2.6.20170630075932',
+                'galaxy-lib==17.9.3',
                 'cwltest>=1.0.20170214185319']},
         package_dir={'': 'src'},
         packages=find_packages(where='src',
src/toil/cwl/cwltoil.py: 37 changes (35 additions, 2 deletions)
@@ -31,6 +31,7 @@
 import cwltool.stdfsaccess
 from cwltool.pathmapper import adjustFiles
 from cwltool.process import shortname, adjustFilesWithSecondary, fillInDefaults, compute_checksums
+from cwltool.software_requirements import DependenciesConfiguration, get_container_from_software_requirements
 from cwltool.utils import aslist
 import schema_salad.validate as validate
 import schema_salad.ref_resolver
@@ -657,6 +658,14 @@ def main(args=None, stdout=sys.stdout):
                         metavar=("VAR1 VAR2"),
                         default=("PATH",),
                         dest="preserve_environment")
+    # help="Dependency resolver configuration file describing how to adapt 'SoftwareRequirement' packages to current system."
+    parser.add_argument("--beta-dependency-resolvers-configuration", default=None)
+    # help="Defaut root directory used by dependency resolvers configuration."
+    parser.add_argument("--beta-dependencies-directory", default=None)
+    # help="Use biocontainers for tools without an explicitly annotated Docker container."
+    parser.add_argument("--beta-use-biocontainers", default=None, action="store_true")
+    # help="Short cut to use Conda to resolve 'SoftwareRequirement' packages."
+    parser.add_argument("--beta-conda-dependencies", default=None, action="store_true")
 
     # mkdtemp actually creates the directory, but
     # toil requires that the directory not exist,
@@ -677,9 +686,22 @@ def main(args=None, stdout=sys.stdout):
         cwllogger.setLevel(options.logLevel)
 
     useStrict = not options.not_strict
+
+    conf_file = getattr(options, "beta_dependency_resolvers_configuration", None) # Text
+    use_conda_dependencies = getattr(options, "beta_conda_dependencies", None) # Text
+
+    make_tool_kwds = {}
+
+    if conf_file or use_conda_dependencies:
+        dependencies_configuration = DependenciesConfiguration(options) # type: DependenciesConfiguration
+        make_tool_kwds["job_script_provider"] = dependencies_configuration
+
+    options.default_container = None
+    make_tool_kwds["find_default_container"] = functools.partial(find_default_container, options)
+
     try:
         t = cwltool.load_tool.load_tool(options.cwltool, cwltool.workflow.defaultMakeTool,
-                                        resolver=cwltool.resolver.tool_resolver, strict=useStrict)
+                                        resolver=cwltool.resolver.tool_resolver, strict=useStrict, kwargs=make_tool_kwds)
         unsupportedRequirementsCheck(t.requirements)
     except cwltool.process.UnsupportedRequirement as e:
         logging.error(e)
@@ -740,7 +762,8 @@ def importDefault(tool):
             outobj = toil.restart()
         else:
             basedir = os.path.dirname(os.path.abspath(options.cwljob or options.cwltool))
-            builder = t._init_job(job, basedir=basedir, use_container=use_container)
+            builder = t._init_job(job, basedir=basedir, use_container=use_container,
+                                  job_script_provider=make_tool_kwds.get("job_script_provider", None))
             (wf1, wf2) = makeJob(t, {}, use_container=use_container,
                                  preserve_environment=options.preserve_environment,
                                  tmpdir=os.path.realpath(outdir), builder=builder)
@@ -775,3 +798,13 @@ def importDefault(tool):
         stdout.write(json.dumps(outobj, indent=4))
 
     return 0
+
+
+def find_default_container(args, builder):
+    default_container = None
+    if args.default_container:
+        default_container = args.default_container
+    elif args.beta_use_biocontainers:
+        default_container = get_container_from_software_requirements(args, builder)
+
+    return default_container
