Beta support for configurable dependency resolution & Biocontainers.
Consider the included tool ``seqtk_seq.cwl``. It includes the following SoftwareRequirement hint:

```
hints:
  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93
```

I'm not happy that ``version`` is a list (only the first entry is used), but I can live with it for now.
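To make the list-valued ``version`` concrete: the ``_get_dependencies`` helper added in this commit takes the first entry of the list (or ``None`` for an empty list) and keeps only the part of the package name after the last ``#``. A standalone sketch of that normalization, assuming plain dicts in place of galaxy-lib's ``ToolRequirement`` (``normalize_package`` and ``normalize_packages`` are hypothetical names):

```python
from typing import Any, Dict, List, Optional


def normalize_package(package: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Hypothetical helper mirroring the normalization in _get_dependencies.

    Plain dicts stand in for galaxy-lib's ToolRequirement here.
    """
    version = package.get("version")
    if isinstance(version, list):
        # Only the first listed version is used; an empty list means "any version".
        version = version[0] if version else None
    return {
        # Keep only the part after the last "#", e.g. the fragment of a package URI.
        "name": package["package"].split("#")[-1],
        "version": version,
        "type": "package",
    }


def normalize_packages(software_requirement: Optional[Dict[str, Any]]) -> List[Dict[str, Optional[str]]]:
    if not software_requirement:
        return []
    return [normalize_package(p) for p in software_requirement.get("packages") or []]


hint = {"packages": [{"package": "seqtk", "version": ["r93"]}]}
print(normalize_packages(hint))
# [{'name': 'seqtk', 'version': 'r93', 'type': 'package'}]
```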

If cwltool is executed with the hidden ``--beta-conda-dependencies`` flag, galaxy-lib processes this requirement: it installs Conda if necessary, installs seqtk, and sets up a Conda environment containing seqtk for the job.

```
virtualenv .venv
. .venv/bin/activate
python setup.py install
pip install galaxy-lib
cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```
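Under the hood, the Conda path wraps each job in a small bash script built with ``string.Template`` (``COMMAND_WITH_DEPENDENCIES_TEMPLATE`` in ``cwltool/main.py``): the shell commands galaxy-lib returns for the tool's requirements replace ``$handle_dependencies``, ahead of the line that runs the job. A minimal sketch of that substitution; the activation command below is a made-up placeholder, since galaxy-lib's ``dependency_shell_commands()`` decides the real ones:

```python
import string
from typing import List

# Same template text as COMMAND_WITH_DEPENDENCIES_TEMPLATE in this commit.
TEMPLATE = string.Template("""#!/bin/bash
$handle_dependencies
python "run_job.py" "job.json"
""")


def build_job_script(shell_commands: List[str]) -> str:
    # One dependency command per line; an empty list leaves a blank line.
    handle_dependencies = "\n".join(shell_commands)
    return TEMPLATE.substitute(handle_dependencies=handle_dependencies)


# Placeholder command with a hypothetical path; not galaxy-lib's actual output.
script = build_job_script([
    "source /path/to/conda/bin/activate /path/to/envs/seqtk_r93",
])
print(script)
```

Presumably this is also why ``job.py`` now copies ``PATH`` from ``os.environ`` into the job's environment: changes the wrapper makes to ``PATH`` would otherwise not reach the tool.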

Additional flags (``--beta-dependency-resolvers-configuration`` and ``--beta-dependencies-directory``) are available to configure dependency resolution in a more fine-grained way. Using Conda, however, has a number of advantages that make it particularly well suited to CWL. Conda packages are distributed as binaries that work on both Mac and Linux, including relatively old versions of Linux (great for HPC). Conda also doesn't require root and supports installing multiple versions of the same package; again, these factors make it great for HPC and non-Docker targets.
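The two finer-grained flags interact in ``DependenciesConfiguration.__init__``: an existing resolvers configuration file takes precedence, and the dependencies directory then defaults to that file's directory; with only ``--beta-conda-dependencies`` it defaults to ``./cwltool_deps``. A pure-function sketch of that precedence, assuming the parsed flag values are passed in directly (``resolve_dependency_dir`` and ``DependencySettings`` are hypothetical; the real class stores these values as attributes):

```python
import os
from typing import NamedTuple, Optional


class DependencySettings(NamedTuple):
    use_tool_dependencies: bool
    tool_dependency_dir: Optional[str]
    resolvers_config_file: Optional[str]


def resolve_dependency_dir(conf_file: Optional[str],
                           dependencies_dir: Optional[str],
                           conda_dependencies: bool) -> DependencySettings:
    # An existing resolvers configuration file wins over the Conda shortcut.
    if conf_file is not None and os.path.exists(conf_file):
        base = dependencies_dir or os.path.abspath(os.path.dirname(conf_file))
        return DependencySettings(True, base, conf_file)
    if conda_dependencies:
        base = dependencies_dir or os.path.abspath("./cwltool_deps")
        return DependencySettings(True, base, None)
    # No usable flag (or the config file is missing): no dependency resolution.
    return DependencySettings(False, None, None)


print(resolve_dependency_dir(None, None, conda_dependencies=True))
```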

The Biocontainers project (previously Biodocker) dovetails nicely with this. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io (assembled by @bgruening and colleagues). There are over 1800 such containers currently.

Continuing with the example above, for tools without an explicitly annotated Docker container the new ``--beta-use-biocontainers`` flag instructs cwltool to fetch the corresponding Biocontainers container from quay.io automatically, or to build one locally when necessary (for instance, for "fat" tools with multiple software requirements).

```
cwltool --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```
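In ``find_default_container``, an explicit ``--default-container`` still wins; only when it is absent and ``--beta-use-biocontainers`` is set does cwltool ask galaxy-lib's ``ContainerRegistry`` for the best Docker container matching the tool's ``SoftwareRequirement`` packages. A sketch of that precedence with the registry lookup injected as a plain callable (``pick_default_container`` and ``fake_lookup`` are hypothetical stand-ins, not galaxy-lib APIs, and the image names are invented):

```python
from typing import Callable, List, Optional, Tuple

Requirement = Tuple[str, Optional[str]]  # (package name, version)


def pick_default_container(
    default_container: Optional[str],
    use_biocontainers: bool,
    requirements: List[Requirement],
    lookup: Callable[[List[Requirement]], Optional[str]],
) -> Optional[str]:
    # 1. An explicitly configured default container always takes precedence.
    if default_container:
        return default_container
    # 2. Otherwise, optionally ask a registry for an image matching the requirements.
    if use_biocontainers:
        return lookup(requirements)
    # 3. No default container at all.
    return None


def fake_lookup(reqs: List[Requirement]) -> Optional[str]:
    # Stand-in registry: only handles single-package tools; a real lookup
    # may also build a combined ("mulled") image for several packages.
    return "example/%s:%s" % reqs[0] if len(reqs) == 1 else None


print(pick_default_container(None, True, [("seqtk", "r93")], fake_lookup))  # example/seqtk:r93
```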

These containers hold the same binaries that the package would use locally (outside of Docker). This technique therefore allows cross-platform reproducibility and remixability across CWL, Galaxy, and the command line, both inside and outside of Docker.

My sincerest hope is that we move away from CWL-specific Dockerfiles. For less effort, a community Bioconda package can be made, and the result can be used in many more contexts. The Docker image will then be maintained by the community Biocontainers project.

Rebased with correct spelling of DependenciesConfiguration thanks to @tetron.
jmchilton committed Oct 24, 2016
1 parent 5c8ee81 commit 1625611
Showing 28 changed files with 954 additions and 4 deletions.
cwltool/builder.py (1 addition, 0 deletions)
```diff
@@ -36,6 +36,7 @@ def __init__(self): # type: () -> None
         self.stagedir = None # type: Text
         self.make_fs_access = None # type: Type[StdFsAccess]
         self.build_job_script = None # type: Callable[[List[str]], Text]
+        self.find_default_container = None # type: Callable[[], Text]

     def bind_input(self, schema, datum, lead_pos=[], tail_pos=[]):
         # type: (Dict[Text, Any], Any, Union[int, List[int]], List[int]) -> List[Dict[Text, Any]]
```
cwltool/job.py (8 additions, 3 deletions)
```diff
@@ -34,6 +34,7 @@

 PYTHON_RUN_SCRIPT = """
 import json
+import os
 import sys
 import subprocess
@@ -42,6 +43,7 @@
     commands = popen_description["commands"]
     cwd = popen_description["cwd"]
     env = popen_description["env"]
+    env["PATH"] = os.environ.get("PATH")
     stdin_path = popen_description["stdin_path"]
     stdout_path = popen_description["stdout_path"]
     stderr_path = popen_description["stderr_path"]
@@ -140,9 +142,12 @@ def run(self, dry_run=False, pull_image=True, rm_container=True,
         if docker_req and kwargs.get("use_container") is not False:
             env = os.environ
             img_id = docker.get_from_requirements(docker_req, docker_is_req, pull_image)
-        elif kwargs.get("default_container", None) is not None:
-            env = os.environ
-            img_id = kwargs.get("default_container")
+        if img_id is None:
+            find_default_container = self.builder.find_default_container
+            default_container = find_default_container and find_default_container()
+            if default_container:
+                img_id = default_container
+                env = os.environ

         if docker_is_req and img_id is None:
             raise WorkflowException("Docker is required for running this tool.")
```
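After this change, ``run()`` in job.py resolves the image in order: the ``DockerRequirement`` (unless containers are disabled), then the builder's ``find_default_container`` callable, and it still fails if Docker is required but no image was found. A condensed sketch of that control flow, assuming plain values in place of cwltool objects (``select_image`` and its parameters are hypothetical; the ``env`` handling is omitted):

```python
from typing import Callable, Optional


class WorkflowException(Exception):
    """Local stand-in for cwltool's exception of the same name."""


def select_image(
    docker_image: Optional[str],
    use_container: bool,
    docker_is_req: bool,
    find_default_container: Optional[Callable[[], Optional[str]]],
) -> Optional[str]:
    img_id = None
    if docker_image and use_container:
        img_id = docker_image
    if img_id is None:
        # The callable may be None when no default-container logic was configured.
        default_container = find_default_container and find_default_container()
        if default_container:
            img_id = default_container
    if docker_is_req and img_id is None:
        raise WorkflowException("Docker is required for running this tool.")
    return img_id
```

Keeping the fallback behind a zero-argument callable means the Biocontainers lookup only runs for jobs that actually lack an image.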
cwltool/main.py (125 additions, 1 deletion)
```diff
@@ -12,6 +12,7 @@
 import hashlib
 import pkg_resources # part of setuptools
 import functools
+import string

 import rdflib
 from typing import (Union, Any, AnyStr, cast, Callable, Dict, Sequence, Text,
@@ -31,6 +32,13 @@
 from .builder import adjustFileObjs, adjustDirObjs
 from .stdfsaccess import StdFsAccess
 from .pack import pack
+from .utils import get_feature
+try:
+    from galaxy.tools.deps.requirements import ToolRequirement
+    from galaxy.tools import deps
+except ImportError:
+    ToolRequirement = None # type: ignore
+    deps = None

 _logger = logging.getLogger("cwltool")

```
```diff
@@ -144,6 +152,15 @@ def arg_parser(): # type: () -> argparse.ArgumentParser
     exgroup.add_argument("--quiet", action="store_true", help="Only print warnings and errors.")
     exgroup.add_argument("--debug", action="store_true", help="Print even more logging")

+    # help="Dependency resolver configuration file describing how to adapt 'SoftwareRequirement' packages to current system."
+    parser.add_argument("--beta-dependency-resolvers-configuration", default=None, help=argparse.SUPPRESS)
+    # help="Default root directory used by dependency resolvers configuration."
+    parser.add_argument("--beta-dependencies-directory", default=None, help=argparse.SUPPRESS)
+    # help="Use biocontainers for tools without an explicitly annotated Docker container."
+    parser.add_argument("--beta-use-biocontainers", default=None, help=argparse.SUPPRESS, action="store_true")
+    # help="Shortcut to use Conda to resolve 'SoftwareRequirement' packages."
+    parser.add_argument("--beta-conda-dependencies", default=None, help=argparse.SUPPRESS, action="store_true")
+
     parser.add_argument("--tool-help", action="store_true", help="Print command line help for tool")

     parser.add_argument("--relative-deps", choices=['primary', 'cwd'],
```
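The new ``--beta-*`` options are hidden by passing ``help=argparse.SUPPRESS``: they are parsed normally but omitted from ``--help`` output, with the intended help text kept as a comment. A small self-contained demonstration reusing two of the same flag definitions (the ``prog`` name is arbitrary):

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cwltool-demo")
    parser.add_argument("--quiet", action="store_true", help="Only print warnings and errors.")
    # Hidden flags: parsed as usual, but suppressed from --help and usage.
    parser.add_argument("--beta-dependencies-directory", default=None, help=argparse.SUPPRESS)
    parser.add_argument("--beta-conda-dependencies", default=None, help=argparse.SUPPRESS,
                        action="store_true")
    return parser


parser = make_parser()
args = parser.parse_args(["--beta-conda-dependencies"])
print(args.beta_conda_dependencies)      # True
print(args.beta_dependencies_directory)  # None
print("--beta-conda-dependencies" in parser.format_help())  # False
```

Because these flags use ``default=None``, an absent ``store_true`` flag yields ``None`` rather than ``False``; both are falsy, which is all the ``getattr(...)`` checks in ``main()`` rely on.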
```diff
@@ -634,8 +651,20 @@ def main(argsl=None,
                 stdout.write(json.dumps(processobj, indent=4))
                 return 0

+            conf_file = getattr(args, "beta_dependency_resolvers_configuration", None) # Text
+            use_conda_dependencies = getattr(args, "beta_conda_dependencies", None) # Text
+
+            make_tool_kwds = vars(args)
+
+            build_job_script = None # type: Callable[[Any, List[str]], Text]
+            if conf_file or use_conda_dependencies:
+                dependencies_configuration = DependenciesConfiguration(args) # type: DependenciesConfiguration
+                make_tool_kwds["build_job_script"] = dependencies_configuration.build_job_script
+
+            make_tool_kwds["find_default_container"] = functools.partial(find_default_container, args)
+
             tool = make_tool(document_loader, avsc_names, metadata, uri,
-                    makeTool, vars(args))
+                    makeTool, make_tool_kwds)

             if args.print_rdf:
                 printrdf(tool, document_loader.ctx, args.rdf_serializer, stdout)
```
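Two details of this ``main()`` change are easy to miss. ``vars(args)`` returns the namespace's own ``__dict__`` rather than a copy, so keys added to ``make_tool_kwds`` also become attributes of ``args``; and ``functools.partial(find_default_container, args)`` pre-binds the parsed arguments so later code only supplies the builder. A short demonstration with stand-in values (``fake_find_default_container`` is hypothetical; whether the aliasing matters depends on how ``args`` is used afterwards):

```python
import argparse
import functools

args = argparse.Namespace(default_container=None, beta_use_biocontainers=False)

make_tool_kwds = vars(args)
make_tool_kwds["build_job_script"] = "placeholder"
# vars() returned the namespace's __dict__, so the new key is visible on args too.
print(hasattr(args, "build_job_script"))  # True

# Copy first if the namespace should stay untouched.
independent_kwds = dict(vars(args))
independent_kwds["extra"] = 1
print(hasattr(args, "extra"))  # False


def fake_find_default_container(args, builder):
    # Stand-in for main.find_default_container(args, builder).
    return args.default_container or "no-default-for-%s" % builder


bound = functools.partial(fake_find_default_container, args)
print(bound("builder-1"))  # no-default-for-builder-1
```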
```diff
@@ -748,5 +777,100 @@ def locToPath(p):
         _logger.removeHandler(stderr_handler)
         _logger.addHandler(defaultStreamHandler)

+
+COMMAND_WITH_DEPENDENCIES_TEMPLATE = string.Template("""#!/bin/bash
+$handle_dependencies
+python "run_job.py" "job.json"
+""")
+
+
+def find_default_container(args, builder):
+    if args.default_container:
+        return args.default_container
+    elif args.beta_use_biocontainers:
+        try:
+            from galaxy.tools.deps.containers import ContainerRegistry, AppInfo, ToolInfo, DOCKER_CONTAINER_TYPE
+        except ImportError:
+            raise Exception("galaxy-lib not found")
+
+        app_info = AppInfo(
+            involucro_auto_init=True,
+            enable_beta_mulled_containers=True,
+        ) # type: AppInfo
+        container_registry = ContainerRegistry(app_info) # type: ContainerRegistry
+        requirements = _get_dependencies(builder)
+        tool_info = ToolInfo(requirements=requirements) # type: ToolInfo
+        container_description = container_registry.find_best_container_description([DOCKER_CONTAINER_TYPE], tool_info)
+        print container_description
+        if container_description:
+            return container_description.identifier
+
+    return None
+
+
+class DependenciesConfiguration(object):
+
+    def __init__(self, args):
+        # type: (argparse.Namespace) -> None
+        conf_file = getattr(args, "beta_dependency_resolvers_configuration", None)
+        tool_dependency_dir = getattr(args, "beta_dependencies_directory", None)
+        conda_dependencies = getattr(args, "beta_conda_dependencies", None)
+        if conf_file is not None and os.path.exists(conf_file):
+            self.use_tool_dependencies = True
+            if not tool_dependency_dir:
+                tool_dependency_dir = os.path.abspath(os.path.dirname(conf_file))
+            self.tool_dependency_dir = tool_dependency_dir
+            self.dependency_resolvers_config_file = conf_file
+        elif conda_dependencies:
+            if not tool_dependency_dir:
+                tool_dependency_dir = os.path.abspath("./cwltool_deps")
+            self.tool_dependency_dir = tool_dependency_dir
+            self.use_tool_dependencies = True
+            self.dependency_resolvers_config_file = None
+        else:
+            self.use_tool_dependencies = False
+
+    @property
+    def config_dict(self):
+        return {
+            'conda_auto_install': True,
+            'conda_auto_init': True,
+        }
+
+    def build_job_script(self, builder, command):
+        # type: (Any, List[str]) -> Text
+        if deps is None:
+            raise Exception("galaxy-lib not found")
+        tool_dependency_manager = deps.build_dependency_manager(self) # type: deps.DependencyManager
+        dependencies = _get_dependencies(builder)
+        handle_dependencies = "" # str
+        if dependencies:
+            handle_dependencies = "\n".join(tool_dependency_manager.dependency_shell_commands(dependencies, job_directory=builder.tmpdir))
+
+        template_kwds = dict(handle_dependencies=handle_dependencies) # type: Dict[str, str]
+        job_script = COMMAND_WITH_DEPENDENCIES_TEMPLATE.substitute(template_kwds)
+        return job_script
+
+
+def _get_dependencies(builder):
+    # type: (Any) -> List[ToolRequirement]
+    (software_requirement, _) = get_feature(builder, "SoftwareRequirement")
+    dependencies = [] # type: List[ToolRequirement]
+    if software_requirement and software_requirement.get("packages"):
+        packages = software_requirement.get("packages")
+        for package in packages:
+            version = package.get("version", None)
+            if isinstance(version, list):
+                if version:
+                    version = version[0]
+                else:
+                    version = None
+            dependencies.append(ToolRequirement.from_dict(dict(
+                name=package["package"].split("#")[-1],
+                version=version,
+                type="package",
+            )))
+    return dependencies
+
 if __name__ == "__main__":
     sys.exit(main(sys.argv[1:]))
```
cwltool/process.py (12 additions, 0 deletions)
```diff
@@ -502,6 +502,18 @@ def _init_job(self, joborder, **kwargs):

         builder.resources = self.evalResources(builder, kwargs)

+        build_job_script = kwargs.get("build_job_script", None) # type: Callable[[Builder, List[str]], Text]
+        curried_build_job_script = None # type: Callable[[List[str]], Text]
+        if build_job_script:
+            curried_build_job_script = lambda commands: build_job_script(builder, commands)
+        builder.build_job_script = curried_build_job_script
+
+        find_default_container = kwargs.get("find_default_container", None) # type: Callable[[Builder], Text]
+        curried_find_default_container = None # type: Callable[[], Text]
+        if find_default_container:
+            curried_find_default_container = lambda: find_default_container(builder)
+        builder.find_default_container = curried_find_default_container
+
         return builder

     def evalResources(self, builder, kwargs):
```
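``_init_job`` adapts the two callables from ``main()`` to the shorter signatures ``Builder`` expects by closing over the current ``builder`` with a ``lambda``. The same adaptation in isolation, with a hypothetical ``FakeBuilder`` standing in for cwltool's ``Builder`` and ``attach_callbacks`` as an illustrative name:

```python
from typing import Callable, List, Optional


class FakeBuilder:
    """Hypothetical stand-in for cwltool.builder.Builder."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.build_job_script = None  # type: Optional[Callable[[List[str]], str]]
        self.find_default_container = None  # type: Optional[Callable[[], Optional[str]]]


def attach_callbacks(builder: FakeBuilder,
                     build_job_script: Optional[Callable[[FakeBuilder, List[str]], str]],
                     find_default_container: Optional[Callable[[FakeBuilder], Optional[str]]]) -> FakeBuilder:
    # Each lambda captures this builder, so callers no longer pass it explicitly.
    if build_job_script:
        builder.build_job_script = lambda commands: build_job_script(builder, commands)
    if find_default_container:
        builder.find_default_container = lambda: find_default_container(builder)
    return builder


b = attach_callbacks(
    FakeBuilder("job-1"),
    lambda bld, cmds: "%s: %s" % (bld.name, " ".join(cmds)),
    lambda bld: None,
)
print(b.build_job_script(["seqtk", "seq", "-a"]))  # job-1: seqtk seq -a
print(b.find_default_container())  # None
```

``functools.partial(build_job_script, builder)`` would be an equivalent way to bind the first argument.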
tests/2.fasta (new file, 11 additions)
```
>Sequence 561 BP; 135 A; 106 C; 98 G; 222 T; 0 other;
gttcgatgcc taaaatacct tcttttgtcc ctacacagac cacagttttc ctaatggctt
tacaccgact agaaattctt gtgcaagcac taattgaaag cggttggcct agagtgttac
cggtttgtat agctgagcgc gtctcttgcc ctgatcaaag gttcattttc tctactttgg
aagacgttgt ggaagaatac aacaagtacg agtctctccc ccctggtttg ctgattactg
gatacagttg taataccctt cgcaacaccg cgtaactatc tatatgaatt attttccctt
tattatatgt agtaggttcg tctttaatct tcctttagca agtcttttac tgttttcgac
ctcaatgttc atgttcttag gttgttttgg ataatatgcg gtcagtttaa tcttcgttgt
ttcttcttaa aatatttatt catggtttaa tttttggttt gtacttgttc aggggccagt
tcattattta ctctgtttgt atacagcagt tcttttattt ttagtatgat tttaatttaa
aacaattcta atggtcaaaa a
```
tests/2.fastq (new file, 12 additions)
```
@EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
+
;;3;;;;;;;;;;;;7;;;;;;;88
@EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
+
;;;;;;;;;;;7;;;;;-;;;3;83
@EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG
+EAS54_6_R1_2_1_443_348
;;;;;;;;;;;9;7;;.7;393333
```
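The test tool runs ``seqtk seq -a`` on this file, which converts FASTQ to FASTA. Assuming simple four-line records like the ones above (no wrapped sequence lines), the conversion roughly amounts to the following sketch; note that the third record repeats its name on the ``+`` line, which FASTQ allows. ``parse_fastq`` and ``fastq_to_fasta`` are illustrative names, not seqtk's implementation:

```python
from typing import Iterator, List, Tuple

FASTQ = """@EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
+
;;3;;;;;;;;;;;;7;;;;;;;88
@EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
+
;;;;;;;;;;;7;;;;;-;;;3;83
@EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG
+EAS54_6_R1_2_1_443_348
;;;;;;;;;;;9;7;;.7;393333
"""


def parse_fastq(text: str) -> Iterator[Tuple[str, str]]:
    lines = [line for line in text.splitlines() if line]
    for i in range(0, len(lines), 4):
        header, seq, plus, _qual = lines[i:i + 4]
        # Minimal sanity checks for four-line records.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record at line %d" % (i + 1))
        yield header[1:], seq


def fastq_to_fasta(text: str) -> str:
    out: List[str] = []
    for name, seq in parse_fastq(text):
        out.append(">" + name)
        out.append(seq)
    return "\n".join(out) + "\n"


print(fastq_to_fasta(FASTQ), end="")
```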
tests/seqtk_seq.cwl (new file, 24 additions)
```yaml
cwlVersion: v1.0
class: CommandLineTool
id: "seqtk_seq"
doc: "Convert to FASTA (seqtk)"
inputs:
- id: input1
  type: File
  inputBinding:
    position: 1
    prefix: "-a"
outputs:
- id: output1
  type: File
  outputBinding:
    glob: out
baseCommand: ["seqtk", "seq"]
arguments: []
stdout: out
hints:
  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93
```
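For this tool, the command cwltool builds should be ``seqtk seq -a <path of the input file>``, with stdout captured to ``out``, which the ``glob`` then collects as ``output1``. A deliberately tiny sketch of how ``baseCommand``, ``arguments`` and a single ``inputBinding`` (``position``/``prefix``) combine; it assumes ``arguments`` carry no positions and ignores most CWL binding rules (``separate``, arrays, expressions, file staging), so it is not cwltool's implementation (``build_command_line`` is a hypothetical name):

```python
from typing import Any, Dict, List


def build_command_line(tool: Dict[str, Any], job: Dict[str, Any]) -> List[str]:
    """Hypothetical, heavily simplified CWL command-line builder."""
    bound = []
    for inp in tool.get("inputs", []):
        binding = inp.get("inputBinding")
        if binding is None or inp["id"] not in job:
            continue
        value = job[inp["id"]]
        # File inputs contribute their path; real cwltool uses the staged path.
        if isinstance(value, dict) and value.get("class") == "File":
            value = value.get("path", value["location"])
        parts = ([binding["prefix"]] if "prefix" in binding else []) + [str(value)]
        bound.append((binding.get("position", 0), parts))
    bound.sort(key=lambda item: item[0])
    cmd = list(tool["baseCommand"]) + [str(a) for a in tool.get("arguments", [])]
    for _, parts in bound:
        cmd.extend(parts)
    return cmd


tool = {
    "baseCommand": ["seqtk", "seq"],
    "arguments": [],
    "inputs": [{"id": "input1", "type": "File",
                "inputBinding": {"position": 1, "prefix": "-a"}}],
}
job = {"input1": {"class": "File", "location": "2.fastq"}}
print(build_command_line(tool, job))  # ['seqtk', 'seq', '-a', '2.fastq']
```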
tests/seqtk_seq_job.json (new file, 6 additions)
```json
{
  "input1": {
    "class": "File",
    "location": "2.fastq"
  }
}
```
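The job file gives ``location`` as the relative reference ``2.fastq``. CWL resolves a relative ``location`` against the URI of the document it appears in, so here it should point at ``2.fastq`` next to the job file in ``tests/``. A standard-library sketch of that resolution, assuming a local job file (``resolve_location`` is a hypothetical name; cwltool's own loader differs in detail):

```python
from pathlib import Path
from urllib.parse import urljoin


def resolve_location(job_file: str, location: str) -> str:
    """Resolve a (possibly relative) CWL File location against the job file's URI."""
    base_uri = Path(job_file).resolve().as_uri()
    # urljoin leaves absolute URIs (e.g. "http://...") untouched.
    return urljoin(base_uri, location)


print(resolve_location("tests/seqtk_seq_job.json", "2.fastq"))
```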
typeshed/2.7/galaxy/__init__.pyi (new file, 13 additions)
```python
# Stubs for galaxy (Python 3.5)
#
# NOTE: This dynamically typed stub was automatically generated by stubgen.

from typing import Any

PROJECT_NAME = ... # type: str
PROJECT_OWNER = ... # type: str
PROJECT_USERAME = ... # type: str
PROJECT_URL = ... # type: str
PROJECT_AUTHOR = ... # type: str
PROJECT_EMAIL = ... # type: str
RAW_CONTENT_URL = ... # type: Any
```
typeshed/2.7/galaxy/tools/__init__.pyi (new file, 4 additions)
```python
# Stubs for galaxy.tools (Python 3.5)
#
# NOTE: This dynamically typed stub was automatically generated by stubgen.

```
typeshed/2.7/galaxy/tools/deps/__init__.pyi (new file, 32 additions)
```python
# Stubs for galaxy.tools.deps (Python 3.5)
#
# NOTE: This dynamically typed stub was automatically generated by stubgen.

from typing import Any, Optional
from .resolvers import NullDependency as NullDependency
from .resolvers.conda import CondaDependencyResolver as CondaDependencyResolver
from .resolvers.galaxy_packages import GalaxyPackageDependencyResolver as GalaxyPackageDependencyResolver
from .resolvers.tool_shed_packages import ToolShedPackageDependencyResolver as ToolShedPackageDependencyResolver

log = ... # type: Any
EXTRA_CONFIG_KWDS = ... # type: Any
CONFIG_VAL_NOT_FOUND = ... # type: Any

def build_dependency_manager(config: Any): ... # type: DependencyManager

class NullDependencyManager:
    dependency_resolvers = ... # type: Any
    def uses_tool_shed_dependencies(self): ...
    def dependency_shell_commands(self, requirements: Any, **kwds) -> List[str]: ...
    def find_dep(self, name, version: Optional[Any] = ..., type: str = ..., **kwds): ...

class DependencyManager:
    extra_config = ... # type: Any
    default_base_path = ... # type: Any
    resolver_classes = ... # type: Any
    dependency_resolvers = ... # type: Any
    def __init__(self, default_base_path, conf_file: Optional[Any] = ..., **extra_config) -> None: ...
    def dependency_shell_commands(self, requirements: Any, **kwds) -> List[str]: ...
    def requirements_to_dependencies(self, requirements, **kwds): ...
    def uses_tool_shed_dependencies(self): ...
    def find_dep(self, name, version: Optional[Any] = ..., type: str = ..., **kwds): ...
```