Customising your Snaketool

Michael Roach edited this page Nov 21, 2022 · 4 revisions

I've cookiecuttered the template, now what?

First, install it as a Python module with cd my_snaketool && pip install -e . so that you can easily run, rerun, rererun while developing your launcher. Next, build your pipeline in the workflow directory. The example simply uses a Snakefile and config.yaml, but you should add directories for conda environments, rules files, etc., following Snakemake best practices.

Finally, open up __main__.py in your favourite Python IDE and get to work!

Recommended reading

Familiarising yourself with the Click command line interface for Python will help you through this guide.

Adding common command line options

If you have command line options that you want to use in more than one subcommand, you can add them to common_options(). Let's add an option to define a temporary directory:

 def common_options(func):
     options = [
         click.option('--output', help='Output directory', type=click.Path(),
                      default='my_snaketool.out', show_default=True),
+        click.option('--temp', help='Directory for temporary files', type=click.Path(),
+                     default='my_snaketool.temp', show_default=True),
         click.option('--configfile', default='config.yaml', help='Custom config file', show_default=True),
         click.option('--threads', help='Number of threads to use', default=1, show_default=True),
         click.option('--use-conda/--no-use-conda', default=True, help='Use conda for Snakemake rules',
                      show_default=True),
         click.option('--conda-frontend',
                      type=click.Choice(['mamba', 'conda'], case_sensitive=True),
                      default='mamba', help='Specify Conda frontend', show_default=True),
         click.option('--conda-prefix', default=snake_base(os.path.join('workflow', 'conda')),
                      help='Custom conda env directory', type=click.Path(), show_default=False),
         click.option('--snake-default', multiple=True,
                      default=['--rerun-incomplete', '--printshellcmds', '--nolock', '--show-failed-logs'],
                      help="Customise Snakemake runtime args", show_default=True),
         click.argument('snake_args', nargs=-1)]
     for option in reversed(options):
         func = option(func)
     return func
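To see why common_options() walks the list in reverse, here is a minimal sketch that uses stand-in decorators instead of Click. The helper make_option and the applied attribute are hypothetical and exist only for this illustration. It shows that applying the list in reverse gives the same result as stacking the decorators top to bottom above the function.

```python
# make_option is a hypothetical stand-in for click.option; it only records
# the order in which decorators are applied to a function.
def make_option(name):
    def decorator(func):
        func.applied = getattr(func, "applied", []) + [name]
        return func
    return decorator


def common_options(func):
    options = [
        make_option("--output"),
        make_option("--temp"),
        make_option("--threads"),
    ]
    # Apply the last option first, just as stacked decorators are applied
    # from the bottom of the stack upwards.
    for option in reversed(options):
        func = option(func)
    return func


@common_options
def run(**kwargs):
    pass


# The equivalent hand-written decorator stack.
@make_option("--output")
@make_option("--temp")
@make_option("--threads")
def run_stacked(**kwargs):
    pass


print(run.applied)
print(run.applied == run_stacked.applied)
```

Because both forms apply the options in the same order, moving an option into the shared list should not change how a subcommand behaves.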

Customising subcommands

The subcommand run() launches the main pipeline Snakefile. This also demonstrates all the available options when calling the run_snakemake() function.

Most customisation will probably involve adding command line arguments that are passed into the configuration.

  • Add new args as click options
  • Define them as parameters of the subcommand function
  • Add them to the merge_config dictionary

That's it! The new options will be available within the Snakemake config dictionary.

 @click.command(epilog=help_message_extra, context_settings={"ignore_unknown_options": True})
 @click.option('--input', '_input', help='Input file/directory', type=str, required=True)
+@click.option('--search', type=click.Choice(['fast', 'slow'], case_sensitive=False), default='fast', help='Search setting', show_default=True)
 @common_options
-def run(_input, output, **kwargs):
+def run(_input, search, temp, output, **kwargs):
     """Run My Snaketool"""
    
     merge_config = {
         'input': _input,
         'output': output,
+        'search': search,
+        'temp': temp
     }
     run_snakemake(
         snakefile_path=snake_base(os.path.join('workflow', 'Snakefile')),
         merge_config=merge_config,
         **kwargs
     )
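This page does not show how run_snakemake() combines merge_config with the config file, so the following is only a sketch of one plausible approach, assuming command line values take precedence over file defaults. The function merge_dicts and the contents of file_config are hypothetical.

```python
import copy


def merge_dicts(base, overrides):
    """Return a new dict in which values from overrides replace or extend base."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key rather than replacing them whole.
            merged[key] = merge_dicts(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults, as they might be loaded from config.yaml.
file_config = {"search": "slow", "threads_per_job": 4}

# Values collected from the command line in run().
merge_config = {
    "input": "reads.fastq",
    "output": "my_snaketool.out",
    "search": "fast",
    "temp": "my_snaketool.temp",
}

config = merge_dicts(file_config, merge_config)
print(config["search"])
print(config["temp"])
```

Inside the Snakefile, the new values would then be read like any other config entry, for example config['search'] and config['temp'].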

Adding new subcommands

Adding new subcommands is relatively easy. Say we have a super simple Snakemake script for installing the databases such as this example: https://gist.github.com/beardymcjohnface/9b26614536410addf42fc794dd4cab35

Let's make it available with an install subcommand, e.g. my_snaketool install .... Use run() as a template and strip out what you don't want. We will only keep the common options for running Snakemake.

  • Create the installation Snakemake script: workflow/install.smk
  • Create new subcommand function (use 'run()' as a template): install()
  • Update the function docstring, which will become the help message for the subcommand

 @click.command(epilog=help_message_extra, context_settings={"ignore_unknown_options": True})
-@click.option('--input', '_input', help='Input file/directory', type=str, required=True)
-@click.option('--search', type=click.Choice(['fast', 'slow'], case_sensitive=False), default='fast', help='Search setting', show_default=True)
 @common_options
-def run(_input, search, temp, output, **kwargs):
-    """Run My Snaketool"""
+def install(**kwargs):
+    """Install databases"""
-    copy_config(configfile, system_config=snake_base(os.path.join('config', 'config.yaml')))
-    
-    merge_config = {
-        'input': _input,
-        'output': output,
-        'search': search,
-        'temp': temp
-    }
     run_snakemake(
-        snakefile_path=snake_base(os.path.join('workflow', 'Snakefile')),
+        snakefile_path=snake_base(os.path.join('workflow', 'install.smk')),
-        merge_config=merge_config,
         **kwargs
     )

Lastly, add this function name to click's list of commands. Note the order of commands is preserved in the click help message.

 cli.add_command(run)
+cli.add_command(install)
 cli.add_command(config)

Adding Snakemake targets

You may wish to add groups of targets to your Snakefile for defining different run stages. For instance, your pipeline might perform preprocessing and assembly. You can define alternative top-level rules to let users run specific stages of the pipeline. Note: target_rules and the targetRule decorator are only needed for print_targets.

### In Snakefile ###

 target_rules = []
 def targetRule(fn):
     assert fn.__name__.startswith('__')
     target_rules.append(fn.__name__[2:])
     return fn

 @targetRule
 rule all:
     input:
         preprocessing_files,
         assembly_files

+@targetRule
+rule preprocessing:
+    input:
+        preprocessing_files
+
+@targetRule
+rule assembly:
+    input:
+        assembly_files

 @targetRule
 rule print_targets:
     run:
         print("\nTop level rules are: \n", file=sys.stderr)
         print("* " + "\n* ".join(target_rules) + "\n\n", file=sys.stderr)

You don't need to change anything in the launcher for this new functionality to work.

my_snaketool run ... preprocessing
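The target name reaches Snakemake as one of the trailing snake_args collected by the launcher. The internals of run_snakemake() are not shown on this page, so the following is only a rough sketch of how such a command line could be assembled. The function build_snakemake_command is hypothetical. The Snakemake flags used (--snakefile, --configfile, --cores) are standard Snakemake options.

```python
import shlex


def build_snakemake_command(snakefile, configfile, threads, snake_default, snake_args):
    """Assemble a Snakemake command as a list of arguments (hypothetical helper)."""
    command = [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--cores", str(threads),
    ]
    # Default runtime arguments first, then whatever the user appended,
    # which may include extra Snakemake flags and target rule names.
    command += list(snake_default)
    command += list(snake_args)
    return command


command = build_snakemake_command(
    snakefile="workflow/Snakefile",
    configfile="config.yaml",
    threads=8,
    snake_default=["--rerun-incomplete", "--printshellcmds", "--nolock", "--show-failed-logs"],
    snake_args=["--dry-run", "preprocessing"],
)
print(shlex.join(command))
```

Placing the user's arguments last means a target such as preprocessing lands at the end of the command, where Snakemake expects target names.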

You could, however, update the help message to list the newly available run stages.

### in __main__.py ###

 help_message_extra = """
 \b
 CLUSTER EXECUTION:
 my_snaketool run ... --profile [profile]
 For information on Snakemake profiles see:
 https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles
 \b
 RUN EXAMPLES:
 Required:           my_snaketool run --input [file]
 Specify threads:    my_snaketool run ... --threads [threads]
 Disable conda:      my_snaketool run ... --no-use-conda 
 Change defaults:    my_snaketool run ... --snake-default="-k --nolock"
 Add Snakemake args: my_snaketool run ... --dry-run --keep-going --touch
 Specify targets:    my_snaketool run ... all print_targets
 Available targets:
     all             Run everything (default)
+    preprocessing   Run preprocessing steps only
+    assembly        Run assembly steps only
     print_targets   List available targets
"""