pzm-tools is a collection of scripts for post-zygotic mutation (PZM) analysis. It is distributed as a Python package, as a Docker image for containerized execution, and as WDL workflows for use with workflow engines.
You have multiple options for running it:
- Local Installation: Install it on your local machine as a Python package and use it programmatically.
- Command Line: Run it directly from the command line by invoking its executable.
- Docker Image: Run it within a Docker container for portability and isolation.
- Cloud Execution: The tools are WDLized, so they can be run in several ways:
  - Local Cromwell: Run them on a local Cromwell engine without setting up a server or deployment, mainly for development purposes.
  - Cloud Cromwell Deployment: Run them on a cloud-based Cromwell deployment, mainly for power users to test scalability.
  - Terra workspace: Run them within a Terra workspace, mainly for end-users.
# Clone the repository
> git clone https://github.com/talkowski-lab/pzm
> cd pzm
# Create a virtual environment and activate it.
> virtualenv .venv
> source .venv/bin/activate
# Install the package
> pip install src/pzm_tools
First, install the package as described in the local installation instructions above. You can then invoke it from the command line as follows.
> cd src
> pzm-tools label \
test_data/rf_model.joblib \
test_data/SS0012986.vcf.gz \
--output-prefix test
You may run pzm-tools --help for more detailed documentation on the subcommands and their arguments.
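If you want to call the command above from a Python script, one option is to wrap it with subprocess. The following is a minimal sketch, not part of pzm-tools itself: the helper names build_label_command and run_label are hypothetical, and the sketch only reuses the positional arguments and the --output-prefix flag shown in the example above. It assumes pzm-tools is installed and on your PATH.

```python
import shlex
import subprocess


def build_label_command(model_path, vcf_path, output_prefix):
    """Build the argument list for the `pzm-tools label` call shown above.

    Hypothetical helper: it mirrors the documented command line and does
    not use any Python API of the package.
    """
    return [
        "pzm-tools", "label",
        str(model_path),
        str(vcf_path),
        "--output-prefix", str(output_prefix),
    ]


def run_label(model_path, vcf_path, output_prefix):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    cmd = build_label_command(model_path, vcf_path, output_prefix)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where pzm-tools is not installed.
    cmd = build_label_command(
        "test_data/rf_model.joblib", "test_data/SS0012986.vcf.gz", "test"
    )
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting issues when file paths contain spaces.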
For local development purposes, you may start a container from the pzm-tools Docker image and open a shell in it using the following command.
> TAG=us.gcr.io/broad-dsde-methods/vjalili/pzm:v0.1
> docker run -it --entrypoint /bin/bash $TAG
Once you are inside the container, you can run pzm-tools from the command line. For example:
> docker run -it --entrypoint /bin/bash $TAG
root@65337f2018ce:/# pzm-tools
usage: pzm-tools [-h] {label} ...
Tools for studying PZM variants
optional arguments:
-h, --help show this help message and exit
subcommands:
{label}
label Label the variants in a given VCF file as PZM or not-PZM
using a trained random forest model.
You may build the Docker image using the following command.
> cd pzm/src/
> TAG=us.gcr.io/broad-dsde-methods/vjalili/pzm:v0.1
> docker image build --no-cache --platform linux/amd64 --tag $TAG .
To use the Docker image in Terra, you need to push it to a container registry that Terra can access. Assuming the tag you specified in TAG refers to such a registry, you may run the following to push the image.
> docker push $TAG
For development purposes, you may take the following steps to run the WDL locally, without needing a Cromwell deployment or a Terra workspace.
- Download Cromwell from the Cromwell releases page.
- Run the following command to execute the Classifier workflow.
> cd src
> java -jar cromwell-72.jar run \
--inputs test_data/inputs.json \
--options test_data/options.json \
--metadata-output metadata.json \
wdl/Classifier.wdl
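The run above writes its metadata to metadata.json through --metadata-output. As a rough sketch of how you might inspect that file after the run, the snippet below reads the overall workflow status and the execution status of each call. It assumes the usual Cromwell metadata layout (a top-level "status" field and a "calls" mapping from call name to a list of attempts, each with "executionStatus"). The function names are hypothetical, and the call name in the usage example is made up for illustration.

```python
import json


def load_metadata(path="metadata.json"):
    """Load the JSON file written by Cromwell's --metadata-output option."""
    with open(path) as handle:
        return json.load(handle)


def summarize_metadata(metadata):
    """Return (workflow_status, {call_name: [status_per_attempt, ...]}).

    Assumes the usual Cromwell layout; missing fields come back as None
    or as an empty mapping instead of raising.
    """
    calls = metadata.get("calls", {})
    call_status = {
        name: [attempt.get("executionStatus") for attempt in attempts]
        for name, attempts in calls.items()
    }
    return metadata.get("status"), call_status


if __name__ == "__main__":
    # Illustrative input only; "Classifier.ExampleTask" is a made-up call name.
    example = {
        "status": "Succeeded",
        "calls": {"Classifier.ExampleTask": [{"executionStatus": "Done"}]},
    }
    print(summarize_metadata(example))
```

To check a real run, call summarize_metadata(load_metadata("metadata.json")) from the directory where you ran Cromwell.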
- Install and configure cromshell to interface with a Cromwell server.
- Create inputs.json and options.json. You may use test_data/inputs.json and test_data/options.json as templates; ensure the files are stored in a Google Cloud bucket that is accessible to the Cromwell server you are using.
- Run the following command to submit the workflow for execution using cromshell.
> cromshell submit wdl/Classifier.wdl inputs.json options.json
- You may check the status of your submission as follows.
> cromshell status
The Classifier workflow is hosted on this Dockstore page. You may follow these steps to import it into your Terra workspace.