Merge pull request #966 from ComparativeGenomicsToolkit/docker-fixes
prep release 2.4.4
glennhickey authored Mar 16, 2023
2 parents 530af32 + 9026285 commit 985e2c5
Showing 5 changed files with 20 additions and 7 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -59,7 +59,7 @@ python3 -m pip install -U -r ./toil-requirement.txt

If you have Docker installed, you can now run Cactus. All binaries, such as `lastz` and `cactus-consolidated` will be run via Docker. Singularity binaries can be used in place of docker binaries with the `--binariesMode singularity` flag. Note, you must use Singularity 2.3 - 2.6 or Singularity 3.1.0+. Singularity 3 versions below 3.1.0 are incompatible with cactus (see [issue #55](https://github.com/ComparativeGenomicsToolkit/cactus/issues/55) and [issue #60](https://github.com/ComparativeGenomicsToolkit/cactus/issues/60)).

By default, cactus will use the image, `quay.io/comparative-genomics-toolkit/cactus:<CACTUS_COMMIT>` when running binaries. This is usually okay, but can be overridden with the `CACTUS_DOCKER_ORG` and `CACTUS_DOCKER_TAG` environment variables. For example, to use GPU release 2.4.3, run `export CACTUS_DOCKER_TAG=v2.4.3-gpu` before running cactus.
By default, cactus will use the image, `quay.io/comparative-genomics-toolkit/cactus:<CACTUS_COMMIT>` when running binaries. This is usually okay, but can be overridden with the `CACTUS_DOCKER_ORG` and `CACTUS_DOCKER_TAG` environment variables. For example, to use GPU release 2.4.4, run `export CACTUS_DOCKER_TAG=v2.4.4-gpu` before running cactus.
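The override described in this README paragraph can be sketched as below. This is a minimal illustration, not the code Cactus itself uses: the helper name `resolve_cactus_image`, the `DEFAULT_ORG` constant, and the assumption that `CACTUS_DOCKER_ORG` replaces everything before `/cactus:` are all hypothetical. Only the two environment variable names and the overall image pattern come from the text.

```python
import os

# Assumption for this sketch: the organisation part of the image name shown in the README.
DEFAULT_ORG = "quay.io/comparative-genomics-toolkit"


def resolve_cactus_image(default_tag, env=None):
    """Build an image reference, letting the two environment variables override the defaults.

    `default_tag` stands in for <CACTUS_COMMIT>; `env` defaults to os.environ and is
    injectable so the function can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    org = env.get("CACTUS_DOCKER_ORG", DEFAULT_ORG)
    tag = env.get("CACTUS_DOCKER_TAG", default_tag)
    return f"{org}/cactus:{tag}"


# No overrides: the default organisation and the supplied tag are used.
print(resolve_cactus_image("abc123", env={}))
# Equivalent of `export CACTUS_DOCKER_TAG=v2.4.4-gpu` before running cactus.
print(resolve_cactus_image("abc123", env={"CACTUS_DOCKER_TAG": "v2.4.4-gpu"}))
```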

### Compiling Binaries Locally
In order to compile the binaries locally and not use a Docker image, you need some dependencies installed. On Ubuntu (we've tested on 20.04 and 22.04), you can look at the [Cactus Dockerfile](./Dockerfile) for guidance. To obtain the `apt-get` command:
15 changes: 14 additions & 1 deletion ReleaseNotes.md
@@ -1,4 +1,17 @@
# Release 2.4.4 2023-03-07
# Release 2.4.4 2023-03-16

This release includes some new export tools for the UCSC Genome Browser.

- `cactus-hal2chains` created in order to convert HAL output from Cactus into sets of pairwise alignment chains, using either `halLiftover` or `halSynteny`
- `cactus-maf2bigmaf` created to convert `.maf` output from `cactus-hal2maf` to BigMaf and BigMaf Summary files for display on the Genome Browser
- `cactus-hal2maf` typo fixed where 3 (instead of 30) was set for the default value of `--maximumGapLength`
- Boost TAFFY normalization defaults in `cactus-hal2maf`, bringing `--maximumGapLength` to 100 and `--maximumBlockLengthToMerge` to 1000, and adding the heuristic block-breaking dupe filter from `taffy norm`. The latter is on by default to prevent over-fragmentation, but can be disabled with `--keepGapCausingDupes`.
- Remove `--onlyOrthologs` and `--noDupes` options from `cactus-hal2maf` and replace them with the `--dupeMode` option. `--dupeMode single` is now the recommended way of getting at most one row per species. More information about this has been added to the documentation.
- `--maxRefNFrac` option added to `cactus-hal2maf` to filter out blocks where the reference sequence is mostly Ns (by default, blocks with >95% Ns are filtered out).
- Change abPOA scoring matrix to be more consistent with lastz parameters used by cactus, where `N` bases are penalized when aligned with other characters. Before, they could be aligned to anything. This will hopefully make the above filter less necessary.
- Fix bug where `cactus-blast --restart` would not work.
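
To make the `--maxRefNFrac` item above concrete, here is a minimal sketch of what a reference-N filter of this kind computes. It is not the `cactus-hal2maf` or `taffy` implementation. The function names, the choice to ignore gap characters (`-`) when computing the fraction, and the treatment of lowercase `n` as N are assumptions made for illustration; only the 95% default comes from the release note.

```python
def ref_n_fraction(ref_row):
    """Fraction of non-gap characters in the reference row that are N (case-insensitive)."""
    bases = [c for c in ref_row if c != "-"]  # assumption: gaps do not count as bases
    if not bases:
        return 0.0
    return sum(1 for c in bases if c in "Nn") / len(bases)


def keep_block(ref_row, max_ref_n_frac=0.95):
    """Keep a block unless its reference row is more than `max_ref_n_frac` Ns."""
    return ref_n_fraction(ref_row) <= max_ref_n_frac


print(keep_block("ACGTNNNN"))        # half Ns: kept
print(keep_block("N" * 99 + "A"))    # 99% Ns: filtered out
```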

# Release 2.4.3 2023-03-07

This release patches a critical pangenome indexing bug introduced in v2.3.0, where a typo in the refactor of `cactus-graphmap-join` effectively caused *all* variation to be removed from the allele-frequency-filtered (ie .d2) graphs.

6 changes: 3 additions & 3 deletions doc/progressive.md
@@ -113,7 +113,7 @@ The various batching options can be used to tune distributed runs on very large
Depending on the application, you may want to handle duplication events differently when creating the MAF. Three different modes are available via the `--dupeMode` option.

* "single" : Uses greedy heuristics to pick the copy for each species that results in fewest mutations and block breaks. Recommended when visualizing via BigMaf (see below)
* "ancestral" : Restricts the duplication relationships shown to only those orthologous to the reference genome according to the HAL tree. There may be multiple orthologs per genome. This relies on the dating of the duplication in the hal tree (ie in which genome it is explicitly self-aligned) and is still a work in progress.
* "ancestral" : Restricts the duplication relationships shown to only those orthologous to the reference genome according to the HAL tree. There may be multiple orthologs per genome. This relies on the dating of the duplication in the hal tree (ie in which genome it is explicitly self-aligned) and is still a work in progress. For example, in a tree with `((human,chimp),gorilla)`, if a duplication in human is collapsed (ie a single copy) in the human-chimp ancestor, then it would not show up on the human-referenced MAF using this option. But if the duplication is not collapsed in this ancestor (presumably because each copy has an ortholog in chimp and gorilla), then it will be in the MAF because the duplication event was higher in the tree.
* "all" : (default) All duplications are written, including ancestral events (orthologs) and paralogs in the reference. Note that by default, some duplications will be filtered out if they break MAF blocks. To disable this in order to truly catch them all, use `--keepGapCausingDupes`.

Usually a reference genome is specified with `--refGenome` and ancestral genomes are excluded with `--noAncestors`. Since the default reference is the root of the alignment, `--noAncestors` can only be specified if a leaf genome is used with `--refGenome`.
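
The option rules above can be summarised in a small sketch that assembles a `cactus-hal2maf` command line and rejects the combinations the text rules out. The helper `build_hal2maf_args` is hypothetical, and the positional order (job store, HAL input, MAF output) is an assumption. The real tool may require further options that are not shown here, and the sketch cannot check that the reference genome is actually a leaf.

```python
VALID_DUPE_MODES = ("single", "ancestral", "all")


def build_hal2maf_args(job_store, hal_path, maf_path, ref_genome=None,
                       dupe_mode="all", no_ancestors=False,
                       keep_gap_causing_dupes=False):
    """Return an argv list for cactus-hal2maf (positional order is assumed, not confirmed)."""
    if dupe_mode not in VALID_DUPE_MODES:
        raise ValueError(f"--dupeMode must be one of {VALID_DUPE_MODES}")
    # The default reference is the root (an ancestor), so --noAncestors needs --refGenome.
    if no_ancestors and ref_genome is None:
        raise ValueError("--noAncestors requires a leaf genome given with --refGenome")

    args = ["cactus-hal2maf", job_store, hal_path, maf_path, "--dupeMode", dupe_mode]
    if ref_genome is not None:
        args += ["--refGenome", ref_genome]
    if no_ancestors:
        args.append("--noAncestors")
    if keep_gap_causing_dupes:
        args.append("--keepGapCausingDupes")
    return args


# Genome name and paths below are placeholders.
print(" ".join(build_hal2maf_args("./js", "in.hal", "out.maf", ref_genome="human",
                                  dupe_mode="single", no_ancestors=True)))
```

Passing the resulting list to `subprocess.run` avoids shell quoting issues if the paths contain spaces.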
@@ -164,12 +164,12 @@ Conservation scores can be computed using [phast](http://compgen.cshl.edu/phast/
The Cactus Docker image contains everything you need to run Cactus (python environment, all binaries, system dependencies). For example, to run the test data:

```
docker run -v $(pwd):/data --rm -it quay.io/comparative-genomics-toolkit/cactus:v2.4.3 cactus /data/jobStore /data/evolverMammals.txt /data/evolverMammals.hal
docker run -v $(pwd):/data --rm -it quay.io/comparative-genomics-toolkit/cactus:v2.4.4 cactus /data/jobStore /data/evolverMammals.txt /data/evolverMammals.hal
```

Or you can proceed interactively by running
```
docker run -v $(pwd):/data --rm -it quay.io/comparative-genomics-toolkit/cactus:v2.4.3 bash
docker run -v $(pwd):/data --rm -it quay.io/comparative-genomics-toolkit/cactus:v2.4.4 bash
cactus /data/jobStore /data/evolverMammals.txt /data/evolverMammals.hal
```
2 changes: 1 addition & 1 deletion setup.py
@@ -24,7 +24,7 @@ def run(self):

setup(
    name = "Cactus",
    version = "2.4.3",
    version = "2.4.4",
    author = "Benedict Paten",
    package_dir = {'': 'src'},
    packages = find_packages(where='src'),
2 changes: 1 addition & 1 deletion src/cactus/shared/common.py
@@ -305,7 +305,7 @@ def getDockerImage():

def getDockerRelease(gpu=False):
    """Get the most recent docker release."""
    r = "quay.io/comparative-genomics-toolkit/cactus:v2.4.3"
    r = "quay.io/comparative-genomics-toolkit/cactus:v2.4.4"
    if gpu:
        r += "-gpu"
    return r