Commit
Updated README with integrated large camel.
OndrejSladky committed Aug 21, 2024
1 parent 96a7a58 commit 72905d8
Showing 2 changed files with 13 additions and 38 deletions.
39 changes: 7 additions & 32 deletions README.md
@@ -20,8 +20,7 @@ They come in two different implementations (their results may differ due to the
Note that at this point only the implementations with hash table are optimized and that the Aho-Corasick automaton
based versions of the algorithms are only experimental.

-The hashing based implementations of the default KmerCamel🐫 (`./kmercamel`) support $k$-mer with $k$ at most 31,
-whereas the larger KmerCamel🐫 (`./kmercamel-large`) supports $k$-mers with $k$ at most 63 (at the cost of slight slowdown).
+The hashing based implementations of KmerCamel🐫 support $k$-mer with $k$ at most 63,

All algorithms can be used to either work in the unidirectional model or in the bidirectional model
(i.e. treat $k$-mer and its reverse complement as the same; in this case either of them appears in the result).
@@ -62,8 +61,8 @@ on macOS.

The program has the following arguments:

-- `-p path_to_fasta` - the path to fasta file. This is a required argument.
-- `-k value_of_k` - the size of one k-mer. This is a required argument.
+- `-p path_to_fasta` - the path to fasta file (can be `gzip`ed). This is a required argument.
+- `-k value_of_k` - the size of one k-mer (up to 63). This is a required argument.
- `-a algorithm` - the algorithm which should be run. Either `global` or `globalAC` for Global Greedy, `local` or `localAC` for Local Greedy.
The versions with AC use Aho-Corasick automaton. Default `global`.
- `-o output_path` - the path to output file. If not specified, output is printed to stdout.
@@ -79,27 +78,18 @@ The output contains the resulting superstring - capital letters indicate that at
For example:

```
-./kmercamel -p ./spneumoniae.fa -a local -k 12 -d 7 -c
+./kmercamel -p ./spneumoniae.fa -a local -k 31 -d 5 -c
```

-runs the Local Greedy in the bidirectional model on the streptococcus fasta file with `k=12` and `d=7`.
+runs the Local Greedy in the bidirectional model on the streptococcus fasta file with `k=31` and `d=5`.

Alternatively, if your operating system supports it, you can run `./🐫` instead of `./kmercamel`.

-Currently, KmerCamel🐫 does not support gziped files as an input.
-A possible workaround is to use `gzcat` and process substitution.
-
-```
-./kmercamel -k 13 -p <(gzcat fasta_file.fa.gz)
-```
-
-Note: on some systems you might need to use the name `zcat` instead of `gzcat`.
-
### Mask optimization

For mask optimization, run the subcommand `optimize` with the following arguments:

-- `p path_to_fasta` - the path to fasta file. This is a required argument.
+- `p path_to_fasta` - the path to fasta file (can be `gzip`ed). This is a required argument.
- `k k_value` - the size of one k-mer. This is a required argument.
- `a algorithm` - the algorithm for mask optimization. Either `ones` for maximizing the number of 1s, `runs` for minimizing the number of runs of 1s, `runsapprox` for approximately minimizing the number of runs of 1s, or `zeros` for maximizing the number of 0s. Default `ones`.
- `o output_path` - the path to output file. If not specified, output is printed to stdout.
@@ -110,26 +100,11 @@ For mask optimization, run the subcommand `optimize` with the following argument
For example:

```
-./kmercamel optimize -p ./global-spneumoniae.fa -k 12 -a runs -c
+./kmercamel optimize -p ./global-spneumoniae.fa -k 31 -a runs -c
```

minimizes the number of runs of 1s in the mask of the superstring computed by Global Greedy in the bidirectional model on the streptococcus fasta file with `k=12`.

-Note: currently mask optimization cannot read gziped files, nor can the proccess substititution be used.
-
-### Large $k$-mers
-
-The default version of KmerCamel🐫 does not support $k > 31$. For those values, use the large KmerCamel🐫,
-which supports $k < 64$.
-
-For example:
-
-```
-./kmercamel-large -p ./spneumoniae.fa -a global -k 63 -c
-```
-
-Note: for smaller $k$ it is recommended to use default KmerCamel🐫 as it is faster.
-
### Turn off memory optimizations for Global

In order to reduce the memory footprint of hash-table based Global Greedy,
12 changes: 6 additions & 6 deletions src/main.cpp
@@ -16,22 +16,22 @@
int Help() {
std::cerr << "KmerCamel version " << VERSION << std::endl;
std::cerr << "Accepted arguments:" << std::endl;
-std::cerr << " -p path_to_fasta - required; valid path to fasta file" << std::endl;
-std::cerr << " -k k_value - required; integer value for k" << std::endl;
+std::cerr << " -p path_to_fasta - required; valid path to fasta file (can be gziped)" << std::endl;
+std::cerr << " -k k_value - required; integer value for k (up to 63)" << std::endl;
std::cerr << " -a algorithm - the algorithm to be run [global (default), globalAC, local, localAC, streaming]" << std::endl;
std::cerr << " -o output_path - if not specified, the output is printed to stdout" << std::endl;
std::cerr << " -d d_value - integer value for d_max; default 5" << std::endl;
std::cerr << " -c - treat k-mer and its reverse complement as equal" << std::endl;
std::cerr << " -m - turn off the memory optimizations for global" << std::endl;
std::cerr << " -h - print help" << std::endl;
std::cerr << " -v - print version" << std::endl;
-std::cerr << "Example usage: ./kmercamel -p path_to_fasta -k 13 -d 5 -a local" << std::endl;
+std::cerr << "Example usage: ./kmercamel -p path_to_fasta -k 31 -d 5 -a local -c" << std::endl;
std::cerr << "Possible algorithms: global globalAC local localAC streaming" << std::endl;
std::cerr << std::endl;
std::cerr << "For optimization of masks use `kmercamel optimize`." << std::endl;
std::cerr << "Accepted arguments:" << std::endl;
-std::cerr << " -p path_to_fasta - required; valid path to fasta file" << std::endl;
-std::cerr << " -k k_value - required; integer value for k" << std::endl;
+std::cerr << " -p path_to_fasta - required; valid path to fasta file (can be gziped)" << std::endl;
+std::cerr << " -k k_value - required; integer value for k (up to 63)" << std::endl;
std::cerr << " -a algorithm - the algorithm to be run [ones (default), runs, runsapprox, zeros]" << std::endl;
std::cerr << " -o output_path - if not specified, the output is printed to stdout" << std::endl;
std::cerr << " -c - treat k-mer and its reverse complement as equal" << std::endl;
@@ -184,7 +184,7 @@ int main(int argc, char **argv) {
std::cerr << "d must be non-negative." << std::endl;
return Help();
} else if (k > MAX_K && (algorithm == "local" || algorithm == "global")) {
-std::cerr << "k > " << MAX_K << " not supported for the algorithm '" + algorithm + "'. Use the 128bit version of KmerCamel or the AC version of the algorithm instead." << std::endl;
+std::cerr << "k > " << MAX_K << " not supported for the algorithm '" + algorithm + "'. Use the AC version of the algorithm instead." << std::endl;
return Help();
} else if (d_set && (algorithm == "globalAC" || algorithm == "global" || algorithm == "streaming")) {
std::cerr << "Unsupported argument d for algorithm '" + algorithm + "'." << std::endl;
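
The two checks visible in the last hunk can be read in isolation as a small validation routine. The following is only a sketch under stated assumptions, not the code in `src/main.cpp`: `ValidateArguments` is a hypothetical helper introduced for illustration, and `MAX_K` is assumed to be 63 because the updated help text says "up to 63" (the actual definition is not part of this diff).

```cpp
#include <string>

// Assumption: MAX_K is 63, matching the "(up to 63)" wording in the updated
// help text. The real constant is defined elsewhere and not shown in the diff.
constexpr int MAX_K = 63;

// Hypothetical helper, not part of the commit. Returns an empty string when
// the arguments pass the two checks shown in the hunk, otherwise the message
// that would be printed to std::cerr before calling Help().
std::string ValidateArguments(int k, bool d_set, const std::string &algorithm) {
    // The k limit applies only to the hash-table based algorithms.
    if (k > MAX_K && (algorithm == "local" || algorithm == "global")) {
        return "k > " + std::to_string(MAX_K) + " not supported for the algorithm '" +
               algorithm + "'. Use the AC version of the algorithm instead.";
    }
    // The d argument is rejected for global, globalAC and streaming.
    if (d_set && (algorithm == "globalAC" || algorithm == "global" || algorithm == "streaming")) {
        return "Unsupported argument d for algorithm '" + algorithm + "'.";
    }
    return "";
}
```

Under these assumptions, `ValidateArguments(64, false, "local")` returns the first message, while the same call with `"localAC"` passes, since the diff applies the limit only to the hash-table variants.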
