GreedyMini is a tool for creating low-density minimizer orders using a greedy approach. This is the repository for the paper "Generating low-density minimizers". We implemented two variants of GreedyMini: GreedyMini+, for generating minimizer orders with low expected density, and GreedyMiniParticular+, for generating minimizer orders with low particular density.
This document provides step-by-step instructions to use the GreedyMini variants.
Before you begin, ensure you have the following:
- 64-bit system: Required to run the binaries or compile GreedyMini.
- C++ Compiler: Only required if you are compiling GreedyMini yourself (must support C++20, e.g., g++ version 10 or higher).
Precompiled binaries are available for Ubuntu and macOS. You can download them from the GitHub release page.
If you prefer to compile the project from source, follow the steps below.
Verify that you have a suitable C++ compiler installed:
g++ --version
You should see output similar to:
g++ (GCC) 10.2.0
If you don't have g++ or it's outdated, you may need to use clang++ or install a newer version locally.
Download and extract the Boost headers:
cd ~
wget https://boostorg.jfrog.io/artifactory/main/release/1.82.0/source/boost_1_82_0.tar.gz
tar -xzf boost_1_82_0.tar.gz
This will create a boost_1_82_0 directory in your home directory.
Navigate to the GreedyMini directory and compile the project:
cd ~/GreedyMini
g++ -std=c++20 -O3 -march=native -I ~/boost_1_82_0 *.cpp -o GreedyMini
After compiling or downloading the binary, you can run GreedyMini using different modes and parameters.
We ran our tests on the first 1M nucleotides of chromosome X from the genome assembly T2T-CHM13v2.0. To extract them, we used the Python notebook shorten_fasta.ipynb, which is located in various scripts/preprocessing chr x/. We then put the resulting .fasta file in the same directory as the GreedyMini executable.
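As a rough illustration of this preprocessing step, the following Python sketch keeps only the first n nucleotides of the first record in a FASTA file. It is not the code from shorten_fasta.ipynb; the function names and the 60-character line width are assumptions made for the example.

```python
import io


def first_n_nucleotides(fasta_lines, n):
    """Return (header, sequence) with the first n nucleotides of the first record."""
    header = None
    chunks = []
    remaining = n
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; ignore the rest
            header = line
            continue
        if header is None or remaining <= 0:
            continue
        piece = line[:remaining]
        chunks.append(piece)
        remaining -= len(piece)
        if remaining == 0:
            break
    return header, "".join(chunks)


def write_fasta(header, sequence, out, width=60):
    """Write one record to a file-like object, wrapping the sequence lines."""
    out.write(header + "\n")
    for i in range(0, len(sequence), width):
        out.write(sequence[i:i + width] + "\n")
```

With real data, one would pass an open file handle as fasta_lines and n=1_000_000, then write the result to a file such as chr_x_1m.fasta.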
Execute the following command to run all the tests from the paper:
./GreedyMini -mode tests
To generate a minimizer with low expected density, run:
./GreedyMini -mode expected -w {w} -k {k}
Replace {w} and {k} with your desired values:
- {w}: The window size.
- {k}: The k-mer size.
Note: Ensure that w + k < 64 (due to 64-bit reliance).
An example run would be:
./GreedyMini -mode expected -w 5 -k 4
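To make the quantity being minimized concrete, the Python sketch below computes the density of a minimizer order on a given sequence. It assumes the common definition: a window holds w consecutive k-mers, the lowest-ranked k-mer in each window is selected (leftmost on ties), and density is the number of distinct selected positions divided by the number of k-mers in the sequence. The random order is only a baseline stand-in, not an order produced by GreedyMini, and all function names are illustrative.

```python
import itertools
import random


def selected_positions(seq, w, k, rank):
    """Positions of the k-mers selected over every window of w consecutive k-mers."""
    ranks = [rank[seq[i:i + k]] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(ranks) - w + 1):
        window = ranks[start:start + w]
        # min() returns the first minimal index, so ties go to the leftmost k-mer.
        best = min(range(w), key=lambda j: window[j])
        picked.add(start + best)
    return picked


def density(seq, w, k, rank):
    """Fraction of k-mer positions selected by the order `rank` on `seq`."""
    n_kmers = len(seq) - k + 1
    return len(selected_positions(seq, w, k, rank)) / n_kmers


def random_order(k, seed=0):
    """Baseline: a uniformly random permutation of all 4^k DNA k-mers."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    random.Random(seed).shuffle(kmers)
    return {kmer: r for r, kmer in enumerate(kmers)}


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(100_000))
    print(density(seq, w=5, k=4, rank=random_order(4)))
```

Measuring density on a long random sequence, as in the usage lines, gives an empirical estimate of expected density; measuring it on a specific sequence gives that sequence's particular density.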
GreedyMiniParticular+ expects a .fasta file containing exactly one sequence.
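Before running the tool, it may help to confirm that an input file meets this requirement. The sketch below is one way to do so in Python; it is not part of GreedyMini, and the function name is hypothetical.

```python
def read_single_sequence(fasta_lines):
    """Return the sequence of a FASTA input, or raise if it does not hold exactly one record."""
    n_records = 0
    chunks = []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_records += 1
            if n_records > 1:
                raise ValueError("FASTA input contains more than one sequence")
            continue
        if n_records == 0:
            raise ValueError("sequence data found before any '>' header")
        chunks.append(line)
    if n_records == 0:
        raise ValueError("FASTA input contains no sequence")
    return "".join(chunks)
```

For a file on disk, pass the open file handle, for example `with open("chr_x_1m.fasta") as f: seq = read_single_sequence(f)`.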
Use the following command to generate minimizers with particular density:
./GreedyMini -mode particular -w {w} -k {k} -path {path} -name {name}
Where:
- {w}: The window size.
- {k}: The k-mer size.
- {path}: The path to the .fasta file containing the sequence.
- {name}: The name for the generated orders.
Note: Ensure that w + k < 64 (due to 64-bit reliance).
In our paper, we used chr_x_1m.fasta as the path.
An example run would be:
./GreedyMini -mode particular -w 5 -k 4 -path chr_x_1m.fasta -name 1M
Customize the behavior of GreedyMini with the following options:
- -greedy_mini_runs: Number of runs of GreedyMini (default: 4096)
- -n_cores: Number of CPU cores to use (default: half the total available threads due to hyper-threading)
- -min_alpha: Minimum alpha value (default: 0.939088)
- -max_alpha: Maximum alpha value (default: 0.999590)
- -max_swapper_time_minutes: Maximum swapper time in minutes
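When scripting many runs, the flags documented above can be assembled programmatically. The following Python sketch builds an argument list suitable for subprocess.run and checks the w + k < 64 constraint up front. The helper name is hypothetical, and appending the optional flags after the mode-specific ones is an assumption about how the tool accepts them.

```python
ALLOWED_OPTIONS = (
    "greedy_mini_runs",
    "n_cores",
    "min_alpha",
    "max_alpha",
    "max_swapper_time_minutes",
)


def build_command(mode, w, k, path=None, name=None,
                  executable="./GreedyMini", **options):
    """Build the GreedyMini argument list for the 'expected' or 'particular' mode."""
    if mode not in ("expected", "particular"):
        raise ValueError(f"unsupported mode: {mode}")
    if w + k >= 64:
        raise ValueError("w + k must be smaller than 64")
    cmd = [executable, "-mode", mode, "-w", str(w), "-k", str(k)]
    if mode == "particular":
        if path is None or name is None:
            raise ValueError("particular mode needs both path and name")
        cmd += ["-path", str(path), "-name", str(name)]
    for key, value in options.items():
        if key not in ALLOWED_OPTIONS:
            raise ValueError(f"unknown option: {key}")
        cmd += [f"-{key}", str(value)]
    return cmd
```

For instance, `build_command("particular", 5, 4, path="chr_x_1m.fasta", name="1M", n_cores=8)` yields a list that can be passed directly to `subprocess.run`.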
Generated minimizers will appear inside the output/minimizers folder. For particular density minimizers, they will appear in a subfolder with the selected name.
Example filename:
{w}_{k}_{min_alpha}_{max_alpha}_swapped.bin
See Additional Parameters for the default values of min_alpha and max_alpha.
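As a small convenience, the filename pattern can be filled in from Python to locate a generated order. This is only a sketch: the helper name is hypothetical, and formatting the alpha values with six decimal places is an assumption inferred from the documented defaults, so check it against the files the tool actually writes.

```python
from pathlib import PurePosixPath


def expected_output_path(w, k, min_alpha=0.939088, max_alpha=0.999590, name=None):
    """Guess where the order for (w, k) is written, following the pattern above."""
    # Assumption: alpha values appear in the filename with six decimal places.
    filename = f"{w}_{k}_{min_alpha:.6f}_{max_alpha:.6f}_swapped.bin"
    folder = PurePosixPath("output") / "minimizers"
    if name is not None:
        # Particular density orders are stored in a subfolder with the chosen name.
        folder = folder / name
    return folder / filename
```

Calling `expected_output_path(5, 4, name="1M")` gives the candidate path for the particular-density example run shown earlier.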
We provide the Python notebook load_order.ipynb alongside the best minimizer orders from the paper, both located in the folder minimizer loading example. The notebook shows how to load a minimizer order into memory and print the rank of each k-mer. For C++, we recommend looking at the functions load_order() and load_vector_from_file(), located in code/tools.cpp.
In case of issues with the tool, you may contact us at [email protected].