dcj-sat

dcj-sat is a software package that computes the edit distance between two genomes with duplicate genes under the double-cut-and-join (DCJ) model, using a SAT formulation. It returns the maximum number of cycles possible in the final decomposition of the adjacency graph of the two genomes.

Prerequisites

BOOST
Python >= 3.7
PySAT

Setting up BOOST

Download BOOST and uncompress it somewhere (compilation and installation are not necessary). Set the environment variable BOOST_HOME to the BOOST directory. For example, on UNIX platforms, add the following statement to the file ~/.bash_profile:

export BOOST_HOME='/directory/to/your/boost/boost_1_85_0'
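
Before building, it can help to confirm that BOOST_HOME points at an uncompressed BOOST tree. The following is a minimal sketch, not part of dcj-sat: the function name `check_boost_home` is hypothetical, and the check only relies on the fact that BOOST release archives keep their headers in a top-level `boost/` directory.

```python
import os
from pathlib import Path


def check_boost_home(env=None):
    """Return the BOOST root named by BOOST_HOME, or raise if it looks wrong.

    Hypothetical helper for illustration; dcj-sat itself does not ship it.
    """
    env = os.environ if env is None else env
    value = env.get("BOOST_HOME")
    if not value:
        raise RuntimeError("BOOST_HOME is not set")
    root = Path(value)
    # BOOST archives place their headers under <root>/boost/.
    if not (root / "boost" / "version.hpp").is_file():
        raise RuntimeError(f"{root} does not look like a BOOST directory")
    return root
```

Calling `check_boost_home()` with no arguments reads the current environment; passing a dictionary makes the check easy to try without changing your shell.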

Cloning Repository

# Clone the repository to your local machine
git clone https://github.com/Shao-Group/dcj-sat.git
cd dcj-sat/

Setting up the environment

# Create a Python virtual environment using venv (example)
python3 -m venv venv
source venv/bin/activate

# Install the PySAT library
pip install 'python-sat[aiger,approxmc,cryptosat,pblib]'

Building the files

# You might have to install automake (if not already installed) to build dependencies. For example, using Homebrew on macOS:
brew install automake

# Make the build script executable and run it
chmod +x build.sh
./build.sh

Input Files

Each input file represents a genome, and each line in the file represents a gene. Each line must be in the following format:

<GENE_ID> <GENE_FAMILY> <CHROMOSOME_NAME> <CHROMOSOME_TYPE>

GENE_ID: A unique identifier for each gene.
GENE_FAMILY: The gene family as an integer.
CHROMOSOME_NAME: Name of the chromosome as an integer.
CHROMOSOME_TYPE: Type of chromosome (1 for linear, 2 for circular).
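
The format above can be checked before running the solver. The following is a minimal sketch of a validator, not the parser dcj-sat uses: the names `Gene` and `parse_genome` are hypothetical, and it assumes fields are separated by whitespace, that blank lines may be skipped, and that GENE_ID can be kept as an opaque string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str          # kept as a string (assumption: any unique token)
    family: int
    chromosome: int
    chromosome_type: int  # 1 = linear, 2 = circular


def parse_genome(lines):
    """Parse an iterable of text lines into a list of Gene records."""
    genes = []
    seen_ids = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        gene_id, family, chromosome, chromosome_type = fields
        if gene_id in seen_ids:
            raise ValueError(f"line {lineno}: duplicate GENE_ID {gene_id!r}")
        seen_ids.add(gene_id)
        try:
            gene = Gene(gene_id, int(family), int(chromosome), int(chromosome_type))
        except ValueError as exc:
            raise ValueError(f"line {lineno}: non-integer field") from exc
        if gene.chromosome_type not in (1, 2):
            raise ValueError(f"line {lineno}: CHROMOSOME_TYPE must be 1 or 2")
        genes.append(gene)
    return genes
```

To validate a file, open it and pass the file object directly, for example `parse_genome(open(path))`.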

Testing

Running the test

# Make the testing script executable and run it
chmod +x run_sat.sh
./run_sat.sh <path_to_g1_file> <path_to_g2_file>

Test Files

Real Data

You can find the test files for real data in 'test_files/real_data'. Inside are three directories: one including all genes, one including genes with fewer than 2 gene families, and one including genes with fewer than 3 gene families. Each of these directories contains one folder per genome pair. For example, to compare gorilla and human using genes with fewer than 3 gene families:

<path_to_g1_file>
test_files/real_data/less_than_three/gorilla_human/gorilla.dcj

<path_to_g2_file>
test_files/real_data/less_than_three/gorilla_human/human.dcj
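
If you prefer to drive the comparison from Python, the same call can be wrapped with `subprocess`. This is a sketch under assumptions: `build_command` and `run_pair` are hypothetical helpers, the script is assumed to be run from the repository root, and because the output format of run_sat.sh is not described here, the raw standard output is returned unparsed.

```python
import subprocess
from pathlib import Path


def build_command(g1, g2, script="./run_sat.sh"):
    """Build the argument list for: ./run_sat.sh <g1> <g2>."""
    return [script, str(g1), str(g2)]


def run_pair(g1, g2, script="./run_sat.sh"):
    """Run the script on two genome files and return its raw stdout."""
    for path in (g1, g2):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    result = subprocess.run(
        build_command(g1, g2, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For the pair above, this would be called as `run_pair("test_files/real_data/less_than_three/gorilla_human/gorilla.dcj", "test_files/real_data/less_than_three/gorilla_human/human.dcj")`.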

Simulated Data

Test files for simulated data can be found in 'test_files/simulations'. Inside are two folders, 'variable_dcj_ops' and 'variable_gene_families'. Each of these folders contains one folder per instance. An instance folder contains three files: the original genome and the two genomes to compare. For example, to run the package on the instance with 500 genes, 340 gene families and 150 DCJ operations:

<path_to_g1_file>
test_files/simulations/variable_gene_families/sim_500_340_150/sim_500_340_150_1.dcj

<path_to_g2_file>
test_files/simulations/variable_gene_families/sim_500_340_150/sim_500_340_150_2.dcj
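
The instance names appear to follow the pattern sim_<genes>_<gene families>_<DCJ operations>, with the suffixes _1 and _2 marking the two genomes to compare. The sketch below builds both paths from those three numbers. The pattern is inferred from the single example above, so treat it as an assumption; `simulated_pair` is a hypothetical helper, not part of dcj-sat.

```python
from pathlib import PurePosixPath


def simulated_pair(n_genes, n_families, n_ops,
                   group="variable_gene_families",
                   root="test_files/simulations"):
    """Return the paths of the two genomes to compare for one instance.

    Assumes the naming pattern sim_<genes>_<families>_<ops>_{1,2}.dcj,
    inferred from the example in this README.
    """
    name = f"sim_{n_genes}_{n_families}_{n_ops}"
    folder = PurePosixPath(root) / group / name
    return str(folder / f"{name}_1.dcj"), str(folder / f"{name}_2.dcj")
```

Calling `simulated_pair(500, 340, 150)` reproduces the two paths listed above; pass `group="variable_dcj_ops"` for the other folder, assuming it uses the same naming.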
