ClusterTAD : An unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data

Bioinformatics, Data Mining, Machine Learning (BDM) Laboratory,

University of Missouri, Columbia MO 65211

Developer:
              Oluwatosin Oluwadare
              Department of Computer Science
              University of Missouri, Columbia
              Email: [email protected]

Contact:
              Jianlin Cheng, PhD
              Department of Computer Science
              University of Missouri, Columbia
              Email: [email protected]

1. Content of folders:

executable: latest ClusterTAD.jar version can be downloaded from the release tab
examples: contains example data and outputs generated from ClusterTAD for these datasets
src: ClusterTAD Java and MATLAB source codes
TADs: contains identified topological domains for two mESC and mouse cortex cell type using ClusterTAD

2. Hi-C Data used in this study:

In our study, we used the normalized Hi-C matrix processed by Bing Ren's Lab in University of Calfornia, San Diego. Download the normalized Matrix here : http://chromosome.sdsc.edu/mouse/hi-c/download.html

3. Input matrix file format:

The input to ClusterTAD is a tab seperated N by N intra-chromosomal contact matrix derived from Hi-C data, where N is the number of equal-sized regions of a chromosome.

4. Usage:

4.1. Java:
To run the tool, open command line interface and type: java -jar ClusterTAD.jar Input_Matrix_file Matrix_Resolution

Parameters are as follow:

Input_Matrix_file : A tab seperated N by N intra-chromosomal Hi-C contact matrix.
Matrix_Resolution : Contact Matrix Resolution.

4.2. MATLAB:
Instructions on how to run the MATLAB source code is given here /src/MATLAB source code/

5. Output

ClusterTAD produces 2 folders in Output folder:

5.1. Clusters:

Contains a .txt file that contains the cluster assignment for the diagonal for all the K values considered

5.2. TADs:

Contains the .txt files listing the TADs extracted from each clustering and reclustering done.
Contains the Best TAD identified based on the Quality score, labeled as "BestTAD_[nameofinputfile]_K=.txt".
Contains a .txt file which contains a list of the extracted TAD Quality scores, file name = [nameofinputfile]_TAD_QualityScore_List.

6. Disclaimer

The executable software and the source code of ClusterTAD is distributed free of charge as it is to any non-commercial users. The authors hold no liabilities to the performance of the program.

7. Citations

Oluwadare, Oluwatosin, and Jianlin Cheng. "ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data." BMC bioinformatics 18.1 (2017): 480.

8. Common questions

8.1. What is the format of the domain output genererated?

The domain extracted in ClusterTAD are presented in the format from.id from.cord to.id to.cord where:

from.id : start bin id for a domain.
from.cord : coordinate of the start bin id for a domain based on data Resolution
to.id : end bin id for a domain.
to.cord : coordinate of the end bin id for a domain based on data Resolution

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
TADs		TADs
examples		examples
src		src
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ClusterTAD : An unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data

1. Content of folders:

2. Hi-C Data used in this study:

3. Input matrix file format:

4. Usage:

5. Output

6. Disclaimer

7. Citations

8. Common questions

About

Releases 1

Packages

Contributors 2

Languages

BDM-Lab/ClusterTAD

Folders and files

Latest commit

History

Repository files navigation

ClusterTAD : An unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data

1. Content of folders:

2. Hi-C Data used in this study:

3. Input matrix file format:

4. Usage:

5. Output

6. Disclaimer

7. Citations

8. Common questions

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Languages

Packages