
Example2: Testing TADs on all chromosomes

liuyc27 edited this page Aug 22, 2024 · 5 revisions

The Hi-C data are hosted on Amazon S3.

  • Test TADs on multiple chromosomes simultaneously.
python diffdomain-py3/diffdomains.py dvsd multiple https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic https://hicfiles.s3.amazonaws.com/hiseq/k562/in-situ/combined.hic data/GSE63525_GM12878_primary+replicate_Arrowhead_domainlist.txt --ofile res/temp/temp.txt --reso 10000

Results are saved to <res/temp/temp.txt>.

  • Multiple-comparison adjustment.
python diffdomain-py3/diffdomains.py adjustment fdr_bh res/temp/temp.txt res/reorganized_TADs_GM12878_K562.txt 

Results are saved to <res/reorganized_TADs_GM12878_K562.txt>.

  • Optional parameter [--filter]: keep only reorganized TADs with a BH-adjusted p-value < 0.05.
python diffdomain-py3/diffdomains.py adjustment fdr_bh res/temp/temp.txt res/reorganized_TADs_GM12878_K562_filter.txt --filter true

Results are saved to <res/reorganized_TADs_GM12878_K562_filter.txt>.
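For readers unfamiliar with the Benjamini-Hochberg (BH) procedure that the `fdr_bh` argument refers to, the adjustment can be sketched in plain Python as follows. This is an illustration of the general procedure, not DiffDomain's own code; the function name `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.04, 0.03, 0.2]
adj = bh_adjust(pvals)
# Mirrors the idea behind --filter: keep entries with adjusted p-value < 0.05.
significant = [q < 0.05 for q in adj]
```

With these made-up inputs only the first p-value stays below 0.05 after adjustment, even though three of the raw p-values are below 0.05.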

  • Classification of TADs into six subtypes.

In this step, you will need the TAD list from condition 2.

(e.g. data/GSE63525_K562_Arrowhead_domainlist.txt)

Run the command:

python diffdomain-py3/classification.py -d res/reorganized_TADs_GM12878_K562.txt -t data/GSE63525_K562_Arrowhead_domainlist.txt --out res/reorganized_TADs_GM12878_K562_subtypes.txt

Results are saved to <res/reorganized_TADs_GM12878_K562_subtypes.txt>.

  • Subdividing the strength-change type into two categories

Run the command:

python diffdomain-py3/subdivide_strength_change.py -f res/reorganized_TADs_GM12878_K562_subtypes.txt -h1 https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic -h2 https://hicfiles.s3.amazonaws.com/hiseq/k562/in-situ/combined.hic -t1 data/GSE63525_GM12878_primary+replicate_Arrowhead_domainlist.txt -t2 data/GSE63525_K562_Arrowhead_domainlist.txt --out res/reorganized_TADs_GM12878_K562_subtypes_up_down.txt --reso 10000

The final results are saved to <res/reorganized_TADs_GM12878_K562_subtypes_up_down.txt>.
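If you prefer to run the steps above from one script, a minimal driver might look like the sketch below. It only assembles the commands already shown on this page (without the optional --filter run); the helper names `build_commands` and `run_pipeline` are hypothetical, and the script assumes it is started from the repository root with `python` on the PATH.

```python
import subprocess

GM12878_HIC = "https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic"
K562_HIC = "https://hicfiles.s3.amazonaws.com/hiseq/k562/in-situ/combined.hic"
GM12878_TADS = "data/GSE63525_GM12878_primary+replicate_Arrowhead_domainlist.txt"
K562_TADS = "data/GSE63525_K562_Arrowhead_domainlist.txt"
PREFIX = "res/reorganized_TADs_GM12878_K562"


def build_commands():
    """Return the four commands from this page as argument lists, in order."""
    return [
        ["python", "diffdomain-py3/diffdomains.py", "dvsd", "multiple",
         GM12878_HIC, K562_HIC, GM12878_TADS,
         "--ofile", "res/temp/temp.txt", "--reso", "10000"],
        ["python", "diffdomain-py3/diffdomains.py", "adjustment", "fdr_bh",
         "res/temp/temp.txt", PREFIX + ".txt"],
        ["python", "diffdomain-py3/classification.py",
         "-d", PREFIX + ".txt", "-t", K562_TADS,
         "--out", PREFIX + "_subtypes.txt"],
        ["python", "diffdomain-py3/subdivide_strength_change.py",
         "-f", PREFIX + "_subtypes.txt",
         "-h1", GM12878_HIC, "-h2", K562_HIC,
         "-t1", GM12878_TADS, "-t2", K562_TADS,
         "--out", PREFIX + "_subtypes_up_down.txt", "--reso", "10000"],
    ]


def run_pipeline(dry_run=True):
    """Print each command (dry run) or execute it, stopping at the first failure."""
    for cmd in build_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Dry run: only prints the commands. Call run_pipeline(dry_run=False) to execute.
run_pipeline()
```

Passing `check=True` makes a failing step raise an error, so later steps do not run on missing input files.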

Note:

1. Output File Structure

The output file <res/reorganized_TADs_GM12878_K562_subtypes_up_down.txt> comprises multiple columns.

  • Chromosome Number (chr), Start Position (start), End Position (end), range, type, origin, subtype, significant.

Each row stores information about a different TAD.

2. Categories of Reorganized TADs

Reorganized TADs are categorized into six types: loss, merge, split, complex, zoom, and strength-change.

  • The type column contains the types loss, merge, split, and complex.

  • The subtype column contains zoom and strength-change.

  • The subdivide_strength_change column contains the two subtypes of strength-change: strength-change up and strength-change down.

3. Mathematical Definitions of Strength-Change TAD Subtypes

A strength-change TAD can be classified into two subtypes.

  • Strength-change up TAD: Indicates an increase in Hi-C contact frequencies under biological condition 2.

  • Strength-change down TAD: Indicates a decrease in Hi-C contact frequencies under biological condition 2.

The classification is based on the following mathematical definitions.

Given a strength-change TAD:

  - m1: Median value of KR-normalized Hi-C contact frequencies within the TAD in condition 1.

  - m2: Median value of KR-normalized Hi-C contact frequencies within the same TAD region in condition 2.

  - s1: Sum of KR-normalized Hi-C contact frequencies across all condition 1 TADs.

  - s2: Sum of KR-normalized Hi-C contact frequencies across all condition 2 TADs.

  If the condition (m1 / m2) * (s1 / s2) < 1 is satisfied, the TAD is classified as a strength-change up TAD. Otherwise, it is classified as a strength-change down TAD.
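The rule above can be written as a small function. This is a sketch of the stated inequality only, not the code in subdivide_strength_change.py; the function name and the example numbers are hypothetical, and it assumes m2 and s2 are non-zero.

```python
def classify_strength_change(m1, m2, s1, s2):
    """Apply the rule (m1 / m2) * (s1 / s2) < 1 to one strength-change TAD.

    m1, m2: median KR-normalized contact frequency within the TAD
            in condition 1 and condition 2.
    s1, s2: sum of KR-normalized contact frequencies across all TADs
            in condition 1 and condition 2.
    Assumes m2 and s2 are non-zero.
    """
    ratio = (m1 / m2) * (s1 / s2)
    if ratio < 1:
        return "strength-change up"
    return "strength-change down"


# Hypothetical values: equal totals, median doubles in condition 2.
up_example = classify_strength_change(m1=2.0, m2=4.0, s1=100.0, s2=100.0)
# Hypothetical values: equal totals, median halves in condition 2.
down_example = classify_strength_change(m1=4.0, m2=2.0, s1=100.0, s2=100.0)
```

The factor s1 / s2 rescales the median ratio by the total contact frequency over TADs in each condition, so a ratio of exactly 1 falls into the "down" category under the rule as written.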

4. Demonstrations

Here are a few simple demonstrations of the output from the subdivide_strength_change.py script.

  • Filtering TAD Reorganization Types by Location

You can use 'chr', 'start', 'end', or 'range' to filter specific TADs or sets of TADs.

You can then read the reorganization types of these TADs directly from the 'type', 'subtype', or 'subdivide_strength_change' columns. The significance of the reorganization is shown in the 'significant' column (0 means not significant, 1 means significant).

For example, by specifying chr=1, start=20680000, end=20830000, origin='condition1', you will locate the specific TAD in the file and find that its reorganization type is 'loss' and that the reorganization is significant.

Loss: "Condition 2 has no TAD that overlaps with or is identical to the reorganized TAD."

[Figure: a loss demo]
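A location query like the one described above can be sketched with Python's standard csv module. The sketch assumes the output file is tab-delimited with a header row and that the chromosome is stored as a plain number; neither is confirmed on this page. The sample text stands in for the real file: only the first row's chr, start, end, origin, type, and significant values come from the example above, the second row is made up, and the remaining columns are omitted for brevity. The function name `find_tad` is hypothetical.

```python
import csv
import io

# Illustrative stand-in for the output file (assumed tab-delimited).
SAMPLE = (
    "chr\tstart\tend\ttype\torigin\tsignificant\n"
    "1\t20680000\t20830000\tloss\tcondition1\t1\n"
    "1\t30000000\t30500000\tmerge\tcondition1\t1\n"
)


def find_tad(handle, chrom, start, end, origin):
    """Return all rows matching the given location and origin."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if row["chr"] == str(chrom)
        and int(row["start"]) == start
        and int(row["end"]) == end
        and row["origin"] == origin
    ]


hits = find_tad(io.StringIO(SAMPLE), 1, 20680000, 20830000, "condition1")
for row in hits:
    print(row["type"], row["significant"])
```

To query the real output, pass an open file handle for res/reorganized_TADs_GM12878_K562_subtypes_up_down.txt in place of `io.StringIO(SAMPLE)`.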

  • Filtering TADs by Reorganization Type

Alternatively, you can pick a reorganization type of interest from the 'type', 'subtype', or 'subdivide_strength_change' columns to see all related TADs.

For instance, setting type='merge' and origin='condition1' returns all TADs in biological condition 1 with a reorganization type of 'merge'.

For each significantly reorganized TAD, except those classified as 'loss', the corresponding entries in biological condition 2 are listed immediately after the entry for condition 1.

Merge: "The reorganized TAD has a many-to-one identical or overlapping relationship with a TAD in condition 2."

[Figure: a merge demo]
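Because condition 2 entries follow their condition 1 entry, a type query can keep each condition 1 TAD together with its partners. The sketch below shows one way to do that on rows already loaded as dictionaries. The rows are made up for illustration, the origin value 'condition2' and the type shown on condition 2 rows are assumptions, and the function name `group_by_condition1` is hypothetical.

```python
def group_by_condition1(rows):
    """Attach each run of non-condition1 rows to the condition1 row before it."""
    groups = []
    for row in rows:
        if row["origin"] == "condition1":
            groups.append({"tad": row, "partners": []})
        elif groups:
            groups[-1]["partners"].append(row)
    return groups


# Made-up rows, for illustration only.
rows = [
    {"chr": "1", "start": "100000", "end": "200000", "type": "merge", "origin": "condition1"},
    {"chr": "1", "start": "100000", "end": "300000", "type": "merge", "origin": "condition2"},
    {"chr": "1", "start": "200000", "end": "300000", "type": "merge", "origin": "condition1"},
    {"chr": "1", "start": "100000", "end": "300000", "type": "merge", "origin": "condition2"},
    {"chr": "1", "start": "500000", "end": "600000", "type": "loss", "origin": "condition1"},
]

groups = group_by_condition1(rows)
merge_groups = [g for g in groups if g["tad"]["type"] == "merge"]
for g in merge_groups:
    tad = g["tad"]
    print(tad["start"], tad["end"], "->", [(p["start"], p["end"]) for p in g["partners"]])
```

In this made-up data, both 'merge' TADs from condition 1 point to the same condition 2 TAD, and the 'loss' TAD has no partner rows.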

  • Other Examples

The following illustrations show how the other reorganization types appear in the output file. Red rectangles highlight the TADs identified in biological condition 1 and their corresponding status in biological condition 2 for a specific reorganization type.

Split: "The reorganized TAD has either a one-to-many identical relationship or a one-to-many overlapping relationship with TADs in condition 2."

[Figure: a split demo]

Zoom: "The reorganized TAD has a one-to-one overlapping relationship with a TAD in condition 2."

[Figure: a zoom demo]

Strength-change: "The reorganized TAD in condition 1 has a one-to-one identical relationship with a TAD in condition 2."

[Figure: a strength-change demo]

Complex: "All remaining reorganized TADs that do not fit into the previously defined sub-types are classified as complex TADs."

[Figure: a complex demo]