
1.1 Usage

liuyc27 edited this page Aug 19, 2024 · 3 revisions

Welcome to the DiffDomain wiki! This page summarizes usage.

Usage

Main method

Usage:

  •  python diffdomains.py dvsd one <chr> <start> <end> <hic0> <hic1> [options]  
    
  •  python diffdomains.py dvsd multiple <hic0> <hic1> <tadlist_of_hic0.bed> [options]  
    
  •  python diffdomains.py visualization <chr> <start> <end> <hic0> <hic1> [options]  
    
  •  python diffdomains.py adjustment <method> <output_of_dvsd_multiple> <output_file_name> [options] 
    

Options:

  • --reso : resolution of the hicfile [default: 100000]
  • --min_nbin : minimum number of effective bins [default: 10]
  • --f : threshold in [0, 1) for filtering out matrix columns with missing (null) values [default: 0.5]

For example, with ‘--f 0.6’, DiffDomain counts the columns of a TAD's contact matrix whose proportion of missing values is below 40%. If that count is smaller than min_nbin, DiffDomain skips the TAD and reports its result (the statistic in the 5th column and the P value in the 6th column) as NAN.
In other words, DiffDomain compares only TAD contact matrices that have at least min_nbin columns whose proportion of missing values is less than (1-f)*100%.
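The filtering rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described on this page, not DiffDomain's actual code: the function names `count_usable_columns` and `is_comparable` are hypothetical, and missing values are assumed to be represented as `None` or NaN.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (an assumption made for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_usable_columns(matrix, f):
    """Count columns whose proportion of missing values is below 1 - f."""
    n_rows = len(matrix)
    max_missing = 1.0 - f
    usable = 0
    for column in zip(*matrix):  # iterate over the columns of a list-of-rows matrix
        missing_fraction = sum(is_missing(v) for v in column) / n_rows
        if missing_fraction < max_missing:
            usable += 1
    return usable


def is_comparable(matrix, f=0.5, min_nbin=10):
    """A TAD is compared only if at least min_nbin columns pass the filter."""
    return count_usable_columns(matrix, f) >= min_nbin


nan = float("nan")
# Toy 4x4 matrix: the columns have 0%, 25%, 50% and 100% missing values.
toy = [
    [1.0, 2.0, 3.0, nan],
    [1.0, 2.0, 3.0, nan],
    [1.0, 2.0, nan, nan],
    [1.0, nan, nan, nan],
]
print(count_usable_columns(toy, f=0.6))       # columns with < 40% missing
print(is_comparable(toy, f=0.6, min_nbin=3))  # too few usable columns
```

With ‘--f 0.6’ only the first two columns of the toy matrix pass, so a TAD like this would be skipped for any min_nbin above 2.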

  • --ofile : the filepath for output file [default: stdout]
  • --oprefix : prefix for output files
  • --oprefixFig : prefix for output figures
  • --sep : delimiter for hicfile [default: \t]
  • --hicnorm : hic matrix normalization method [default: KR]
  • --chrn : chromosome number [default: ALL]
  • --ncore : the number of parallel processes [default: 10]
  • --filter : whether to filter out TADs that are not reorganized after adjustment [default: False]

Note:

  1. For most bulk Hi-C data, such as the Hi-C data in Adiden [Reference], the results are not sensitive to the exact value of --f.
  2. For single-cell Hi-C data, we recommend that users try multiple values of --f and choose one that yields an acceptable number of compared TADs. Because of the high sparsity of single-cell Hi-C data and the variation among imputation methods (such as scHiCluster, Higashi, scVI-3D), we did not set a default value of --f.
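One way to follow the recommendation in note 2 is to run `dvsd multiple` once per candidate value of --f and then count how many TADs were actually compared. The sketch below only assembles the command lines and counts rows; it uses the flags listed above, but the input file names, the output naming scheme, and the helper names are hypothetical. It also assumes the output is tab-separated with the P value in the 6th column, as described earlier on this page.

```python
import math


def build_dvsd_commands(hic0, hic1, tadlist, f_values, reso=100000):
    """Build one 'dvsd multiple' command per candidate --f value."""
    commands = []
    for f in f_values:
        ofile = f"dvsd_f{f}.txt"  # hypothetical output naming scheme
        commands.append([
            "python", "diffdomains.py", "dvsd", "multiple",
            hic0, hic1, tadlist,
            "--reso", str(reso),
            "--f", str(f),
            "--ofile", ofile,
        ])
    return commands


def count_compared_tads(lines, sep="\t"):
    """Count rows whose 6th column (P value) is a number other than NaN."""
    compared = 0
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) < 6:
            continue  # not a result row
        try:
            p_value = float(fields[5])
        except ValueError:
            continue  # e.g. a header line
        if not math.isnan(p_value):
            compared += 1
    return compared


# Placeholder file names, for illustration only.
cmds = build_dvsd_commands("cond0.hic", "cond1.hic", "tads_cond0.bed", [0.3, 0.5, 0.7])
for cmd in cmds:
    print(" ".join(cmd))
```

Each command could then be passed to `subprocess.run`, and `count_compared_tads` applied to the lines of each output file to pick a value of --f.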

Classification

Usage:

  •  python diffdomain-py3/classification.py -d <result_of_diffdomains.py_adjustment> -t <tadlist_of_hic2> [options]  
    

Options:

  • --limit : the distance in bases within which two boundaries are treated as a common boundary [default: 30000]
  • --out : the filename of the output [default: name_of_-d_types.txt]
  • --kpercent : common boundaries must lie within max(l*bin, k% of the TAD's length) [default: 10]
  • --remote : the limitation of the biggest region [default: 1000000]
  • --s1 : int, to skip the first s1 rows in -d [default: 0]
  • --s2 : int, to skip the first s2 rows in -t [default: 0]
  • --sep1 : the separator of -d [default: \t]
  • --sep2 : the separator of -t [default: \t]

Note:
You can set --limit to adjust what counts as a 'common boundary'.
As described in the paper, we use 3 bins as the threshold for common boundaries.
For example, at 10 kb resolution set --limit to 30000, and at 25 kb resolution set it to 75000.
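The arithmetic behind these values is simple enough to sketch. The helper names below are hypothetical, and the second function reflects one reading of the --kpercent description (the larger of the bin-based limit and k% of the TAD's length); it is not taken from classification.py.

```python
def limit_for_resolution(resolution, n_bins=3):
    """--limit value for a given resolution, using the 3-bin rule."""
    return n_bins * resolution


def common_boundary_window(resolution, tad_length, n_bins=3, kpercent=10):
    """Assumed combination: max(n_bins * bin size, kpercent% of the TAD length)."""
    return max(n_bins * resolution, tad_length * kpercent / 100)


print(limit_for_resolution(10000))               # 10 kb resolution
print(limit_for_resolution(25000))               # 25 kb resolution
print(common_boundary_window(10000, 1_000_000))  # a 1 Mb TAD at 10 kb resolution
```

Under this reading, for large TADs the percentage term dominates, while for small TADs the 3-bin limit applies.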

Questions

If you encounter the following error, don't worry.

  • AttributeError: 'function' object has no attribute 'straw' : Open the __init__.py of straw (its path is reported in the error message, for example "/home/gum/.conda/envs/diffDomain/lib/python2.7/site-packages/straw/__init__.py") and delete the line “straw = straw_module.straw”.

Now, let's get started with some examples of real data in the next chapters!

Subdivide_Strength_Change

Usage:

  •  python diffdomain-py3/subdivide_strength_change.py -f <result_of_classification.py> -h1 <hic_file_of_Condition1> -h2 <hic_file_of_Condition2> -t1 <tadlist_of_hic1> -t2 <tadlist_of_hic2> [options]
    

Options:

  • --out : the filename of the output [default: subdivide_strength_change.txt]
  • --reso : resolution of the hicfile [default: 100000]
  • --sep : the separator for -f, -t1, and -t2 [default: \t]