Vince edited this page May 3, 2018 · 4 revisions

The distribution program provides a very coarse count of each value in the bigWig. Values are converted to integers, and the decimal portion is cut off in the process. This has obvious caveats, and there are probably better programs for producing this kind of summary. But if the data are generally above zero and have a wide range, it's possible to make good use of distribution. The -mult option helps with this limitation a bit by multiplying the values in the bigWig by a constant before converting them to integers. For example, with -mult=10, 1.6 becomes 16 and 1.8 becomes 18, so the two values are counted separately. Without the multiplication, both would be converted to 1 and counted together. Usage:

bwtool distribution - produce plot data as the frequency of values
   seen in the bigWig (converted to integers)
usage:
   bwtool distribution input.bw[:chr:start-end] output.txt
options:
     -mult=m      multiply data by a number so the range is altered
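
The truncation and the effect of -mult described above can be illustrated with a small Python sketch. This is not bwtool's implementation: the function name `coarse_distribution` is hypothetical, and truncation toward zero for negative values is an assumption, since the page only says the decimal portion is cut off.

```python
from collections import Counter
import math


def coarse_distribution(values, mult=1.0):
    """Count values after multiplying by `mult` and truncating to integers.

    Hypothetical helper for illustration only; not part of bwtool.
    """
    counts = Counter()
    for v in values:
        if v is None or math.isnan(v):
            continue  # missing (NA) positions are skipped
        # int() truncates toward zero (an assumption for negative values)
        counts[int(v * mult)] += 1
    if not counts:
        return []
    lo, hi = min(counts), max(counts)
    # List every integer bin between the smallest and largest one seen,
    # including bins with a count of zero.
    return [(b, counts.get(b, 0)) for b in range(lo, hi + 1)]


print(coarse_distribution([1.6, 1.8]))           # [(1, 2)]
print(coarse_distribution([1.6, 1.8], mult=10))  # [(16, 1), (17, 0), (18, 1)]
```

Without a multiplier both values fall into bin 1. With a multiplier of 10 they land in bins 16 and 18.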

Examples

These examples use the example data from the aggregate page.

If I want to get the basic count of each value:

$ bwtool dist main.bigWig /dev/stdout
0	1
1	2
2	7
3	5
4	6
5	5
6	5
7	0
8	0
9	0
10	1

Note that NAs are not counted. I can also count only the data within the regions of a particular bed file (again, see the aggregate page for specifics):

$ bwtool dist main.bigWig /dev/stdout -regions=agg1.bed 
0	1
1	1
2	2
3	3
4	6
5	2
6	5
7	0
8	0
9	0
10	1

Now suppose it's the same data, but divided by 10. With the default behavior of the program, the distribution is not very informative:

$ bwtool dist div10.bigWig /dev/stdout
0	31
1	1

Using the -mult option helps in this case, and we can essentially reconstruct the original distribution:

$ bwtool dist div10.bigWig /dev/stdout -mult=10
0	1
1	2
2	7
3	5
4	6
5	5
6	5
7	0
8	0
9	0
10	1
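
The bins in the -mult=10 output are in multiplied units. To read them on the scale of the values stored in div10.bigWig, divide each bin by the multiplier. A minimal Python sketch of that post-processing step follows; the helper `rescale_bins` is hypothetical and assumes the two-column, whitespace-separated output shown above.

```python
def rescale_bins(output_text, mult):
    """Parse 'bin<TAB>count' lines and divide each bin by `mult`.

    Hypothetical helper for illustration only; not part of bwtool.
    """
    rows = []
    for line in output_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        value, count = line.split()
        rows.append((int(value) / mult, int(count)))
    return rows


# The output of the -mult=10 run shown above, as a string.
example = "0\t1\n1\t2\n2\t7\n3\t5\n4\t6\n5\t5\n6\t5\n7\t0\n8\t0\n9\t0\n10\t1\n"

for value, count in rescale_bins(example, mult=10):
    print(f"{value:.1f}\t{count}")
```

Because of the truncation, each rescaled bin marks the lower edge of an interval of width 1/m for non-negative data, so the row for 0.2 counts values from 0.2 up to, but not including, 0.3.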