Change location of tmp output #150
Comments
To add here: the matrix written to /tmp had a temporary size of ~900 MB; after merging and copying to the real location it is ~130 MB. I still need to do some checks, but my first impression is that 0.8.2 is also slower than 0.7.11.
I have done some tests. Maybe I did something wrong when I updated the API call from 0.7.11 to 0.8.2. My source to write a matrix out in 0.7.11 is:

```python
split_factor = 1
if len(self.matrix.data) > 1e7:
    split_factor = 1e4
    matrix_data_frame = np.array_split(matrix_data_frame, split_factor)
cooler.io.create(cool_uri=pFileName,
                 bins=bins_data_frame,
                 pixels=matrix_data_frame,
                 append=self.appendData,
                 dtype=dtype_pixel)
```

And in 0.8.2 it is:

```python
if len(self.matrix.data) > 1e7:
    split_factor = 1e4
    matrix_data_frame = np.array_split(matrix_data_frame, split_factor)
if self.appendData:
    self.appendData = 'a'
else:
    self.appendData = 'w'
cooler.create_cooler(cool_uri=pFileName,
                     bins=bins_data_frame,
                     pixels=matrix_data_frame,
                     mode=self.appendData,
                     dtypes=dtype_pixel)
```

I tested how long it takes to open a cool file, apply correction factors, and write it back, both with 10000 chunks on Rao 2014 data. Runtime with 0.7.11 is 6 minutes and 9 seconds in total.
The run with 0.8.2 takes longer. The two files have different sizes in the end, 354 MB vs. 345 MB, but a check of the content shows they are equal.

Cooler 0.7.11 writes everything directly to the given location, whereas, as mentioned above, 0.8.2 writes to tmp first. There it creates two files: one multi-cooler file with a size of 3.3 GB and a second one (I guess for merging) with 375 MB. That is an overhead of roughly a factor of 10. I don't know what causes this massive overhead, but from what I see here I hope you can bring the performance back to the level of the 0.7.11 version. If I can support or help you somehow to achieve this, please contact me.

Best, Joachim
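For reference, a minimal sketch of the kind of round-trip benchmark described in the comment above (open a cooler, apply its correction weights to the counts, write the result back out). The function name, the chunk size, and the assumption that the file already carries a `weight` column are illustrative, not the poster's actual HiCExplorer code:

```python
import cooler
import numpy as np

def roundtrip_with_correction(in_uri, out_uri, chunksize=10_000_000):
    """Open a cooler, apply its balancing weights to the counts,
    and write the corrected matrix to a new cooler file."""
    clr = cooler.Cooler(in_uri)
    bins = clr.bins()[:]              # bin table; assumed to include a 'weight' column
    weights = bins['weight'].values
    nnz = clr.info['nnz']             # total number of pixels

    def corrected_chunks():
        # Stream the pixel table in large, already-sorted chunks.
        for lo in range(0, nnz, chunksize):
            chunk = clr.pixels()[lo:lo + chunksize]
            w = weights[chunk['bin1_id'].values] * weights[chunk['bin2_id'].values]
            chunk['count'] = chunk['count'] * w
            yield chunk

    cooler.create_cooler(out_uri,
                         bins=bins,
                         pixels=corrected_chunks(),
                         dtypes={'count': np.float64})
```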
Hi Joachim,
The 0.7.11 behavior wasn't removed! If the input chunks are provided in the right order (I believe this is the case for you), pass the flag that marks them as ordered and the output is written directly in a single step. Otherwise, it is assumed that the chunks can be in any order, so they are written as a series of partial coolers and then merged (two steps).

The merge step is going to be very slow if your chunks are small, because there will be too many of them (and if there are more than 200 chunks it will do a 2-pass recursive merge!). If you can buffer the chunks into much larger ones, there shouldn't be so much overhead. A simple wrapper generator like the one sketched below can work. Though this seems to be tripping up users, so we may have to factor such buffering into the function.

Thanks for the benchmarking. Let me know the timing with that flag.
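The generator the comment refers to is not reproduced in this thread; the following is a minimal sketch of what such a re-batching wrapper could look like. The name `buffered_chunks` and the `min_rows` threshold are illustrative and not part of cooler's API:

```python
import pandas as pd

def buffered_chunks(chunks, min_rows=10_000_000):
    """Concatenate an iterable of small pixel DataFrames into much larger
    ones, so that create_cooler produces far fewer partial coolers to merge."""
    buf, n = [], 0
    for chunk in chunks:
        buf.append(chunk)
        n += len(chunk)
        if n >= min_rows:
            yield pd.concat(buf, ignore_index=True)
            buf, n = [], 0
    if buf:
        yield pd.concat(buf, ignore_index=True)
```

In the 0.8.2 snippet quoted earlier, this would be passed as `pixels=buffered_chunks(matrix_data_frame)`, since `np.array_split` already yields a list of DataFrames.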
Re: the original issue: good point. It would probably be a more sensible default. Thoughts/any objections, @mimakaev, @golobor? Note that, in case you need it, there is already a keyword argument for overriding where the temporary files are written.
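For completeness, a sketch of how the two options discussed in this reply might be passed. The keyword names `ordered` and `temp_dir` are taken from the cooler 0.8 documentation and are assumptions with respect to this thread, and the wrapper function `write_cool` is hypothetical:

```python
import os
import cooler

def write_cool(pFileName, bins_data_frame, matrix_chunks, dtype_pixel, append=False):
    """Write chunks that already arrive in sorted order in a single pass,
    keeping any temporary files next to the output file."""
    cooler.create_cooler(
        cool_uri=pFileName,
        bins=bins_data_frame,
        pixels=matrix_chunks,
        dtypes=dtype_pixel,
        mode='a' if append else 'w',
        ordered=True,                                # skip the partial-cooler + merge path
        temp_dir=os.path.dirname(pFileName) or '.',  # only used if a merge is still needed
    )
```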
Yes, the size difference can be attributed to an addition in the newer file format.
Hi, thanks for your response; after setting the option you suggested … Best, Joachim
Temp files are now created in the output location by default.
Hi,
can you change the intermediate location where a cool file is written? In the current version it goes to /tmp, /var/tmp, or /usr/tmp, or to whatever is set in the environment variables TMPDIR, TMP, or TEMP; see https://docs.python.org/2/library/tempfile.html#tempfile.tempdir. I think many non-developer users don't know about this and usually have a small root partition. Quite likely they wonder why they run out of disk space even though the location they defined as the output location is in /home, where there is enough space.
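To illustrate the behavior described above: Python's tempfile module decides where these intermediates land, and it can be redirected to a roomier location today via the environment. The paths here are placeholders, and the assumption is that the writer creates its temporaries through the tempfile module, as the linked docs suggest:

```python
import os
import tempfile

# Where intermediate files end up by default:
# typically /tmp, unless TMPDIR/TEMP/TMP points somewhere else.
print(tempfile.gettempdir())

# Workaround available today: export TMPDIR before running the tool
# (TMPDIR=/data/tmp <your command>), or override it for the current process:
os.environ['TMPDIR'] = '/data/tmp'   # placeholder path with enough free space
tempfile.tempdir = None              # force gettempdir() to re-read the environment
print(tempfile.gettempdir())
```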
It would be good if the temp file you create were created in the location given by `cool_uri`.

Best,
Joachim