dcm2niix on a cluster #104

Closed
pspec opened this issue Jun 8, 2017 · 5 comments

@pspec commented Jun 8, 2017

Hello, any advice for running dcm2niix on an HPC cluster? I'm finding it to be drastically slower on the cluster than running locally. I'm trying to convert multi-band fMRI datasets acquired on a Philips scanner. Each fMRI run is approximately 20,000 DICOMs.

I typically request 1 node with 16 GB of physical memory when submitting dcm2niix jobs to the cluster, and it takes about 20-30 minutes for a set of 20,000 DICOMs. Running it locally takes about 2 minutes for the same set. I have tried both internal compression and pigz, but it still takes very long regardless.

Thanks for the help!

@ningfei (Collaborator) commented Jun 8, 2017

Hi, maybe I'm wrong, but I guess the performance is limited by I/O. A cluster is generally built for tasks with high computational load; it may not handle I/O-heavy tasks as well. You may observe a similar result if you try to decompress an archive containing a lot of small files on the cluster: it will probably be slower than on your local computer, where the I/O resources are exclusively yours.
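
For instance, a rough way to check this yourself (the archive name and target directories below are just placeholders for your own data) would be something like:

```sh
# Time unpacking an archive of many small files on the shared cluster filesystem
# versus a node-local disk. If the first run is dramatically slower, dcm2niix will
# hit the same wall, since it opens each of the ~20,000 DICOM files individually.
mkdir -p /cluster/scratch/iotest /tmp/iotest
time tar -xf many_small_files.tar -C /cluster/scratch/iotest   # shared/network filesystem
time tar -xf many_small_files.tar -C /tmp/iotest               # local disk
```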

@neurolabusc (Collaborator) commented Jun 8, 2017

Short Answer:
dcm2niix is likely limited by disk I/O speed. This reflects an issue with your cluster, not with the software.

Long Answer:
dcm2niix is generally faster than the alternatives, and in most situations it is limited by disk I/O. Here are a few general tips you can try.
1.) Compressing files with gz can be slow. Saving uncompressed images ("-z n") is much faster. If you do want to compress images, the parallel pigz ("-z y") is generally your fastest solution. However, you are correct that in cases with slow network access the serial internal compressor ("-z i") can be better. With either method, you can also request rapid though less efficient compression with the compression-level parameter (e.g. "-1", "-2"); the default "-6" is good but slower. (See the first sketch after this list.)
2.) This does not apply to this specific question (as the user reports a big difference between cluster and local performance), but I am including it for completeness. My software is very slow to decode images stored in the arcane, obsolete, inefficient, DICOM-specific JPEG-lossless transfer syntaxes (1.2.840.10008.1.2.4.57, 1.2.840.10008.1.2.4.70). For all these reasons, as well as the fact that some vendors have incorrectly implemented this format, I strongly suggest you do not use it to store your data. If you have a lot of data in this format, you may want to invest in developing a faster decoder: the current decoder (jpg_0XC3.cpp) is designed for robustness, not speed.
3.) Are you using dcm2niix directly, or are you calling it from another tool such as heudiconv? If the latter, see if you can accelerate things by calling dcm2niix directly. While heudiconv is a terrific tool, at the moment it relies on the very useful but slow dcmstack.
4.) Make sure you process your data in the smallest units possible. For example, if you have the folders ~/dcm/subj1, ~/dcm/subj2, ~/dcm/subj3, you will find that processing each folder sequentially is faster than processing the root folder ~/dcm (though I believe the penalty for dcm2niix is far smaller than for other tools like dcm2nii). You could also try processing these three folders in parallel through asynchronous calls to dcm2niix, though the benefit will depend on the network bandwidth and disk-contention characteristics of your hardware. (See the second sketch after this list.)
5.) One thing you could explore is creating a RAM disk: transfer your images to the RAM disk and have dcm2niix use it as both the input and output folder. You should find that in this case dcm2niix on the cluster is faster than when run locally on an SSD. The catch is that you now have to spend time transferring the DICOM files from the network to the RAM disk and subsequently copying the results back to the network. In my experience, dcm2niix is so efficient that these extra steps make the RAM disk impractical. However, a major benefit of this approach is that you can run "-z y" without the usual network penalty. (Also shown in the second sketch after this list.)
6.) Feel free to share any tricks you discover. I do think these will be specific to your network and hardware, but your solutions may help others.
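
To make point 1 concrete, here is a rough sketch of the relevant invocations (the input and output paths are placeholders for your own data):

```sh
# Fastest: skip compression entirely
dcm2niix -z n -o /path/to/out /path/to/dcm/subj1

# Parallel compression via pigz (usually the fastest way to get .nii.gz)
dcm2niix -z y -o /path/to/out /path/to/dcm/subj1

# Serial internal compressor with a fast, less efficient compression level
dcm2niix -z i -1 -o /path/to/out /path/to/dcm/subj1
```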
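
And a sketch of points 4 and 5. The folder layout, the use of plain backgrounded shell jobs, and the /dev/shm RAM-disk location are all assumptions about your setup; adjust them to whatever your scheduler and filesystem actually provide:

```sh
# Point 4: convert each subject folder separately, here asynchronously
for s in subj1 subj2 subj3; do
  mkdir -p ~/nii/"$s"
  dcm2niix -z n -o ~/nii/"$s" ~/dcm/"$s" &   # '&' launches the conversions in parallel
done
wait                                          # block until all background jobs finish

# Point 5: stage one subject on a RAM disk, convert there, copy the results back
tmp=$(mktemp -d /dev/shm/dcm2niix.XXXXXX)
cp -r ~/dcm/subj1 "$tmp"/
mkdir -p "$tmp"/out ~/nii/subj1
dcm2niix -z y -o "$tmp"/out "$tmp"/subj1      # "-z y" without the usual network penalty
cp "$tmp"/out/* ~/nii/subj1/
rm -rf "$tmp"
```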

@neurolabusc (Collaborator) commented:

One other thought: it might be worth talking to your acquisition team about adjusting the DICOM output, and to the vendor about streamlining their implementation of DICOM. From the user's perspective it is really convenient that Philips allows fMRI data to be saved either as a single 4D file or as thousands of separate images. You may want to see whether choosing between these two options makes a difference to your conversion times. It is possible that dcm2niix can be improved to read the 4D files more efficiently. However, it is certainly the case that, from the developer's perspective, the current Philips 4D implementation is extremely laborious to decode, and streamlining this at the source could make everyone's life easier.

@pspec (Author) commented Jun 9, 2017

Thank you so much for all the help! I will discuss these ideas with our cluster and MR acquisition team, and post any solutions we come up with. Much appreciated!

@neurolabusc (Collaborator) commented:

@pspec: I suggest you download and build the latest version (13-June-2017). One challenge with DICOM files is that we do not know the size of the header until we have parsed it. In the past, I simply loaded the entire file into RAM. The new version will load your DICOM file in 1 MB segments. This is faster for huge 4D Philips datasets (e.g. a 270 MB DICOM file has a ~8 MB header), in particular on systems with slow disk access (e.g. clusters). Note that we still need to eventually load the whole image and write it to disk. In practice, I think this new version ends up being around 15% faster. The value "MaxBufferSz" in nii_dicom.cpp controls the size of this cache, with a 1 MB default. To re-instate the old behavior, compile with the "-DmyLoadWholeFileToReadHeader" directive.
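
For reference, a sketch of where that directive would go when building from the command line; the source file list and extra defines here follow the README and may differ for the version you download, so treat it as illustrative rather than definitive:

```sh
g++ -O3 -I. main_console.cpp nii_dicom.cpp jpg_0XC3.cpp ujpeg.cpp \
    nifti1_io_core.cpp nii_ortho.cpp nii_dicom_batch.cpp \
    -o dcm2niix -DmyDisableOpenJPEG \
    -DmyLoadWholeFileToReadHeader   # revert to loading the whole file to read the header
```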
