superstr taking ~6 hours to process an 80GB BAM file #20
That does seem to be taking a lot longer than I'd expect, although I've encountered some delays on some HPC configurations. It's a bit hard to offer immediate recommendations without knowing a little more about your configuration. One thing that can be done very simply is to increase the -t threshold; this will reduce the number of reads processed during repeat checking. Are you able to share a bit more about your HPC? Is the data on HDD or SSD? Is there possibly a tape operation slowing things down a bit up front?
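For reference, raising the threshold is just a command-line change; a hypothetical invocation might look like the sketch below. Only `-t` is mentioned above; the `--mode` and `-o` flags and the 0.7 value are illustrative assumptions, so check `superstr --help` on your build.

```sh
# Sketch only: raise -t so fewer reads proceed to repeat checking.
# --mode, -o and the 0.7 value are assumptions; verify against your build.
superstr --mode=bam -t 0.7 -o results/ sample.bam
```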
No problem. The thing with superSTR is that it needs to read through the BAM file completely, so the first port of call is to check the performance of the read operation. I've seen some spiky performance on network-attached storage under heavy load, so that's always a possibility.
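A quick, generic way to check that (a sketch, nothing superSTR-specific) is to time a raw sequential read of one BAM:

```sh
# Time a full sequential read of one BAM to measure raw storage throughput.
# Use a file that hasn't been read recently so the page cache doesn't skew it.
dd if=/path/to/sample.bam of=/dev/null bs=1M status=progress
```

At ~200 MB/s, an 80GB file takes roughly 7 minutes to scan, so if this step alone takes hours, the storage rather than superSTR is the bottleneck.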
Hello,
Thank you for making this software available!
I downloaded your software a couple of months ago and have been trying it out. I have ~8000 WGS BAM files that I would like to process, but it is currently taking 6-8 hours per file with the following code:
Each genome is ~80GB.
I saw that you have some recommendations for parallelisation; however, the `xargs` options are not available on the cluster that I use. Do you have any recommendations for how to parallelise or speed up the process? Is there a later version of this software that might be faster? I am working on a SLURM HPC.
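In case it helps, this is the kind of thing I had in mind as a replacement for `xargs` (a sketch only; the `superstr` flags and paths are placeholder assumptions):

```sh
#!/bin/bash
#SBATCH --job-name=superstr
#SBATCH --array=0-7999%200   # one task per BAM, at most 200 running at once
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=12:00:00

# bams.txt lists one BAM path per line (8000 lines in total).
BAM=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" bams.txt)

# Placeholder invocation: substitute the real superstr flags for your build.
superstr --mode=bam -o "results/$(basename "$BAM" .bam)" "$BAM"
```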
Thanks again!