Duration of a permutation longer than previous one #42
Comments
Hi Tim,

In the 'bad' permutation, the runtime is so much longer because it takes longer to fit region-specific models in regions with many CpGs. The sizes of the regions depend on the data (e.g. if long stretches exist at a particular smoothing bandwidth, the regions are longer). However, I've not seen this particular situation before (long stretches not seen in the data, but seen in only some permutations).

In addition to time point and location, are there any other covariates that might explain variation in methylation? What could be happening in the 'bad' permutation is that some other latent factor is associated with methylation.

What is the distribution of candidate region sizes (number of CpGs) like? For example, if you set the number of permutations very low, to avoid hitting a 'bad' permutation and just get some output of candidate regions, you can check the size distribution.

Best,
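For checking the size distribution, here is a minimal sketch in Python. It assumes the per-region CpG counts have already been exported from the candidate regions to a plain list of integers. The function name `summarize_region_sizes`, the `large_threshold` cutoff, and the example counts are all made up for illustration and are not part of dmrseq.

```python
from statistics import median


def summarize_region_sizes(cpg_counts, large_threshold=500):
    """Summarize the number of CpGs per candidate region.

    cpg_counts: list of integers, one per candidate region (assumed to be
    exported from the analysis beforehand).
    large_threshold: hypothetical cutoff for flagging unusually long regions.
    """
    if not cpg_counts:
        raise ValueError("cpg_counts is empty")
    ordered = sorted(cpg_counts)
    n = len(ordered)
    return {
        "n_regions": n,
        "min": ordered[0],
        "median": median(ordered),
        # Simple index-based upper percentile; no interpolation.
        "p95": ordered[min(n - 1, int(0.95 * n))],
        "max": ordered[-1],
        "n_large": sum(1 for c in ordered if c >= large_threshold),
    }


# Made-up counts, for illustration only.
example_counts = [5, 7, 8, 12, 15, 20, 33, 41, 60, 1200]
summary = summarize_region_sizes(example_counts)
print(summary)
```

A maximum far above the median, as in the made-up data here, would point to a few very long regions dominating the model-fitting time.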
Hi Tim,

Thanks for following up. Yes, that's correct: the labels are swapped at random, and there's no restriction on the permutations for this specification. I must admit I'm puzzled. I agree that your metrics suggest the smoothing bandwidth is sensible, so I don't have a reasonable explanation for why this 'bad' permutation exists.

If you're willing to share a small subset of your data (for example, just one chromosome, or even smaller if it generates the same result), I'd be happy to dig into it further. Let me know, and I'm happy to provide a Dropbox link for upload.

Best,
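To make "swapped at random, with no restriction" concrete, here is a small Python sketch. It is not dmrseq's code, only an illustration of the idea. The sample layout (four locations A to D, each sampled at two time points) and both function names are assumptions. The blocked variant is shown purely for contrast and is not something the thread says dmrseq offers.

```python
import random


def permute_unrestricted(labels, rng):
    """Shuffle labels across all samples, ignoring any other covariate."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def permute_within_blocks(labels, blocks, rng):
    """Shuffle labels only among samples that share the same block value."""
    result = list(labels)
    for block in sorted(set(blocks)):
        idx = [i for i, b in enumerate(blocks) if b == block]
        values = [labels[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            result[i] = v
    return result


# Hypothetical layout of the 8 samples: 4 locations x 2 time points.
location = ["A", "B", "C", "D", "A", "B", "C", "D"]
time_point = ["t1"] * 4 + ["t2"] * 4

rng = random.Random(1)  # fixed seed so the sketch is reproducible
free = permute_unrestricted(location, rng)
blocked = permute_within_blocks(location, time_point, rng)
print(free)
print(blocked)
```

In the unrestricted case a location label can end up repeated within one time point, so a permuted labeling may happen to line up with some other structure in the samples. The blocked version keeps each time point's set of labels intact.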
Dear Keegan
I am analyzing a WGBS dataset with dmrseq. The data is not human or mouse, so I used all the tips you gave here to figure out a sensible analysis (trying different parameter settings on one chromosome, plotting the DMRs, ...). I found a sensible setting and everything was working: dmrseq didn't produce any errors and was running quite fast. However, at a certain permutation, dmrseq produced warnings and took significantly longer to run. I have no idea what is causing this or how I can solve it. Could you offer any advice?
Here are some outputs so you can see the difference in running time.
This is the output from discovering the regions. The data consists of 8 samples, collected at four different locations at two different time points. At the moment, I am testing for a difference in location while adjusting for time point. I encoded both location and time as factors.
This is an output of a good permutation:
This is an output of a bad permutation:
Thanks in advance.
With kind regards
Tim