
ram consumption #161

Open

KlemensFroehlich opened this issue Nov 13, 2024 · 3 comments

Comments

@KlemensFroehlich
Hi Michael
A collaborator has to analyze 3000+ files (DDA Orbitrap, so around 700 MB to 1.5 GB per file). FragPipe ran out of memory with 1 TB of RAM on a VM during quant.

I was wondering whether Sage might be an option here... I am confident speed would not be an issue :)

As I understand it, Sage does not write intermediate results, but rather stores everything in memory. Would this cause problems with so many files, or would you recommend a specific procedure for dealing with this?

I quickly went over the intro and the GitHub issues but did not see anything related to RAM requirements for large-scale analyses.

Any help would be appreciated!

Best, Klemens

@lazear
Owner

lazear commented Nov 14, 2024

1TB might work if you're not doing anything crazy with semi-enzymatic/HLA/etc... I think the largest I have done in a single search was ~2k files.

Are they wanting to run a label-free quantitative analysis where MBR is needed, or is it possible that the files could be processed in batches? If you don't need MBR, you could search each file individually or in batches (e.g. 96 at a time) and then use Mokapot or percolator to control FDR across the experiment. Sage will handle predicting/aligning RTs (used for rescoring PSMs) & global FDR for you, but there aren't any other substantial differences between searching all files at once and individually - and I imagine that the difference between searching 512 files and all 3000 at once will be very minor and within acceptable error.

Even with MBR, I would suggest assessing whether moderately sized batches are OK (I would imagine very few peptides are lost going from 3000 -> batches of 512, for example)
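For concreteness, here is a rough sketch of that batch-then-rescore idea: run Sage on batches of ~512 files, write Percolator-style .pin output per batch, and then rescore all batches together with mokapot so the FDR is controlled across the whole experiment. The `--write-pin`/`-o` flags and the mokapot calls reflect my reading of the respective docs; treat this as a starting point rather than a recipe and check `sage --help` and the mokapot documentation.

```python
import subprocess
from pathlib import Path

import mokapot

BATCH_SIZE = 512

# Collect the ~3000 mzML files and split them into batches so each Sage run
# only holds one batch's worth of spectra in memory.
mzml_files = sorted(Path("raw_data").glob("*.mzML"))
batches = [mzml_files[i:i + BATCH_SIZE] for i in range(0, len(mzml_files), BATCH_SIZE)]

pin_files = []
for idx, batch in enumerate(batches):
    out_dir = Path(f"sage_batch_{idx:02d}")
    out_dir.mkdir(exist_ok=True)
    # One Sage search per batch; config.json holds the usual search settings.
    subprocess.run(
        ["sage", "config.json", "--write-pin", "-o", str(out_dir)]
        + [str(f) for f in batch],
        check=True,
    )
    pin_files.extend(str(p) for p in out_dir.glob("*.pin"))

# Rescore all batches jointly so q-values reflect the entire experiment,
# not each 512-file batch on its own.
psms = mokapot.read_pin(pin_files)
results, _models = mokapot.brew(psms)

out = Path("mokapot_global_fdr")
out.mkdir(exist_ok=True)
results.to_txt(dest_dir=str(out))
```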

@KlemensFroehlich
Author

Hi Michael
Thanks for the swift reply and the support. Really means a lot!!

I think, if possible, they would love to do a standard tryptic search with no crazy modifications and MBR quant.

That being said, I would propose that they stay with the 1 TB setup for now and just try all 3000 files with MBR.

If it crashes, then run all files without MBR, and additionally run batches of 512 with MBR, including a few of the same files in every batch as references.

Then simply check, using both the reference files and the non-MBR quantifications, whether the 512-file MBR batches behave as expected (see the sketch below).

If that is the case, stitch everything together with additional tools for adjusting the FDR.
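Something like the following could serve as that check: load the LFQ output from two batches, keep only the shared reference runs, and correlate their quantities. The file and column names here (lfq.tsv, a "peptide" column, one intensity column per run) are my assumptions about Sage's LFQ output, not its documented format, so adjust them to whatever the actual tables contain.

```python
import numpy as np
import pandas as pd

# Hypothetical names of the reference runs included in every batch.
REFERENCE_RUNS = ["ref_run_01.mzML", "ref_run_02.mzML"]

def reference_quants(lfq_path: str) -> pd.DataFrame:
    """Load one batch's LFQ table and keep only the shared reference columns."""
    lfq = pd.read_csv(lfq_path, sep="\t")
    # Collapse to one row per peptide for simplicity before indexing.
    lfq = lfq.drop_duplicates(subset="peptide")
    return lfq.set_index("peptide")[REFERENCE_RUNS]

batch_a = reference_quants("sage_batch_00/lfq.tsv")
batch_b = reference_quants("sage_batch_01/lfq.tsv")

# Compare the same reference run quantified in two different batches; if the
# 512-file MBR batches behave like the full experiment, the log-intensities
# should correlate tightly.
merged = batch_a.join(batch_b, lsuffix="_batchA", rsuffix="_batchB", how="inner")
merged = merged[(merged > 0).all(axis=1)]  # drop missing/zero intensities before log

for run in REFERENCE_RUNS:
    a = np.log2(merged[f"{run}_batchA"])
    b = np.log2(merged[f"{run}_batchB"])
    print(f"{run}: Pearson r between batches = {a.corr(b):.3f} "
          f"({len(merged)} shared peptides)")
```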

Do you think this would be a viable strategy? I will report back once we have the results, as this might be of interest to others.

Best and thanks again for your time and this cool tool.
Klemens

@lazear
Owner

lazear commented Nov 15, 2024

I think it would be viable! If it doesn't work, please reach out and I can try to implement some other kind of solution.
