
ram consumption #161

Open

KlemensFroehlich opened this issue Nov 13, 2024 · 3 comments

Comments

@KlemensFroehlich
Hi Michael
A collaborator has to analyze 3000+ files (DDA Orbitrap, so around 700 MB to 1.5 GB per file). FragPipe ran out of memory with 1 TB of RAM on a VM during quant.

I was wondering whether Sage might be an option here... I am confident speed would not be an issue :)

As I understand it, Sage does not write intermediate results, but rather stores everything in memory. Would this cause problems with so many files, or would you recommend a specific procedure for dealing with this?

I quickly went over the intro and the GitHub issues but did not see anything related to RAM requirements for large-scale analyses.

Any help would be appreciated!

Best, Klemens

@lazear
Owner

lazear commented Nov 14, 2024

1TB might work if you're not doing anything crazy with semi-enzymatic/HLA/etc... I think the largest I have done in a single search was ~2k files.

Are they wanting to run a label-free quantitative analysis where MBR is needed, or is it possible that the files could be processed in batches? If you don't need MBR, you could search each file individually or in batches (e.g. 96 at a time) and then use Mokapot or percolator to control FDR across the experiment. Sage will handle predicting/aligning RTs (used for rescoring PSMs) & global FDR for you, but there aren't any other substantial differences between searching all files at once and individually - and I imagine that the difference between searching 512 files and all 3000 at once will be very minor and within acceptable error.

Even with MBR, I would suggest assessing whether moderately sized batches are OK (I would imagine very few peptides are lost going from 3000 -> batches of 512, for example)
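For concreteness, here is a rough sketch of that batch-then-rescore idea: run Sage on batches of ~512 files, write Percolator-style .pin output per batch, and then rescore all batches together with mokapot so the FDR is controlled across the whole experiment. The `--write-pin`/`-o` flags and the mokapot calls reflect my reading of the respective docs; treat this as a starting point rather than a recipe and check `sage --help` and the mokapot documentation.

```python
import subprocess
from pathlib import Path

import mokapot

BATCH_SIZE = 512

# Collect the ~3000 mzML files and split them into batches so each Sage run
# only holds one batch's worth of spectra in memory.
mzml_files = sorted(Path("raw_data").glob("*.mzML"))
batches = [mzml_files[i:i + BATCH_SIZE] for i in range(0, len(mzml_files), BATCH_SIZE)]

pin_files = []
for idx, batch in enumerate(batches):
    out_dir = Path(f"sage_batch_{idx:02d}")
    out_dir.mkdir(exist_ok=True)
    # One Sage search per batch; config.json holds the usual search settings.
    subprocess.run(
        ["sage", "config.json", "--write-pin", "-o", str(out_dir)]
        + [str(f) for f in batch],
        check=True,
    )
    pin_files.extend(str(p) for p in out_dir.glob("*.pin"))

# Rescore all batches jointly so q-values reflect the entire experiment,
# not each 512-file batch on its own.
psms = mokapot.read_pin(pin_files)
results, _models = mokapot.brew(psms)

out = Path("mokapot_global_fdr")
out.mkdir(exist_ok=True)
results.to_txt(dest_dir=str(out))
```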

@KlemensFroehlich
Author

Hi Michael
Thanks for the swift reply and the support. Really means a lot!!

I think, if possible, they would love to do a standard tryptic search with no crazy modifications and MBR quant.

That being said, I would propose that they stay with the 1 TB setup for now and just try all 3000 files with MBR.

If it crashes, then run all files without MBR, and additionally run batches of 512 with MBR, including a few of the same files in every batch as references.

Then simply check, using both the reference files and the non-MBR quantifications, whether the 512-file MBR batches behave as expected (see the sketch below).

If that is the case, stitch everything together with additional tools for adjusting the FDR.
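Something like the following could serve as that check: load the LFQ output from two batches, keep only the shared reference runs, and correlate their quantities. The file and column names here (lfq.tsv, a "peptide" column, one intensity column per run) are my assumptions about Sage's LFQ output, not its documented format, so adjust them to whatever the actual tables contain.

```python
import numpy as np
import pandas as pd

# Hypothetical names of the reference runs included in every batch.
REFERENCE_RUNS = ["ref_run_01.mzML", "ref_run_02.mzML"]

def reference_quants(lfq_path: str) -> pd.DataFrame:
    """Load one batch's LFQ table and keep only the shared reference columns."""
    lfq = pd.read_csv(lfq_path, sep="\t")
    # Collapse to one row per peptide for simplicity before indexing.
    lfq = lfq.drop_duplicates(subset="peptide")
    return lfq.set_index("peptide")[REFERENCE_RUNS]

batch_a = reference_quants("sage_batch_00/lfq.tsv")
batch_b = reference_quants("sage_batch_01/lfq.tsv")

# Compare the same reference run quantified in two different batches; if the
# 512-file MBR batches behave like the full experiment, the log-intensities
# should correlate tightly.
merged = batch_a.join(batch_b, lsuffix="_batchA", rsuffix="_batchB", how="inner")
merged = merged[(merged > 0).all(axis=1)]  # drop missing/zero intensities before log

for run in REFERENCE_RUNS:
    a = np.log2(merged[f"{run}_batchA"])
    b = np.log2(merged[f"{run}_batchB"])
    print(f"{run}: Pearson r between batches = {a.corr(b):.3f} "
          f"({len(merged)} shared peptides)")
```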

Do you think this would be a viable strategy? I will report back once we have the results, as this might be of interest to others.

Best and thanks again for your time and this cool tool.
Klemens

@lazear
Owner

lazear commented Nov 15, 2024

I think it would be viable! If it doesn't work, please reach out and I can try to implement some other kind of solution.
