ENH: Memory Optimizations & low_memory Flag #324
Hi @asistradition, thanks a lot for this PR!
Overall I agree with the changes you propose; I just have a few comments regarding the storage and naming of `._cooks_outlier` (cf. above).
Just for the sake of curiosity, were you limited in your usage of PyDESeq2 because of memory consumption? If so, I'd be curious to have an idea of your setup and the size of your dataset :)
Yes, this is part of my standard single-cell workflow, and memory limitations are the main bottleneck. 60k x 35k won't be a problem with this PR.
Co-authored-by: Boris Muzellec <[email protected]>
Perfect, thanks!
What does your PR implement? Be specific.
The `replace_cooks` data was stored as a pandas DataFrame. This is memory-inefficient for large data. It has been refactored from a DataFrame to a NumPy array.

A new `low_memory` argument to `DeseqDataSet` is available. When set to True, large data arrays saved into the AnnData elements `.obsm` and `.layers` are deleted if there is no further use for them in the standard deseq workflow. This reduces peak memory consumption dramatically.
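
For readers who want a feel for the `low_memory` idea, here is a minimal sketch of the general pattern: drop an intermediate array as soon as no later step needs it. It is not the PyDESeq2 implementation. `ToyDataSet`, its `run` method, and the layer names `normed` and `fitted` are hypothetical, and a plain dict stands in for AnnData's `.layers`.

```python
class ToyDataSet:
    """Hypothetical stand-in for an AnnData-like container (not the PyDESeq2 API)."""

    def __init__(self, counts, low_memory=False):
        self.low_memory = low_memory
        # A plain dict plays the role of AnnData's `.layers` in this sketch.
        self.layers = {"counts": counts}

    def _drop(self, key):
        # Delete an intermediate only when low_memory is enabled.
        if self.low_memory:
            self.layers.pop(key, None)

    def run(self):
        counts = self.layers["counts"]

        # Step 1: first intermediate (hypothetical transformation).
        self.layers["normed"] = [[c / 2 for c in row] for row in counts]

        # Step 2: second intermediate, computed from the first.
        self.layers["fitted"] = [[v + 1 for v in row] for row in self.layers["normed"]]
        # "normed" has no further use in this toy workflow, so it can be freed.
        self._drop("normed")

        # Step 3: final per-row summary; afterwards "fitted" is no longer needed.
        result = [sum(row) for row in self.layers["fitted"]]
        self._drop("fitted")
        return result


if __name__ == "__main__":
    for flag in (False, True):
        ds = ToyDataSet([[2, 4], [6, 8]], low_memory=flag)
        ds.run()
        print(flag, sorted(ds.layers))
```

The trade-off is that with the flag set, intermediates are no longer available for inspection after the run, which is presumably why it is opt-in.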