ENH: Memory Optimizations & low_memory Flag #324

Merged: 6 commits, Oct 21, 2024

Conversation

asistradition
Contributor

What does your PR implement? Be specific.

The replace_cooks data was stored as a pandas DataFrame, which is memory-inefficient for large data. It is now stored as a numpy array instead.

A new low_memory argument to DeseqDataSet is available. When set to True, large data arrays saved into the AnnData elements .obsm and .layers are deleted if there is no further use for them in the standard deseq workflow. This reduces peak memory consumption dramatically.
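As a rough illustration of the idea behind low_memory (this is not the PyDESeq2 implementation), the sketch below uses a hypothetical ToyWorkflow class whose layers dict stands in for .layers. The names ToyWorkflow, _release, normed and squared are made up for this example.

```python
# Hypothetical sketch of the low_memory pattern; not PyDESeq2 code.
class ToyWorkflow:
    """Toy pipeline that stores intermediate results in a dict of layers."""

    def __init__(self, counts, low_memory=False):
        self.counts = counts
        self.low_memory = low_memory
        self.layers = {}  # stands in for AnnData's .layers

    def _release(self, key):
        # Delete an intermediate only when low_memory is enabled.
        if self.low_memory:
            self.layers.pop(key, None)

    def run(self):
        # Step 1: an intermediate needed only by step 2.
        self.layers["normed"] = [[c / 2 for c in row] for row in self.counts]
        # Step 2: a second intermediate derived from the first.
        self.layers["squared"] = [
            [v * v for v in row] for row in self.layers["normed"]
        ]
        self._release("normed")  # no later step reads it
        # Step 3: final per-row result.
        result = [sum(row) for row in self.layers["squared"]]
        self._release("squared")
        return result


counts = [[2, 4], [6, 8]]
default_run = ToyWorkflow(counts)
lean_run = ToyWorkflow(counts, low_memory=True)
default_result = default_run.run()
lean_result = lean_run.run()
```

Both runs return the same result. Only the lean run ends with an empty layers dict, so its intermediates can be garbage-collected as soon as they are released.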

Two review comments on pydeseq2/dds.py (outdated, resolved)
Collaborator

@BorisMuzellec BorisMuzellec left a comment


Hi @asistradition, thanks a lot for this PR!

Overall I agree with the changes you propose; I just have a few comments regarding the storage and naming of ._cooks_outlier (cf. above).

Just for the sake of curiosity, were you limited in your usage of PyDESeq2 because of memory consumption? If so, I'd be curious to have an idea of your setup and the size of your dataset :)

@asistradition
Contributor Author

Yes, this is part of my standard single-cell workflow, and memory limitations are the main bottleneck. 60k x 35k won't be a problem with this PR.
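For a sense of scale, here is a back-of-the-envelope estimate, assuming a single dense array of that shape. The dtypes are assumptions and the helper name dense_bytes is made up for illustration.

```python
def dense_bytes(n_rows, n_cols, itemsize):
    """Bytes needed for a dense n_rows x n_cols array of itemsize-byte elements."""
    return n_rows * n_cols * itemsize


n_rows, n_cols = 60_000, 35_000
float64_gb = dense_bytes(n_rows, n_cols, 8) / 1e9  # assumes 8 bytes per float64
bool_gb = dense_bytes(n_rows, n_cols, 1) / 1e9     # assumes 1 byte per bool
print(f"float64: {float64_gb:.1f} GB, bool: {bool_gb:.1f} GB")
```

Under these assumptions, each dense float64 array of that shape takes about 16.8 GB, so every intermediate array kept alive adds substantially to peak memory.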

Co-authored-by: Boris Muzellec <[email protected]>
Collaborator

@BorisMuzellec BorisMuzellec left a comment


Perfect, thanks!

@BorisMuzellec BorisMuzellec merged commit 20fb473 into owkin:main Oct 21, 2024
6 checks passed