ENH: Memory Optimizations & low_memory Flag #324

asistradition · 2024-10-17T19:30:13Z

What does your PR implement? Be specific.

The replace_cooks data was stored as a pandas DataFrame, which is memory-inefficient for large datasets. It has been refactored into a numpy array.

A new low_memory argument to DeseqDataSet is available. When set to True, large data arrays saved into the AnnData elements .obsm and .layers are deleted if there is no further use for them in the standard deseq workflow. This reduces peak memory consumption dramatically.
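The pattern behind such a flag can be sketched in plain Python. This is only an illustration under stated assumptions: `ToyDataSet`, its `fit` method, and the `"scaled"` layer are hypothetical stand-ins and not PyDESeq2's classes, keys, or actual implementation. A dict plays the role of the AnnData `.layers` mapping.

```python
# Hypothetical sketch: ToyDataSet and its layer names are illustrative only,
# not PyDESeq2's actual classes or keys.
class ToyDataSet:
    def __init__(self, counts, low_memory=False):
        self.low_memory = low_memory
        self.layers = {"counts": counts}  # stands in for AnnData .layers

    def fit(self):
        # Step 1: compute an intermediate that only step 2 needs.
        self.layers["scaled"] = [
            [2 * x for x in row] for row in self.layers["counts"]
        ]
        # Step 2: reduce the intermediate to the final per-column result.
        self.result = [sum(col) for col in zip(*self.layers["scaled"])]
        # With low_memory=True, drop the intermediate once nothing uses it.
        if self.low_memory:
            del self.layers["scaled"]
        return self.result


counts = [[1, 2, 3], [4, 5, 6]]
default = ToyDataSet(counts)
lean = ToyDataSet(counts, low_memory=True)

# Both modes give the same result; only the retained intermediates differ.
assert default.fit() == lean.fit()
print(sorted(default.layers))  # ['counts', 'scaled']
print(sorted(lean.layers))     # ['counts']
```

The trade-off in this sketch is that anything deleted is no longer available for inspection afterwards, so the flag is off by default.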

pydeseq2/dds.py

BorisMuzellec

Hi @asistradition, thanks a lot for this PR!

Overall I agree with the changes you propose; I just have a few comments regarding the storage and naming of ._cooks_outlier (cf. above).

Just for the sake of curiosity, were you limited in your usage of PyDESeq2 because of memory consumption? If so, I'd be curious to have an idea of your setup and the size of your dataset :)

asistradition · 2024-10-18T13:43:55Z

Yes, this is part of my standard single-cell workflow, and memory limitations are the main bottleneck. 60k x 35k won't be a problem with this PR.
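For a rough sense of scale, the footprint of a single dense array of that shape can be worked out directly. This is a back-of-the-envelope sketch, assuming dense storage with 8 bytes per element (float64) or 1 byte per element (a boolean mask); the helper `dense_array_gib` is made up for illustration, and the figures are not measurements from this PR.

```python
def dense_array_gib(n_rows, n_cols, bytes_per_element):
    """Size in GiB of a dense n_rows x n_cols array (assumes no overhead)."""
    return n_rows * n_cols * bytes_per_element / 2**30


n_rows, n_cols = 60_000, 35_000  # the "60k x 35k" shape mentioned above

float64_gib = dense_array_gib(n_rows, n_cols, 8)  # one float64 layer
bool_gib = dense_array_gib(n_rows, n_cols, 1)     # one boolean mask

print(f"float64 layer: {float64_gib:.1f} GiB")  # about 15.6 GiB
print(f"boolean mask:  {bool_gib:.1f} GiB")     # about 2.0 GiB
```

Under these assumptions, every extra full-size float64 array kept alive adds roughly 15.6 GiB, which is why dropping unused intermediates matters at this scale.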

BorisMuzellec

Perfect, thanks!

asistradition added 2 commits October 16, 2024 16:37

ENH: Low Memory

4d20418

Modify docstring for low_memory

6784b20

asistradition requested review from BorisMuzellec, maikia and umarteauowkin as code owners October 17, 2024 19:30

[pre-commit.ci] auto fixes from pre-commit.com hooks

ca1cd62

for more information, see https://pre-commit.ci

BorisMuzellec reviewed Oct 18, 2024

pydeseq2/dds.py (outdated)

BorisMuzellec reviewed Oct 18, 2024

pydeseq2/dds.py (outdated)

BorisMuzellec requested changes Oct 18, 2024

asistradition and others added 2 commits October 18, 2024 09:36

Move cooks_outliers mask into .varm

291d0a3

[pre-commit.ci] auto fixes from pre-commit.com hooks

b310d62

for more information, see https://pre-commit.ci

Update pydeseq2/dds.py

40e6563

Co-authored-by: Boris Muzellec <[email protected]>

BorisMuzellec approved these changes Oct 21, 2024

BorisMuzellec merged commit 20fb473 into owkin:main Oct 21, 2024
6 checks passed
