[BUG] DeseqDataSet.refit() fails with latest anndata release 0.10.9 #306

Lilly-May · 2024-09-01T10:15:14Z

Describe the bug
With the new anndata 0.10.9 release, anndata no longer allows pandas DataFrames to be stored in varm and throws an error when doing so (see 3rd bullet point of release notes here). When refitting a Deseq model (e.g., by calling dds.deseq2() on a DeseqDataSet created with refit_cooks=True) and no sample can be replaced, a pandas.Series is stored in adata.varm, which raises a ValueError. I believe this can be fixed by storing the data differently, for example as binary (0/1) values in a numpy array.

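The failure comes from this line in _replace_outliers (pydeseq2/dds.py, line 1128 in the traceback below): self.varm["replaced"] = pd.Series(False, index=self.var_names)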
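A minimal sketch of the array-based storage suggested above, assuming only numpy and pandas. The gene names are illustrative, and this is not necessarily how pydeseq2 implements the fix; it only shows that the same per-gene flags can be held in a plain numpy array, which is one of the types the error message lists as accepted.

```python
import numpy as np
import pandas as pd

# Illustrative gene names; in pydeseq2 these would come from dds.var_names.
var_names = pd.Index(["gene1", "gene2"])

# What the failing line builds: a boolean pandas Series indexed by gene.
replaced_series = pd.Series(False, index=var_names)

# Array-based alternatives holding the same values, one entry per gene.
replaced_bool = np.zeros(len(var_names), dtype=bool)
replaced_from_series = replaced_series.to_numpy()

# Binary (0/1) encoding, as proposed in the report.
replaced_binary = replaced_bool.astype(np.int8)

# The array carries no gene labels, so rebuild a Series from var_names
# whenever label-based access is needed.
restored = pd.Series(replaced_bool, index=var_names)
```

The trade-off is that the gene labels no longer travel with the values, so any code that reads the flags has to rely on positional alignment with var_names.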

To Reproduce

import numpy as np
import pandas as pd
import anndata as ad
import pertpy as pt

n_obs = 80
n_donors = 20
rng = np.random.default_rng(9)
obs = pd.DataFrame(
    {
        "condition": ["A", "B"] * (n_obs // 2),
        "donor": sum(([f"D{i}"] * n_donors for i in range(n_obs // n_donors)), []),
        "pairing": sum(([str(i), str(i)] for i in range(n_obs // 2)), []),
    },
)
var = pd.DataFrame(index=["gene1", "gene2"])
group1 = rng.negative_binomial(20, 0.1, n_obs // 2)
group2 = rng.negative_binomial(5, 0.5, n_obs // 2)

condition_data = np.empty((n_obs,), dtype=group1.dtype)
condition_data[0::2] = group1
condition_data[1::2] = group2

donor_data = np.empty((n_obs,), dtype=group1.dtype)
donor_data[0:n_donors] = group2[:n_donors]
donor_data[n_donors : (2 * n_donors)] = group1[n_donors:]
donor_data[(2 * n_donors) : (3 * n_donors)] = group2[:n_donors]
donor_data[(3 * n_donors) :] = group1[n_donors:]

X = np.vstack([condition_data, donor_data]).T

adata = ad.AnnData(X=X, obs=obs, var=var)

pt.tl.PyDESeq2.compare_groups(
    adata=adata, column="condition", baseline="A", groups_to_compare="B", paired_by="pairing"
)

Desktop (please complete the following information):

OS: I verified that the bug exists on Ubuntu and macOS

Error Traceback

test_compare_groups.py:19: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../pertpy/tools/_differential_gene_expression/_base.py:513: in compare_groups
    model.fit(**fit_kwargs)
../../../pertpy/tools/_differential_gene_expression/_pydeseq2.py:59: in fit
    dds.deseq2()
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/pydeseq2/dds.py:501: in deseq2
    self.refit()
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/pydeseq2/dds.py:938: in refit
    self._replace_outliers()
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/pydeseq2/dds.py:1128: in _replace_outliers
    self.varm["replaced"] = pd.Series(False, index=self.var_names)
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py:216: in __setitem__
    value = self._validate_value(value, key)
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py:279: in _validate_value
    return super()._validate_value(val, key)
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py:100: in _validate_value
    return coerce_array(val, name=name, allow_df=self._allow_df)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

value = gene1    False
gene2    False
dtype: bool

    def coerce_array(
        value: Any,
        *,
        name: str,
        allow_df: bool = False,
        allow_array_like: bool = False,
    ):
        """Coerce arrays stored in layers/X, and aligned arrays ({obs,var}{m,p})."""
        # If value is a scalar and we allow that, return it
        if allow_array_like and np.isscalar(value):
            return value
        # If value is one of the allowed types, return it
        if isinstance(value, ArrayDataStructureType.classes()):
            if isinstance(value, np.matrix):
                msg = f"{name} should not be a np.matrix, use np.ndarray instead."
                warnings.warn(msg, ImplicitModificationWarning)
                value = value.A
            return value
        if isinstance(value, pd.DataFrame):
            if allow_df:
                raise_value_error_if_multiindex_columns(value, name)
            return value if allow_df else ensure_df_homogeneous(value, name)
        # if value is an array-like object, try to convert it
        e = None
        if allow_array_like:
            try:
                # TODO: asarray? asanyarray?
                return np.array(value)
            except (ValueError, TypeError) as _e:
                e = _e
        # if value isn’t the right type or convertible, raise an error
        msg = f"{name} needs to be of one of {join_english(ArrayDataStructureType.qualnames())}, not {type(value)}."
        if e is not None:
            msg += " (Failed to convert it to an array, see above for details.)"
>       raise ValueError(msg) from e
E       ValueError: Varm 'replaced' needs to be of one of np.ndarray, numpy.ma.core.MaskedArray, scipy.sparse.spmatrix, awkward.Array, h5py.Dataset, zarr.Array, zappy.base.ZappyArray, anndata.experimental.[CSC,CSR]Dataset, dask.array.Array, cupy.ndarray, or cupyx.scipy.sparse.spmatrix, not <class 'pandas.core.series.Series'>.

/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/anndata/_core/storage.py:104: ValueError


BorisMuzellec · 2024-09-04T07:12:35Z

Thanks @Lilly-May for reporting this bug!

Hopefully #307 did the trick. FYI it fixes another bug in fit_vst due to MultiIndex columns no longer being supported in .obs.

Let me know if you find any other issues with the new anndata release!
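For readers hitting the MultiIndex restriction mentioned above in their own code, here is a generic pandas sketch for flattening MultiIndex columns before storing a DataFrame in .obs. It is an assumption for illustration only, with made-up column names, and is not a description of what #307 changed in fit_vst.

```python
import pandas as pd

# Hypothetical metadata table with two-level column labels.
obs = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=["cell1", "cell2"],
    columns=pd.MultiIndex.from_tuples(
        [("design", "condition"), ("design", "donor")]
    ),
)

# Join each column tuple into a single string label.
flat = obs.copy()
flat.columns = ["_".join(map(str, col)) for col in obs.columns.to_flat_index()]
```

After this, flat.columns is a regular single-level Index with the labels design_condition and design_donor.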

Lilly-May added the bug Something isn't working label Sep 1, 2024

Lilly-May mentioned this issue Sep 1, 2024

Fix docs rendering for classes using lazy import scverse/pertpy#651

Merged

Zethson mentioned this issue Sep 1, 2024

Remove anndata pin after pydeseq2 bug fix scverse/pertpy#652

Closed

BorisMuzellec mentioned this issue Sep 3, 2024

BUG set varm["replaced"] as a numpy array #307

Merged

BorisMuzellec closed this as completed in #307 Sep 4, 2024

Lilly-May mentioned this issue Sep 4, 2024

Remove anndata pin scverse/pertpy#653

Merged
