[BUG] DeseqDataSet.refit() fails with latest anndata release 0.10.9 #306

Closed
Lilly-May opened this issue Sep 1, 2024 · 1 comment · Fixed by #307
Labels
bug Something isn't working

Comments

@Lilly-May

Describe the bug
With the new anndata 0.10.9 release, anndata no longer accepts a pandas.Series in varm and raises an error when one is assigned (see the 3rd bullet point of the release notes). When refitting a DESeq2 model (e.g., by calling dds.deseq2() on a DeseqDataSet created with refit_cooks=True) and no sample can be replaced, a pandas.Series is stored in adata.varm, resulting in a ValueError. I believe this issue can be fixed by storing the data differently, for example as binary (0/1) values in a numpy array.

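The line causing the failure is self.varm["replaced"] = pd.Series(False, index=self.var_names) in _replace_outliers (pydeseq2/dds.py, line 1128 in the traceback below).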
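
The fix suggested above can be sketched as follows. This is only an illustration of the idea and not necessarily what the actual fix does. The gene names are hypothetical stand-ins for dds.var_names, and the sketch deliberately stops before the assignment to varm so that it runs with numpy and pandas alone.

```python
import numpy as np
import pandas as pd

# Hypothetical gene names standing in for dds.var_names.
var_names = pd.Index(["gene1", "gene2"])

# What the failing line builds: a boolean Series indexed by gene.
replaced_series = pd.Series(False, index=var_names)

# Option 1: convert the Series to a plain boolean ndarray (gene order is kept).
replaced_bool = replaced_series.to_numpy(dtype=bool)

# Option 2: the binary (0/1) encoding mentioned above.
replaced_binary = replaced_bool.astype(np.uint8)

# Option 3: skip the Series entirely and build the array directly.
replaced_direct = np.zeros(len(var_names), dtype=bool)

# Assumption: any of these arrays would then be assigned in place of the
# Series, e.g. self.varm["replaced"] = replaced_direct
print(type(replaced_bool).__name__, replaced_bool.dtype, replaced_binary.tolist())
```

Because np.ndarray is the first type listed as accepted in the error message, any of the three arrays should pass the check that rejects the Series. The gene labels are dropped in the conversion, so the array has to stay in the same order as var_names.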

To Reproduce

import numpy as np
import pandas as pd
import anndata as ad
import pertpy as pt

n_obs = 80
n_donors = 20
rng = np.random.default_rng(9)
obs = pd.DataFrame(
    {
        "condition": ["A", "B"] * (n_obs // 2),
        "donor": sum(([f"D{i}"] * n_donors for i in range(n_obs // n_donors)), []),
        "pairing": sum(([str(i), str(i)] for i in range(n_obs // 2)), []),
    },
)
var = pd.DataFrame(index=["gene1", "gene2"])
group1 = rng.negative_binomial(20, 0.1, n_obs // 2)
group2 = rng.negative_binomial(5, 0.5, n_obs // 2)

condition_data = np.empty((n_obs,), dtype=group1.dtype)
condition_data[0::2] = group1
condition_data[1::2] = group2

donor_data = np.empty((n_obs,), dtype=group1.dtype)
donor_data[0:n_donors] = group2[:n_donors]
donor_data[n_donors : (2 * n_donors)] = group1[n_donors:]
donor_data[(2 * n_donors) : (3 * n_donors)] = group2[:n_donors]
donor_data[(3 * n_donors) :] = group1[n_donors:]

X = np.vstack([condition_data, donor_data]).T

adata = ad.AnnData(X=X, obs=obs, var=var)

pt.tl.PyDESeq2.compare_groups(
    adata=adata, column="condition", baseline="A", groups_to_compare="B", paired_by="pairing"
)

Desktop (please complete the following information):

  • OS: I verified that the bug exists on Ubuntu and macOS

Error Traceback

test_compare_groups.py:19: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../pertpy/tools/_differential_gene_expression/_base.py:513: in compare_groups
    model.fit(**fit_kwargs)
../../../pertpy/tools/_differential_gene_expression/_pydeseq2.py:59: in fit
    dds.deseq2()
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/pydeseq2/dds.py:501: in deseq2
    self.refit()
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/pydeseq2/dds.py:938: in refit
    self._replace_outliers()
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/pydeseq2/dds.py:1128: in _replace_outliers
    self.varm["replaced"] = pd.Series(False, index=self.var_names)
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py:216: in __setitem__
    value = self._validate_value(value, key)
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py:279: in _validate_value
    return super()._validate_value(val, key)
/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/anndata/_core/aligned_mapping.py:100: in _validate_value
    return coerce_array(val, name=name, allow_df=self._allow_df)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

value = gene1    False
gene2    False
dtype: bool

    def coerce_array(
        value: Any,
        *,
        name: str,
        allow_df: bool = False,
        allow_array_like: bool = False,
    ):
        """Coerce arrays stored in layers/X, and aligned arrays ({obs,var}{m,p})."""
        # If value is a scalar and we allow that, return it
        if allow_array_like and np.isscalar(value):
            return value
        # If value is one of the allowed types, return it
        if isinstance(value, ArrayDataStructureType.classes()):
            if isinstance(value, np.matrix):
                msg = f"{name} should not be a np.matrix, use np.ndarray instead."
                warnings.warn(msg, ImplicitModificationWarning)
                value = value.A
            return value
        if isinstance(value, pd.DataFrame):
            if allow_df:
                raise_value_error_if_multiindex_columns(value, name)
            return value if allow_df else ensure_df_homogeneous(value, name)
        # if value is an array-like object, try to convert it
        e = None
        if allow_array_like:
            try:
                # TODO: asarray? asanyarray?
                return np.array(value)
            except (ValueError, TypeError) as _e:
                e = _e
        # if value isn’t the right type or convertible, raise an error
        msg = f"{name} needs to be of one of {join_english(ArrayDataStructureType.qualnames())}, not {type(value)}."
        if e is not None:
            msg += " (Failed to convert it to an array, see above for details.)"
>       raise ValueError(msg) from e
E       ValueError: Varm 'replaced' needs to be of one of np.ndarray, numpy.ma.core.MaskedArray, scipy.sparse.spmatrix, awkward.Array, h5py.Dataset, zarr.Array, zappy.base.ZappyArray, anndata.experimental.[CSC,CSR]Dataset, dask.array.Array, cupy.ndarray, or cupyx.scipy.sparse.spmatrix, not <class 'pandas.core.series.Series'>.

/opt/homebrew/Caskroom/mambaforge/base/envs/py12-pertpy-env/lib/python3.12/site-packages/anndata/_core/storage.py:104: ValueError
@BorisMuzellec
Collaborator

Thanks @Lilly-May for reporting this bug!

Hopefully #307 did the trick. FYI it fixes another bug in fit_vst due to MultiIndex columns no longer being supported in .obs.

Let me know if you find any other issues with the new anndata release!
