NANs introduced for some genes, no warning, no error [BUG] #291

Closed
dbdimitrov opened this issue Jun 11, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@dbdimitrov

Hi,

Thanks for developing this package and for improving it.

Describe the bug
For some genes the p-value is set to NaN, even though all other statistics are calculated. No warning or error is raised.

To Reproduce
small adata: https://drive.google.com/file/d/1wwZDcsEVZD0ldBrtP63vqA1zTPpM05Q2/view?usp=drive_link

code

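import scanpy as sc
from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats

condition_key = "condition"  # name of the condition column in ctdata.obs

ctdata = sc.read("ctdata.h5ad")
dds = DeseqDataSet(
    adata=ctdata,
    design_factors=condition_key,
    ref_level=[condition_key, 'ctrl'], # set control as reference
    refit_cooks=True,
    quiet=True
)

# Fit size factors, dispersions and log-fold changes
dds.deseq2()
stat_res = DeseqStats(dds, contrast=[condition_key, 'stim', 'ctrl'], quiet=True)
stat_res.summary()
stat_res.lfc_shrink(coeff='condition_stim_vs_ctrl')
dea_df = stat_res.results_df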

pydeseq2 version:

pydeseq2.__version__
'0.4.4'

Expected behavior
Throw a warning? Is it expected to have NaNs here? The counts look like there is some variance, etc.

Desktop (please complete the following information):

  • Python=3.10
  • Ubuntu 20.04

Exception Thrown when using the problematic gene alone
The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.

Let me know if I can provide any further info.

@BorisMuzellec
Collaborator

BorisMuzellec commented Jul 1, 2024

Hi @dbdimitrov, can you confirm that condition_key is "condition"?

I also get a NaN p-value for the AAED1 gene. This is a priori not a bug: pydeseq2 filters out p-values based on Cook's outliers (cf. the docs).

You can turn this off by setting cooks_filter=False when initialising a DeseqStats object.
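
To see which genes are affected, one option is to look for rows of the results table where the p-value is NaN while the test statistic is not. The snippet below is only a sketch: the helper name genes_with_nan_pvalues is hypothetical, the toy table uses made-up numbers, and it assumes the results table has "stat" and "pvalue" columns as in stat_res.results_df.

```python
import numpy as np
import pandas as pd


def genes_with_nan_pvalues(results_df: pd.DataFrame) -> pd.Index:
    """Return genes whose p-value is NaN although a test statistic was computed."""
    mask = results_df["pvalue"].isna() & results_df["stat"].notna()
    return results_df.index[mask]


# Toy table with made-up numbers, standing in for stat_res.results_df.
toy_results = pd.DataFrame(
    {
        "stat": [2.1, -0.4, 3.3],
        "pvalue": [0.036, np.nan, 0.001],
    },
    index=["GENE_A", "GENE_B", "GENE_C"],
)

flagged = genes_with_nan_pvalues(toy_results)
print(list(flagged))
```

Running the same helper on the results obtained with and without cooks_filter=False should show whether the NaNs come from the Cook's outlier filter or from somewhere else.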

That being said, it seems like the ctdata anndata has floats as counts, whereas ints are expected. Converting counts to ints in the piece of code you provided changes the results:

With floats:
[Screenshot, 2024-07-01 16:41:54]

Converting to ints:
[Screenshot, 2024-07-01 16:42:00]

I'll dig a bit more into this but I think that we should raise an error (or at least a warning) in case counts are not ints (and / or maybe cast to int when possible).
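
As a rough illustration of the check suggested above (not pydeseq2's actual implementation), a validation step could warn and cast when float counts are whole numbers, and raise otherwise. The function name ensure_integer_counts is hypothetical, and the sketch assumes a dense NumPy array; a sparse count matrix would need to be handled separately.

```python
import warnings

import numpy as np


def ensure_integer_counts(counts: np.ndarray) -> np.ndarray:
    """Return counts as an integer array, or raise if values are not whole numbers."""
    counts = np.asarray(counts)
    if np.issubdtype(counts.dtype, np.integer):
        # Already integer-typed: nothing to do.
        return counts
    if not np.all(np.isfinite(counts)):
        raise ValueError("Counts contain NaN or infinite values.")
    if not np.all(counts == np.round(counts)):
        raise ValueError("Counts contain non-integer values; raw counts are expected.")
    # Whole-number floats: casting is lossless, but the user should know about it.
    warnings.warn("Counts are stored as floats; casting to int.", UserWarning)
    return counts.astype(np.int64)


# Made-up float counts that happen to be whole numbers.
float_counts = np.array([[3.0, 0.0], [12.0, 7.0]])
int_counts = ensure_integer_counts(float_counts)
```

On the user side, the same idea would amount to validating or casting the count matrix before building the DeseqDataSet.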

@dbdimitrov
Author

Hi @BorisMuzellec,

Yes, indeed condition_key was "condition" - sorry.

Thanks a lot for the clarification!

2 participants