Problem with missing values in AnnData 0.8.0 #87

Closed
jackkamm opened this issue Mar 5, 2023 · 1 comment
Labels
bug Something isn't working

Comments

Contributor

jackkamm commented Mar 5, 2023

Followup to #86

For krumsiek11_augmented_v0-8.h5ad in that PR, when reading it in with the default Python reader:

sce <- readH5AD(system.file("extdata", "krumsiek11_augmented_v0-8.h5ad", package = "zellkonverter"))

The following data, which contain missing values, are not properly read in:

  • colData(sce)$dummy_int2
  • colData(sce)$dummy_bool2
  • metadata(sce)$dummy_bool2
  • metadata(sce)$dummy_int2
  • metadata(sce)$dummy_category

For example, here is how one of those columns looks:

> metadata(sce)$dummy_int2

<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

> class(metadata(sce)$dummy_int2)

[1] "pandas.core.arrays.integer.IntegerArray"
[2] "pandas.core.arrays.numeric.NumericArray"
[3] "pandas.core.arrays.masked.BaseMaskedArray"
[4] "pandas.core.arraylike.OpsMixin"
[5] "pandas.core.arrays.base.ExtensionArray"
[6] "python.builtin.object"

So it appears to be a reference to a Python object, rather than the expected R integer vector.

dummy_bool2 is much the same as dummy_int2 (except it is printed as a <BooleanArray> instead of <IntegerArray>).
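
For reference, here is a minimal Python sketch of what these objects look like on the pandas side, plus one possible way to turn them into plain NumPy arrays before they cross into R. The variable names are made up for illustration, and this is not necessarily how zellkonverter handles (or should handle) them. Casting to float keeps the missing value as NaN but loses the integer/logical type, so it is only one option.

```python
import numpy as np
import pandas as pd

# Nullable integer and boolean arrays, similar in shape to the
# dummy_int2 / dummy_bool2 values shown above (illustrative data).
int_arr = pd.array([1, 2, None], dtype="Int64")
bool_arr = pd.array([True, False, None], dtype="boolean")

# Both are pandas extension arrays rather than NumPy arrays.
int_type = type(int_arr).__name__
bool_type = type(bool_arr).__name__

# Option 1 (assumption: a float result is acceptable): cast to float64
# and represent the missing entries as NaN.
int_as_float = int_arr.to_numpy(dtype="float64", na_value=np.nan)
bool_as_float = bool_arr.to_numpy(dtype="float64", na_value=np.nan)

# Option 2: keep the integer values and carry the missingness mask
# separately, so NA could be restored on the R side.
int_values = int_arr.to_numpy(dtype="int64", na_value=0)
na_mask = int_arr.isna()

print(int_type, bool_type)
print(int_as_float, na_mask)
```

With the second option, the mask could in principle be used on the R side to set the corresponding entries to NA.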

dummy_category is a bit different from dummy_bool2 and dummy_int2 -- the reader simply skips over it with this warning:

Warning messages:
1: Conversion failed for the item dummy_category in uns with the following error and has been skipped
Conversion error message: "AttributeError: 'Categorical' object has no attribute 'get_values' "

and metadata(sce)$dummy_category is NULL.
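
The AttributeError suggests that something in the conversion path calls `get_values()` on a pandas `Categorical`. As far as I know, that method was removed in pandas 1.0. Below is a small sketch of the pieces that are still available and would be enough to rebuild a factor with missing values; the example data are made up.

```python
import numpy as np
import pandas as pd

# A categorical with a missing entry (illustrative data).
cat = pd.Categorical(["a", "b", None])

# On recent pandas versions this attribute is absent, which matches
# the AttributeError in the warning above.
has_get_values = hasattr(cat, "get_values")

# Integer codes into the categories; -1 marks a missing value.
codes = cat.codes
# The category labels, i.e. what would become the factor levels.
levels = cat.categories.to_numpy()
# A plain object array of the values, with NaN for the missing entry.
values = np.asarray(cat)

print(has_get_values, codes.tolist(), levels.tolist())
```

The codes and levels together carry the same information as an R factor, with the -1 codes corresponding to NA.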

Note, however, that colData(sce)$dummy_num2 is handled correctly. Numeric vectors with missing values do not seem to be a problem, only factors, ints, and logicals.

jackkamm changed the title from "Problem reading nullable factor/bool/int in AnnData 0.8.0" to "Problem with missing values in AnnData 0.8.0" on Mar 5, 2023
lazappi added the bug label on Mar 6, 2023
Member

lazappi commented Mar 6, 2023

Thanks! This is probably because {reticulate} doesn't do any special conversion of those types and we may need to handle them specially (although I'm not entirely sure why...).

lazappi closed this as completed in 928f952 on Apr 6, 2023