fix: load SparseDatasets into memory #898

Merged
merged 1 commit into from
May 14, 2024

@Bento007 (Contributor) commented May 10, 2024

Reason for Change

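AnnData uses a special class, SparseDataset, to represent a sparse matrix that is read from a file in backed mode instead of being loaded fully into memory, which reduces memory consumption. Unfortunately, SparseDataset does not provide the scipy.sparse methods that spatial validation relies on (such as nonzero()), so the matrix must be loaded into memory before spatial validation is performed. Otherwise, the following error occurs: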

Traceback (most recent call last):
  File "/Users/ngloria/code/single-cell-curation/venv/bin/cellxgene-schema", line 8, in <module>
    sys.exit(schema_cli())
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/cellxgene_schema/cli.py", line 47, in schema_validate
    is_valid, _, _ = validate(h5ad_file, add_labels_file, ignore_labels=ignore_labels, verbose=verbose)
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/cellxgene_schema/validate.py", line 2003, in validate
    validator.validate_adata(h5ad_path, to_memory=to_memory)
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/cellxgene_schema/validate.py", line 1954, in validate_adata
    self._deep_check()
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/cellxgene_schema/validate.py", line 1921, in _deep_check
    self._validate_raw()
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/cellxgene_schema/validate.py", line 1331, in _validate_raw
    if not self._has_valid_raw() and self._get_raw_x_loc() == "raw.X":
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/cellxgene_schema/validate.py", line 1186, in _has_valid_raw
    self._validate_raw_data_with_in_tissue_0(x, is_sparse_matrix)
  File "/Users/ngloria/code/single-cell-curation/venv/lib/python3.10/site-packages/cellxgene_schema/validate.py", line 1236, in _validate_raw_data_with_in_tissue_0
    nonzero_row_indices, _ = x.nonzero()
AttributeError: 'SparseDataset' object has no attribute 'nonzero'
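The failure can be reproduced without an h5ad file. The sketch below uses a hypothetical stand-in class, BackedSparseStandIn, that mimics the relevant property of a backed SparseDataset: it can materialize the matrix through to_memory() but does not implement scipy.sparse methods such as nonzero(). It assumes numpy and scipy are installed and is an illustration only, not code from cellxgene_schema or anndata.

```python
import numpy as np
import scipy.sparse as sp


class BackedSparseStandIn:
    """Hypothetical stand-in for a backed anndata SparseDataset.

    It keeps the matrix behind a lazy handle and exposes to_memory(),
    but it does not implement scipy.sparse methods such as nonzero().
    """

    def __init__(self, matrix: sp.csr_matrix):
        # In real anndata the data would stay on disk (HDF5) until requested.
        self._matrix = matrix

    def to_memory(self) -> sp.csr_matrix:
        # Materialize the whole matrix as an in-memory scipy.sparse matrix.
        return self._matrix.copy()


x = BackedSparseStandIn(sp.csr_matrix(np.array([[0, 3], [0, 0], [1, 0]])))

# Calling nonzero() on the backed object fails, as in the traceback above.
error_message = None
try:
    x.nonzero()
except AttributeError as err:
    error_message = str(err)
print(error_message)  # 'BackedSparseStandIn' object has no attribute 'nonzero'

# Loading into memory first gives an object that supports nonzero().
nonzero_row_indices, _ = x.to_memory().nonzero()
print(sorted(set(nonzero_row_indices.tolist())))  # [0, 2]
```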

Changes

  • Load the SparseDataset into memory before performing spatial validation of the X matrix. _to_backed is not used because it causes a different error further down in the code and eventually leads to loading the sparse matrix into memory anyway.

Testing

  • Added a test case that validates a spatial dataset loaded from a file.

Notes for Reviewer

After this is merged, we will need to apply the changes made in 5.0.3 and release 5.0.4. The changes in 5.0.3 must be reverted before releasing 5.1.0.

@Bento007 added the "5.1: Next minor CELLxGENE schema version after 5.0" label May 13, 2024
@Bento007 merged commit 717c9a9 into main May 14, 2024
6 checks passed
@Bento007 deleted the tsmith/sparsedataset branch May 14, 2024 16:35
Labels
5.1 Next minor CELLxGENE schema version after 5.0
3 participants