import scanpy as sc
import anndata
import time
import os, wget
import cudf
import cupy as cp
from cuml.decomposition import PCA
from cuml.manifold import TSNE
from cuml.cluster import KMeans
from cuml.preprocessing import StandardScaler
from matplotlib import pyplot as plt
import warnings
warnings.filterwarnings('ignore', 'Expected ')
warnings.simplefilter('ignore')
import rmm
rmm.reinitialize(
    managed_memory=True,   # allows oversubscription beyond GPU memory capacity
    pool_allocator=False,  # default is False
    devices=0,             # GPU device IDs to register; registers only GPU 0 by default
)
cp.cuda.set_allocator(rmm.rmm_cupy_allocator)
MT_GENE_PREFIX = "MT-" # Prefix for mitochondrial genes to regress out
markers = ["ACE2", "TMPRSS2", "EPCAM"] # Marker genes for visualization
# filtering cells
min_genes_per_cell = 1 # Filter out cells with fewer genes than this expressed
max_genes_per_cell = 100000 # Filter out cells with more genes than this expressed
pt_max = 1
# filtering genes
min_cells_per_gene = 1 # Filter out genes expressed in fewer cells than this
n_top_genes = 2000 # Number of highly variable genes to retain
# PCA
n_components = 50 # Number of principal components to compute
# t-SNE
tsne_n_pcs = 20 # Number of principal components to use for t-SNE
# KNN
n_neighbors = 15 # Number of nearest neighbors for KNN graph
knn_n_pcs = 50 # Number of principal components to use for finding nearest neighbors
# UMAP
umap_min_dist = 0.3
umap_spread = 1.0
# Gene ranking
ranking_n_top_genes = 50
adata = sc.read('/rapids_clara/c952.diff_PRO.h5ad')
genes = cudf.Series(adata.var_names)
sparse_gpu_array = cp.sparse.csr_matrix(adata.raw.X)
sparse_gpu_array[1350000:1360000].get()
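The validation that fails here can be reproduced on the CPU: cupy's .get() hands the three CSR arrays (data, indices, indptr) to scipy, whose constructor runs check_format and rejects any indptr whose last value exceeds the length of the data/indices arrays. A minimal sketch with small made-up arrays (not the actual dataset):

```python
import numpy as np
import scipy.sparse as sp

data = np.array([1.0, 2.0, 3.0])
indices = np.array([0, 1, 2])

# Consistent CSR triplet: indptr[-1] equals len(data), so this succeeds.
ok = sp.csr_matrix((data, indices, np.array([0, 2, 3])), shape=(2, 3))

# Inconsistent triplet: indptr[-1] == 5 > len(data) == 3, so scipy's
# check_format raises the same ValueError seen in the traceback below.
error_msg = None
try:
    sp.csr_matrix((data, indices, np.array([0, 2, 5])), shape=(2, 3))
except ValueError as e:
    error_msg = str(e)
print(error_msg)
```

This suggests the GPU-side slice itself produced an indptr that is inconsistent with its data/indices arrays; the error only surfaces when the result is copied to the host and validated.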
When I call .get() on a slice of the csr_matrix (sparse_gpu_array), the following error is shown:
ValueError Traceback (most recent call last)
Input In [94], in <cell line: 1>()
----> 1 sparse_gpu_array[1350000:1360000].get()
File /opt/conda/envs/rapids/lib/python3.9/site-packages/cupyx/scipy/sparse/csr.py:73, in csr_matrix.get(self, stream)
71 indices = self.indices.get(stream)
72 indptr = self.indptr.get(stream)
---> 73 return scipy.sparse.csr_matrix(
74 (data, indices, indptr), shape=self._shape)
File /opt/conda/envs/rapids/lib/python3.9/site-packages/scipy/sparse/_compressed.py:106, in _cs_matrix.__init__(self, arg1, shape, dtype, copy)
103 if dtype is not None:
104 self.data = self.data.astype(dtype, copy=False)
--> 106 self.check_format(full_check=False)
File /opt/conda/envs/rapids/lib/python3.9/site-packages/scipy/sparse/_compressed.py:178, in _cs_matrix.check_format(self, full_check)
176 raise ValueError("indices and data should have the same size")
177 if (self.indptr[-1] > len(self.indices)):
--> 178 raise ValueError("Last value of index pointer should be less than "
179 "the size of index and data arrays")
181 self.prune()
183 if full_check:
184 # check format validity (more expensive)
ValueError: Last value of index pointer should be less than the size of index and data arrays
However, if I set the interval to [1000:1360000], it does not raise any error. The original h5ad file is about 20 GiB, and the shape of sparse_gpu_array is (1462703, 27610). Why does the error appear only for this particular interval?
@linmuchuiyang thanks for opening an issue on this. Is there any chance that dataset is publicly available? If not, have you been able to reproduce this behavior on a dataset that is publicly available, or with generated data?
This behavior does seem pretty weird, but it's not the first time I've seen strange behavior like this. Does it also fail if you select a few entries closer to the beginning of the matrix? Something like sparse_gpu_array[:10]?
However, if I set the interval to [1000:1360000], it does not raise any error.
When you say it does not raise any error, do you mean the call succeeds, or that it seems to fail silently?
In the meantime, I’ll try loading up a couple of the datasets we use in the examples and see if I can reproduce the behavior.
I encounter this error too, when the matrix has more than one million cells and ~20k genes. I think the error is produced by vstack when chunks are merged, which yields negative indptr values in the CSR/CSC matrix; perhaps the matrix exceeds a limit of the CSR/CSC format (should it be < 2^32 elements per matrix?). But I don't have any idea how to fix it.
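The overflow hypothesis above can be illustrated on the CPU with plain NumPy. This is a hedged sketch: the per-row nonzero counts are hypothetical, chosen only so that the cumulative count exceeds what a signed 32-bit integer can represent, and `upcast_indices` is an illustrative helper, not part of any library:

```python
import numpy as np

# Simulate a CSR indptr for a matrix whose total nonzero count exceeds
# 2^31 - 1. With int64 the pointers are correct; downcast to int32 they
# wrap around to negative values, which is exactly the kind of indptr
# that scipy's check_format rejects.
row_nnz = np.array([2**30, 2**30, 2**30], dtype=np.int64)  # hypothetical rows
indptr64 = np.concatenate(([0], np.cumsum(row_nnz)))       # correct 64-bit pointers
indptr32 = indptr64.astype(np.int32)                       # simulated overflow

print(indptr64)  # last entry is 3 * 2**30 = 3221225472
print(indptr32)  # last two entries wrap around to negative values

# One possible mitigation before stacking chunks: force 64-bit index arrays
# on each chunk so the merged indptr cannot overflow (illustrative helper).
def upcast_indices(m):
    """Return a copy of a CSR matrix with int64 indptr/indices arrays."""
    m = m.copy()
    m.indptr = m.indptr.astype(np.int64)
    m.indices = m.indices.astype(np.int64)
    return m
```

If this is the cause, it would also explain why slices near the start of the matrix succeed: their indptr values stay below the 32-bit limit, while slices reaching past roughly the 2^31-th nonzero hit the wrapped, negative pointers.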