[QST] rsc.tl.umap returns RAFT failure #298

Open
johnsCheng opened this issue Nov 21, 2024 · 6 comments
Labels
question Further information is requested

Comments

@johnsCheng

What is your question?
I cannot run rsc.tl.umap, although every step before the UMAP dimensionality reduction runs fine.
I installed RSC by creating a conda environment (rapids-24.10) and then running pip install rapids-singlecell (version 0.10.10).
I'm not sure whether the error comes from RSC or RAPIDS.
Here is the error report:
`---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[104], line 1
----> 1 rsc.tl.umap(fibro, min_dist=0.3) #min_dist float (default: 0.5)

File ~/.conda/envs/rapids-24.10/lib/python3.10/site-packages/rapids_singlecell/tools/_umap.py:182, in umap(adata, min_dist, spread, n_components, maxiter, alpha, negative_sample_rate, init_pos, random_state, a, b, key_added, neighbors_key, copy)
163 umap = UMAP(
164 n_neighbors=n_neighbors,
165 n_components=n_components,
(...)
178 precomputed_knn=pre_knn,
179 )
181 key_obsm, key_uns = ("X_umap", "umap") if key_added is None else [key_added] * 2
--> 182 adata.obsm[key_obsm] = umap.fit_transform(X)
184 adata.uns[key_uns] = {"params": stored_params}
185 return adata if copy else None

File ~/.conda/envs/rapids-24.10/lib/python3.10/site-packages/cuml/internals/api_decorators.py:188, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
185 set_api_output_dtype(output_dtype)
187 if process_return:
--> 188 ret = func(*args, **kwargs)
189 else:
190 return func(*args, **kwargs)

File ~/.conda/envs/rapids-24.10/lib/python3.10/site-packages/cuml/internals/api_decorators.py:393, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
391 if hasattr(self, "dispatch_func"):
392 func_name = gpu_func.__name__
--> 393 return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
394 else:
395 return gpu_func(self, *args, **kwargs)

File ~/.conda/envs/rapids-24.10/lib/python3.10/site-packages/cuml/internals/api_decorators.py:190, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
188 ret = func(*args, **kwargs)
189 else:
--> 190 return func(*args, **kwargs)
192 return cm.process_return(ret)

File base.pyx:687, in cuml.internals.base.UniversalBase.dispatch_func()

File umap.pyx:741, in cuml.manifold.umap.UMAP.fit_transform()

File ~/.conda/envs/rapids-24.10/lib/python3.10/site-packages/cuml/internals/api_decorators.py:188, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
185 set_api_output_dtype(output_dtype)
187 if process_return:
--> 188 ret = func(*args, **kwargs)
189 else:
190 return func(*args, **kwargs)

File ~/.conda/envs/rapids-24.10/lib/python3.10/site-packages/cuml/internals/api_decorators.py:393, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
391 if hasattr(self, "dispatch_func"):
392 func_name = gpu_func.__name__
--> 393 return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
394 else:
395 return gpu_func(self, *args, **kwargs)

File ~/.conda/envs/rapids-24.10/lib/python3.10/site-packages/cuml/internals/api_decorators.py:190, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
188 ret = func(*args, **kwargs)
189 else:
--> 190 return func(*args, **kwargs)
192 return cm.process_return(ret)

File base.pyx:687, in cuml.internals.base.UniversalBase.dispatch_func()

File umap.pyx:678, in cuml.manifold.umap.UMAP.fit()

RuntimeError: RAFT failure at file=~/.conda/envs/rapids-24.10/include/raft/spectral/detail/lapack.hpp line=490: `

@johnsCheng johnsCheng added the question Further information is requested label Nov 21, 2024
@johnsCheng johnsCheng changed the title [QST] [QST] rsc.tl.umap returns RAFT failure Nov 21, 2024
@Intron7
Member

Intron7 commented Nov 21, 2024

@johnsCheng

I tried to reproduce your issue but couldn't. Do you have a minimal reproducer? I also think there might be an issue with your installation. Can you please confirm the versions of RAPIDS and rapids-singlecell? Also, please try rapids-singlecell==0.10.11.
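
One way to gather the versions requested above is sketched below. It only uses the standard library's `importlib.metadata`; the helper name `collect_versions` and the list of distribution names are assumptions for illustration (for example, a pip-installed CuPy is usually published under a CUDA-specific name rather than plain `cupy`), so adjust them to the environment at hand.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return {distribution name: version string, or 'not installed'}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # No installed distribution is registered under this exact name.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Assumed distribution names; edit the list to match your environment.
    wanted = ["rapids-singlecell", "cuml", "cupy", "anndata", "scanpy", "scipy"]
    for name, ver in collect_versions(wanted).items():
        print(f"{name}: {ver}")
```

Pasting this output together with `conda list rapids` gives maintainers the Python-side versions as well as the conda-side ones.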

@johnsCheng
Author

Thanks for your advice! I have now installed the new version of rapids-singlecell (0.10.11).

$conda list rapids
# packages in environment at /share/home/jinghuic/software/miniconda3/envs/rapids_singlecell:
#
# Name                    Version                   Build  Channel
rapids                    24.10.00        cuda12_py311_241009_g19a0c5a_0    rapidsai
rapids-dask-dependency    24.10.00                   py_0    rapidsai
rapids-singlecell         0.10.11                  pypi_0    pypi
rapids-xgboost            24.10.00        cuda12_py311_241009_g19a0c5a_0    rapidsai

But now I'm stuck at the first step, rsc.get.anndata_to_GPU, which returns this error:

rapids_singlecell version is 0.10.11
(159682, 27157)
0.0 8.974227131854372
(159682, 27157)
0.0 8.974227131854372
0
<class 'anndata._core.views.SparseCSRMatrixView'>
TypeError: float() argument must be a string or a real number, not 'csr_matrix'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/share/home/jinghuic/script/read.py", line 38, in <module>
    rsc.get.anndata_to_GPU(batch)
  File "/share/home/jinghuic/software/miniconda3/envs/rapids_singlecell/lib/python3.11/site-packages/rapids_singlecell/get/_anndata.py", line 63, in anndata_to_GPU
    _set_obs_rep(adata, X, layer=layer)
  File "/share/home/jinghuic/software/miniconda3/envs/rapids_singlecell/lib/python3.11/site-packages/scanpy/get/get.py", line 471, in _set_obs_rep
    adata.X = val
    ^^^^^^^
  File "/share/home/jinghuic/software/miniconda3/envs/rapids_singlecell/lib/python3.11/site-packages/anndata/_core/anndata.py", line 650, in X
    self._adata_ref._X[oidx, vidx] = value
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/share/home/jinghuic/software/miniconda3/envs/rapids_singlecell/lib/python3.11/site-packages/scipy/sparse/_csr.py", line 41, in __setitem__
    return super().__setitem__(key, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/share/home/jinghuic/software/miniconda3/envs/rapids_singlecell/lib/python3.11/site-packages/scipy/sparse/_index.py", line 145, in __setitem__
    x = np.asarray(x, dtype=self.dtype)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**ValueError: setting an array element with a sequence.**

I have no idea what is going wrong. Here is my script:

  1 import scanpy as sc
  2 import rapids_singlecell as rsc
  3 print('rapids_singlecell version is',rsc.__version__)
  4 import numpy as np
  5 read_file='python/Wang_NatCancer.h5ad'
  6 mydata=sc.read(read_file)
  7 print(mydata.X.shape)
  8 print(np.min(mydata.X),np.max(mydata.X))
  9 import cupy as cp
 10 
 11 import time
 12 
 13 import warnings
 14 
 15 warnings.filterwarnings("ignore")
 16 import rmm
 17 from rmm.allocators.cupy import rmm_cupy_allocator
 18 
 19 rmm.reinitialize(
 20     managed_memory=True,  # Allows oversubscription
 21     pool_allocator=False,  # default is False
 22     devices=3,  # GPU device IDs to register. By default registers only GPU 0.
 23 )
 24 cp.cuda.set_allocator(rmm_cupy_allocator)
 25 import anndata as ad
 26 print(mydata.X.shape)
 27 print(np.min(mydata.X),np.max(mydata.X))
 28 import gc
 29 import scipy.sparse as sp
 30 # Handle memory error, e.g., by reducing data size or using CPU as fallback
 31 # Example: Process data in smaller batches
 32 batch_size = 1000  # Adjust batch size as needed
 33 for i in range(0, len(mydata), batch_size):
 34     batch = mydata[i:i+batch_size]
 35     print(i)
 36     try:
 37         print(type(batch.X))
 38         rsc.get.anndata_to_GPU(batch)
 39     except MemoryError as e:
 40         print("MemoryError SMALL in batch "+str(i) , e)
 41         # Handle batch-specific memory error
 42     finally:
 43         del batch
 44         gc.collect()
 45 
 46 # Ensure GPU memory is freed up
 47 cp.get_default_memory_pool().free_all_blocks()
 48 
 49 
 50 #rsc.get.anndata_to_GPU(mydata)

@johnsCheng
Author

@Intron7
PS: I also checked the related issue #261, and I get no errors from my installation.

$nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

$nvidia-smi
Wed Nov 27 16:08:54 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L40                     On  | 00000000:17:00.0 Off |                    0 |
| N/A   31C    P8              34W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L40                     On  | 00000000:65:00.0 Off |                    0 |
| N/A   30C    P8              37W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA L40                     On  | 00000000:CA:00.0 Off |                    0 |
| N/A   31C    P8              34W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA L40                     On  | 00000000:E3:00.0 Off |                    0 |
| N/A   32C    P8              33W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Also, the error in that issue is not the same as mine. In my case the TypeError ends in what looks like a mathematical function error: **ValueError: setting an array element with a sequence.**

Any advice would be helpful, please.

@Intron7
Member

Intron7 commented Nov 27, 2024

Can you check the dtype of X.data?
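
A minimal sketch of such a check follows. The helper `describe_matrix` is hypothetical (it is not part of rapids-singlecell or AnnData); it relies only on the convention that sparse matrices keep their stored values in a `.data` array, and falls back to the object's own `.dtype` for dense arrays.

```python
def describe_matrix(X):
    """Summarise the container type, element dtype, and shape of a matrix."""
    # Sparse matrices expose their stored values as an array in `.data`.
    dtype = getattr(getattr(X, "data", None), "dtype", None)
    if dtype is None:
        # Dense arrays: `.data` is a raw buffer without a dtype, so use X.dtype.
        dtype = getattr(X, "dtype", None)
    return {
        "container": f"{type(X).__module__}.{type(X).__qualname__}",
        "dtype": str(dtype),
        "shape": getattr(X, "shape", None),
    }


# Hypothetical usage with the objects from the script in this thread:
# print(describe_matrix(mydata.X))
# print(describe_matrix(batch.X))
```

Reporting the container type alongside the dtype also shows whether the matrix is a plain sparse matrix or a view class.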

@johnsCheng
Author

Yes, I checked it at line 37:

 20 print(mydata.X.shape)
 21 print(np.min(mydata.X),np.max(mydata.X))
 22 print(mydata.X.dtype)
 23 import gc
 24 import scipy.sparse as sp
 25 # Handle memory error, e.g., by reducing data size or using CPU as fallback
 26 # Example: Process data in smaller batches
 27 batch_size = 1000  # Adjust batch size as needed
 28 for i in range(0, len(mydata), batch_size):
 29     batch = mydata[i:i+batch_size]
 30     print(i)
 31     try:
 32         print(type(batch.X))
 33         print(batch.X.dtype)
 34         rsc.get.anndata_to_GPU(batch)

The type is <class 'anndata._core.views.SparseCSRMatrixView'>, the same as for a normal AnnData.

rapids_singlecell version is 0.10.11
(159682, 27157)
0.0 8.974227131854372
float64
0
<class 'anndata._core.views.SparseCSRMatrixView'>
float64

TypeError: float() argument must be a string or a real number, not 'csr_matrix'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/share/home/jinghuic/script/read.py", line 38, in <module>
    rsc.get.anndata_to_GPU(batch)
  File "/share/home/jinghuic/software/miniconda3/envs/rapids_singlecell/lib/python3.11/site-packages/rapids_singlecell/get/_anndata.py", line 63, in anndata_to_GPU
    _set_obs_rep(adata, X, layer=layer)
  File "/share/home/jinghuic/software/miniconda3/envs/rapids_singlecell/lib/python3.11/site-packages/scanpy/get/get.py", line 471, in _set_obs_rep
    adata.X = val
    ^^^^^^^
  File "/share/home/jinghuic/software/miniconda3/envs/rapids_singlecell/lib/python3.11/site-packages/anndata/_core/anndata.py", line 650, in X
    self._adata_ref._X[oidx, vidx] = value
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/share/home/jinghuic/software/miniconda3/envs/rapids_singlecell/lib/python3.11/site-packages/scipy/sparse/_csr.py", line 41, in __setitem__
    return super().__setitem__(key, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/share/home/jinghuic/software/miniconda3/envs/rapids_singlecell/lib/python3.11/site-packages/scipy/sparse/_index.py", line 145, in __setitem__
    x = np.asarray(x, dtype=self.dtype)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence.

@Donjae-Wang

Donjae-Wang commented Dec 15, 2024

I get the same error when using rsc.get.anndata_to_GPU:

TypeError: float() argument must be a string or a real number, not 'csr_matrix'

However, when I save the adata to an .h5ad file and then reload it, everything works fine. I’m not sure what happened.
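
A possible explanation, offered as an assumption and not confirmed by the maintainers in this thread: the tracebacks above pass through `self._adata_ref._X[oidx, vidx] = value`, which suggests the object handed to `anndata_to_GPU` is a view (a slice such as `mydata[i:i+batch_size]`), so the GPU matrix gets written back into the parent's SciPy matrix. An object reloaded from disk is no longer a view, which would be consistent with the workaround above. The sketch below uses `AnnData.is_view` and `AnnData.copy()`; the helper name `materialize` is hypothetical.

```python
def materialize(adata):
    """Return an independent copy if `adata` is a view, else `adata` itself."""
    # AnnData slices are lazy views onto their parent; `.copy()` detaches them.
    if getattr(adata, "is_view", False):
        return adata.copy()
    return adata


# Hypothetical use inside the batching loop from the script above:
# batch = materialize(mydata[i:i + batch_size])
# rsc.get.anndata_to_GPU(batch)
```

Copying a batch costs extra host memory for that slice, but it avoids the save-and-reload round trip if the view is indeed the cause.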


3 participants