Outputting the OT matrix failed #4

Open
cindyway opened this issue Apr 2, 2023 · 2 comments

cindyway commented Apr 2, 2023

adata, OT = up.Run(adatas=[spot,rna], adata_cm=adata_cm, save_OT=True)

(1) When using the CPU, the code produced the following error:
IndexError: index 37385 is out of bounds for dimension 0 with size 1

This indicates that index 37385 is invalid for dimension 0 of the array being accessed (a minimal standalone reproduction of this indexing pattern is shown after this list). Further investigation is needed to determine the cause and fix it.

(2) When using a GPU (Colab, Colab Pro, a Linux server, and an RTX 3090 on Windows), the following error was reported in all cases:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())

This error suggests a problem with the CUDA implementation, possibly due to a mismatch between the version of CUDA being used and the hardware or software environment. It may be necessary to consult the author of the code for assistance in resolving this issue.
(3) The error was initially thought to be related to memory usage, but changing the batch size to 100 or even 10 did not solve the issue. Therefore, the problem may not be related to memory limitations.
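For reference, the CPU error in point (1) is the generic PyTorch out-of-bounds pattern. The snippet below is a standalone illustration only (toy shapes, not uniPort's data) and simply reproduces the same message:

```python
# Standalone illustration (not uniPort's code): the same IndexError appears
# whenever dimension 0 of the tensor being indexed is smaller than the index used.
import torch

t = torch.zeros(1, 2000)   # dimension 0 has size 1
t[37385]                   # IndexError: index 37385 is out of bounds for dimension 0 with size 1
```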

Is a specific CUDA version needed?

caokai1073 (Owner) commented

Hi, thanks for pointing out this problem. I've looked into it, and it seems that the issue occurs when there is a significant difference in size between the two modalities. To address this, I recommend trying a larger batch size, such as 512 or 1024. I am working on a fix for this bug and plan to release an updated version in the near future. Thank you again for your feedback, and please let me know if you have any further questions or concerns.
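A minimal sketch of the suggested workaround follows. The `spot`, `rna`, and `adata_cm` objects are the AnnData inputs from the original call above, and `batch_size` is assumed to be the relevant keyword (the report mentions changing the batch size); check the uniPort documentation for the exact argument name in your version.

```python
# Sketch of the suggested workaround, assuming up.Run accepts a batch_size keyword.
# `spot`, `rna`, and `adata_cm` must already be prepared as in the original report.
import uniport as up

adata, OT = up.Run(
    adatas=[spot, rna],
    adata_cm=adata_cm,
    save_OT=True,
    batch_size=1024,   # try 512 or 1024 rather than small values like 10 or 100
)
```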

zk-P commented May 26, 2023

I've modified the code in vae.py at line 435 from
tran_batch[j] = torch.from_numpy(tran[j]).to(device)[idx_query[j]][idx_ref]
to
tran_batch[j] = torch.from_numpy(tran[j]).to(device)[idx_query[j]][:,idx_ref]
and it works.
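To see why the added `[:, ...]` matters, here is a standalone illustration (shapes and index values are made up, not taken from vae.py): chaining two plain index operations selects rows twice, so the second index must stay below the number of rows already selected, whereas `[:, idx_ref]` selects columns of the row subset as intended.

```python
# Standalone illustration of the indexing fix (toy shapes, not vae.py's data).
import torch

tran = torch.rand(1000, 800)             # OT plan: rows = query cells, cols = reference cells
idx_query = torch.tensor([10, 20, 30])   # rows selected for the current batch
idx_ref = torch.tensor([5, 700])         # columns selected for the current batch

# Broken pattern: the second [...] indexes dimension 0 of a 3-row tensor,
# so any reference index >= 3 (here both 5 and 700) raises an IndexError.
# broken = tran[idx_query][idx_ref]

# Fixed pattern: select rows first, then columns of the resulting sub-matrix.
fixed = tran[idx_query][:, idx_ref]
print(fixed.shape)                       # torch.Size([3, 2])
```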
