
Issue on using IDSL_MINT with cuda device #1

Closed
sara-hashemi opened this issue Jun 24, 2024 · 15 comments

@sara-hashemi

I have downloaded and run your code from GitHub in various environments (Colab, GC, AWS); however, when I change the YAML file to use the GPU server, it throws an error from the Triton library. This is the error:

ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?)

Please note that I have made sure we are using a GPU server (already checked and confirmed with the nvidia-smi command) and have modified the related YAML file so that the device is "cuda". I have also checked the training file to ensure everything is passed on to the specified device.

Do you have any insights on why this issue comes up? I have also tested the code on the CPU, with the YAML file modified accordingly, and that works fine. The issue only arises when using a GPU server and specifying the device as cuda.

@barupal barupal transferred this issue from idslme/IDSL.IPA Jun 24, 2024
@barupal
Member

barupal commented Jun 24, 2024

Hi @NTuan-Nguyen, can you please help Sara fix this issue? Thanks! Dinesh

@NTuan-Nguyen
Member

Hello Sara,

I think this issue can sometimes occur with the PyTorch Triton backend on multi-GPU systems. I was able to replicate the issue on Colab using the default Colab PyTorch 2.3 with CUDA 12. A workaround is to revert to an earlier PyTorch build using CUDA 11.8. This can be done by adding the following command to the notebook during the installation step:

!pip install torch==2.0.1+cu118 torchvision torchaudio torchinfo --extra-index-url https://download.pytorch.org/whl/cu118

Example: (screenshot of the install command run in a Colab notebook cell)

I have tested this in the Colab environment; please let me know if the issue persists or if you're unable to apply the fix in your workspace.

@NTuan-Nguyen NTuan-Nguyen self-assigned this Jun 24, 2024
@sara-hashemi
Author

Thanks, this worked in resolving the issue with the Triton library.

@sara-hashemi
Author

sara-hashemi commented Jul 11, 2024 via email

@sajfb
Collaborator

sajfb commented Jul 11, 2024

The YAML file has a section for MSP processing criteria used to filter out MSP blocks that fall outside the model's training space. If an MSP block does not meet these criteria, it will not be carried through to the prediction step. You can find a log file in the output folder, which records any issues with MSP block processing. An example of an MSP block for Aspirin is provided on the main GitHub page. The required row entries for an MSP block are Name, PrecursorMZ, and Num Peaks.
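The required-rows rule above can be sketched as a small validator. Note this is an illustrative assumption, not IDSL_MINT's actual parser: only the required fields Name, PrecursorMZ, and Num Peaks come from the comment above, and the example values are made up.

```python
# Illustrative check for the required MSP header rows (Name, PrecursorMZ,
# Num Peaks). This is NOT IDSL_MINT's parser, just a sketch of the kind
# of filtering its log file reports.

REQUIRED_FIELDS = ("Name", "PrecursorMZ", "Num Peaks")

def msp_block_fields(block: str) -> dict:
    """Parse the 'Key: value' header rows of a single MSP block."""
    fields = {}
    for line in block.strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def has_required_fields(block: str) -> bool:
    """True if the block contains every required header row."""
    fields = msp_block_fields(block)
    return all(f in fields for f in REQUIRED_FIELDS)

# Illustrative block (peak m/z and intensity values are made up):
example = """Name: Aspirin
PrecursorMZ: 179.0350
Num Peaks: 2
121.0283 100
137.0233 45
"""

print(has_required_fields(example))        # True
print(has_required_fields("Name: X\n"))    # False: missing required rows
```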

@sajfb sajfb reopened this Jul 11, 2024
@sara-hashemi
Author

sara-hashemi commented Jul 11, 2024 via email

@sajfb
Collaborator

sajfb commented Jul 11, 2024

Names do not have to be unique, but each block must contain a Name row entry. You should standardize your MSP blocks before feeding them into MINT.

@sara-hashemi
Author

sara-hashemi commented Jul 11, 2024 via email

@sajfb
Collaborator

sajfb commented Jul 11, 2024

WARNING!!! Removed MSP block ID 1 related to A_M8_negPFP_03! Names don't need to be unique, but in this case the MSP block ID 1 should be specifically investigated.

The m/z thresholds refer to the mass values, not their intensities. Could you please also share your YAML file?

@sara-hashemi
Author

sara-hashemi commented Jul 12, 2024 via email

@sara-hashemi
Author

sara-hashemi commented Jul 12, 2024 via email

@sajfb
Collaborator

sajfb commented Jul 12, 2024

Your precursor mass is out of the mass range specified in the YAML file.

Minimum m/z: 100
Maximum m/z: 900

There is a 10% tolerance allowing some fragment peaks to fall outside the training space, but the precursor mass must be within this range.

Additionally, keep in mind:

Minimum number of peaks: 5, counted after applying the noise removal threshold of 0.01
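For illustration, the criteria above can be combined into a single check. This is a sketch under assumptions: the function name and the peak representation are mine; only the numbers 100, 900, 0.01, and 5 come from this thread.

```python
# Sketch of the filtering criteria described above (precursor m/z range,
# noise threshold, minimum peak count). Function and parameter names are
# illustrative; only the numeric criteria come from the comments.

def passes_criteria(precursor_mz, peaks,
                    min_mz=100.0, max_mz=900.0,
                    noise_threshold=0.01, min_peaks=5):
    """peaks: list of (mz, relative_intensity) pairs, intensities in [0, 1]."""
    # The precursor mass must lie inside the training range.
    if not (min_mz <= precursor_mz <= max_mz):
        return False
    # Peaks are counted only after removing those below the noise threshold.
    kept = [(mz, i) for mz, i in peaks if i >= noise_threshold]
    return len(kept) >= min_peaks

# A precursor outside [100, 900] is rejected regardless of its peaks:
print(passes_criteria(950.0, [(150.0, 1.0)] * 6))  # False
print(passes_criteria(500.0, [(150.0, 1.0)] * 6))  # True
```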

@sajfb
Collaborator

sajfb commented Jul 12, 2024

By this, are you referring to the fact that the minimum and maximum m/z should be determined over the masses of all samples (train/validation and test), and then the YAML file can include those numbers? Could this be why the algorithm bypasses some samples? I will also look into the names for a more accurate representation.


Yes, each m/z value is represented by a specific embedded token. If that token is not in the training space, the model cannot represent your chemical space.
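A minimal sketch of what "each m/z value is represented by a specific embedded token" can look like, assuming simple fixed-width binning. The bin width and the binning scheme itself are assumptions for illustration, not IDSL_MINT's actual tokenizer; only the 100-900 range comes from this thread.

```python
# Illustrative m/z tokenization: each m/z value maps to a discrete token
# by binning over the training range. Out-of-range values have no token,
# which is why the model cannot represent them.

def mz_to_token(mz, min_mz=100.0, max_mz=900.0, bin_width=0.01):
    """Map an m/z value to an integer token index, or None if out of range."""
    if not (min_mz <= mz <= max_mz):
        return None  # no embedded token exists outside the training space
    return round((mz - min_mz) / bin_width)

print(mz_to_token(200.0))  # an in-range m/z gets a token index
print(mz_to_token(950.0))  # None: outside the training space
```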

@sara-hashemi
Author

sara-hashemi commented Jul 12, 2024 via email

@sajfb
Collaborator

sajfb commented Jul 12, 2024

You're welcome!
