
Issue on using IDSL_MINT with cuda device #1

Closed
sara-hashemi opened this issue Jun 24, 2024 · 15 comments

@sara-hashemi

I have downloaded and run your code from GitHub in various environments (Colab, GC, AWS); however, when I change the YAML file to use the GPU server, it throws an error from the Triton library. This is the error:

ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?)

Please note that I have made sure we are using a GPU server (already checked and confirmed with the nvidia-smi command) and have modified the related YAML file so that the device is "cuda". I have also checked the training file to ensure everything is passed on to the specified device.

Do you have any insights on why this issue comes up? I have also tested the code on the CPU, with the YAML file modified accordingly, and that works fine. The issue only arises when using a GPU server and specifying the device as cuda.

@barupal barupal transferred this issue from idslme/IDSL.IPA Jun 24, 2024
@barupal
Member

barupal commented Jun 24, 2024

Hi @NTuan-Nguyen, can you please help Sara fix this issue? Thanks! Dinesh

@NTuan-Nguyen
Member

Hello Sara,

I think this issue can sometimes occur with the PyTorch Triton backend on multi-GPU systems. I was able to replicate the issue on Colab using the default Colab PyTorch 2.3 with CUDA 12. A workaround is to revert to an earlier PyTorch build using CUDA 11.8. This can be done by adding the following command to the notebook during the installation step:

!pip install torch==2.0.1+cu118 torchvision torchaudio torchinfo --extra-index-url https://download.pytorch.org/whl/cu118

Example: (screenshot of the install command run in a Colab notebook cell)

I have tested this in the Colab environment; please let me know if the issue persists or if you're unable to apply the fix in your workspace.

@NTuan-Nguyen NTuan-Nguyen self-assigned this Jun 24, 2024
@sara-hashemi
Author

Thanks, this worked in resolving the issue with the Triton library.

@sara-hashemi
Author

sara-hashemi commented Jul 11, 2024 via email

@sajfb
Collaborator

sajfb commented Jul 11, 2024

The YAML file has a section for MSP processing criteria used to filter out MSP blocks that fall outside the model's training space. If an MSP block does not meet these criteria, it will not be carried through to the prediction step. You can find a log file in the output folder, which records any issues with MSP block processing. An example of an MSP block for Aspirin is provided on the main GitHub page. The required row entries for an MSP block are Name, PrecursorMZ, and Num Peaks.
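The required-rows rule above can be sketched as a small validator. Note this is an illustrative assumption, not IDSL_MINT's actual parser: only the required fields Name, PrecursorMZ, and Num Peaks come from the comment above, and the example values are made up.

```python
# Illustrative check for the required MSP header rows (Name, PrecursorMZ,
# Num Peaks). This is NOT IDSL_MINT's parser, just a sketch of the kind
# of filtering its log file reports.

REQUIRED_FIELDS = ("Name", "PrecursorMZ", "Num Peaks")

def msp_block_fields(block: str) -> dict:
    """Parse the 'Key: value' header rows of a single MSP block."""
    fields = {}
    for line in block.strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def has_required_fields(block: str) -> bool:
    """True if the block contains every required header row."""
    fields = msp_block_fields(block)
    return all(f in fields for f in REQUIRED_FIELDS)

# Illustrative block (peak m/z and intensity values are made up):
example = """Name: Aspirin
PrecursorMZ: 179.0350
Num Peaks: 2
121.0283 100
137.0233 45
"""

print(has_required_fields(example))        # True
print(has_required_fields("Name: X\n"))    # False: missing required rows
```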

@sajfb sajfb reopened this Jul 11, 2024
@sara-hashemi
Author

sara-hashemi commented Jul 11, 2024 via email

@sajfb
Collaborator

sajfb commented Jul 11, 2024

Names do not have to be unique, but each block must contain a Name row entry. You should standardize your MSP blocks before feeding them into MINT.

@sara-hashemi
Author

sara-hashemi commented Jul 11, 2024 via email

@sajfb
Collaborator

sajfb commented Jul 11, 2024

WARNING!!! Removed MSP block ID 1 related to A_M8_negPFP_03! Names don't need to be unique, but in this case the MSP block ID 1 should be specifically investigated.

The m/z thresholds refer to the mass values, not their intensities. Could you please also share your YAML file?

@sara-hashemi
Author

sara-hashemi commented Jul 12, 2024 via email

@sara-hashemi
Author

sara-hashemi commented Jul 12, 2024 via email

@sajfb
Collaborator

sajfb commented Jul 12, 2024

Your precursor mass is out of the mass range specified in the YAML file.

Minimum m/z: 100
Maximum m/z: 900

There is a 10% tolerance allowing some fragment peaks to fall outside the training space, but the precursor mass must be within this range.

Additionally, keep in mind:

Minimum number of peaks: 5, counted after applying the noise removal threshold of 0.01
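For illustration, the criteria above can be combined into a single check. This is a sketch under assumptions: the function name and the peak representation are mine; only the numbers 100, 900, 0.01, and 5 come from this thread.

```python
# Sketch of the filtering criteria described above (precursor m/z range,
# noise threshold, minimum peak count). Function and parameter names are
# illustrative; only the numeric criteria come from the comments.

def passes_criteria(precursor_mz, peaks,
                    min_mz=100.0, max_mz=900.0,
                    noise_threshold=0.01, min_peaks=5):
    """peaks: list of (mz, relative_intensity) pairs, intensities in [0, 1]."""
    # The precursor mass must lie inside the training range.
    if not (min_mz <= precursor_mz <= max_mz):
        return False
    # Peaks are counted only after removing those below the noise threshold.
    kept = [(mz, i) for mz, i in peaks if i >= noise_threshold]
    return len(kept) >= min_peaks

# A precursor outside [100, 900] is rejected regardless of its peaks:
print(passes_criteria(950.0, [(150.0, 1.0)] * 6))  # False
print(passes_criteria(500.0, [(150.0, 1.0)] * 6))  # True
```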

@sajfb
Collaborator

sajfb commented Jul 12, 2024

By this, are you referring to the fact that the minimum and maximum m/z should be determined over the masses of all samples (train/validation and test), and then the YAML file can include those numbers? Could this be why the algorithm bypasses some samples? I will also look into the names for a more accurate representation.


Yes, each m/z value is represented by a specific embedded token. If that token is not in the training space, the model cannot represent your chemical space.
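A minimal sketch of what "each m/z value is represented by a specific embedded token" can look like, assuming simple fixed-width binning. The bin width and the binning scheme itself are assumptions for illustration, not IDSL_MINT's actual tokenizer; only the 100-900 range comes from this thread.

```python
# Illustrative m/z tokenization: each m/z value maps to a discrete token
# by binning over the training range. Out-of-range values have no token,
# which is why the model cannot represent them.

def mz_to_token(mz, min_mz=100.0, max_mz=900.0, bin_width=0.01):
    """Map an m/z value to an integer token index, or None if out of range."""
    if not (min_mz <= mz <= max_mz):
        return None  # no embedded token exists outside the training space
    return round((mz - min_mz) / bin_width)

print(mz_to_token(200.0))  # an in-range m/z gets a token index
print(mz_to_token(950.0))  # None: outside the training space
```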

@sara-hashemi
Author

sara-hashemi commented Jul 12, 2024 via email

@sajfb
Collaborator

sajfb commented Jul 12, 2024

You're welcome!
