When calculating descriptors from SMILES, an error occurred. #9

kexul · 2019-10-18T14:44:46Z

The error log is PaDEL-Descriptor encountered an error: PaDEL-Descriptor timed out during subprocess call

By the way, thanks for your great job!


tjkessler · 2019-10-28T00:03:17Z

Sorry for the late reply!

Could you provide additional information about the command you are executing? A code snippet will help me diagnose the issue further.

kexul · 2019-11-01T02:45:51Z

I processed a big CSV file containing about 700,000 SMILES strings; you can try it here: all_smi.zip. The code I used was the same as your example:

from padelpy import from_smiles
with open('all_smi.csv', 'rt') as f:
    # read the first SMILES string and drop the trailing newline
    smi = f.readline().strip()
    descriptors = from_smiles(smi)

tjkessler · 2019-11-07T19:26:28Z

@kexul,

PaDEL-Descriptor sometimes times out when calculating descriptors for smaller compounds (I'm not entirely sure why, one would think their calculations would be very quick):

descriptors = from_smiles('CCC')

PaDELPy's "from_smiles" function tries three times to calculate the descriptors for a given compound, and if a RuntimeError is encountered all three times, the error you saw is thrown:

PaDEL-Descriptor encountered an error: PaDEL-Descriptor timed out during subprocess call

By default, if the generation process exceeds 12 seconds, this is seen as a failure. You can try increasing the timeout:

# increase timeout to 30 seconds
descriptors = from_smiles('CCC', timeout=30)

If this doesn't help, I recommend you catch the exception and perform an action to account for it:

try:
    descriptors = from_smiles('CCC')
except RuntimeError:
    # Do something, e.g. skip this compound or retry later
    pass
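
As a minimal sketch of the "catch the exception" approach applied to a whole list of compounds: the helper below records the SMILES that fail instead of stopping at the first timeout. The name `compute_all` and the `compute_fn` parameter are hypothetical and not part of PaDELPy; the descriptor function is passed in as an argument so the loop can be tried without PaDEL-Descriptor installed.

```python
from typing import Callable, Dict, List, Tuple


def compute_all(
    smiles_list: List[str],
    compute_fn: Callable[..., Dict],
    **kwargs,
) -> Tuple[Dict[str, Dict], List[str]]:
    """Run compute_fn on each SMILES, skipping the ones that raise RuntimeError."""
    results: Dict[str, Dict] = {}
    failed: List[str] = []
    for smi in smiles_list:
        try:
            results[smi] = compute_fn(smi, **kwargs)
        except RuntimeError:
            # A timeout surfaces as RuntimeError (see the error message above),
            # so remember the compound and move on to the next one.
            failed.append(smi)
    return results, failed
```

With PaDELPy installed, this would presumably be called as `compute_all(smiles_list, from_smiles, timeout=30)`, and the `failed` list can be retried later with a larger timeout.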

Let me know if increasing the timeout value helps! If it does, it may justify increasing the default value for the "from_smiles" and "from_mdl" functions.

Best,
Travis

tjkessler · 2020-01-01T16:07:50Z

I'm going to go ahead and close this issue due to inactivity. @kexul - keep me updated as to whether any of the methods I outlined above work for you!

katasanirohith · 2021-01-05T17:20:42Z

Hey, I have tried increasing the timeout to 30 but am still facing the same error:

raise RuntimeError('PaDEL-Descriptor encountered an error: {}'.format(
RuntimeError: PaDEL-Descriptor encountered an error: PaDEL-Descriptor timed out during subprocess call

Edit:
I have 200 SMILES in a file, I am reading it and doing the following:

for index, i in enumerate(reader): 
    print(index) 
    descriptors = from_smiles(i[0], timeout=60)

After 5 molecules it throws the above timeout error.

System specifications:
8 vCPUs
56GB RAM

I think I have found the answer:
The molecules are very long, so they take more time to process. My suggestion is to increase the time limit to a larger number, as the average time taken for me was greater than 30 seconds.
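
One way to check how long each molecule actually takes, rather than guessing a timeout, is to time every call on a small sample. This is only a sketch: `time_calls` and `compute_fn` are hypothetical names, and the descriptor function is passed in as an argument (for example `from_smiles`).

```python
import time
from typing import Callable, List, Tuple


def time_calls(
    smiles_list: List[str],
    compute_fn: Callable,
    **kwargs,
) -> List[Tuple[str, float, bool]]:
    """Return (smiles, elapsed_seconds, succeeded) for each call to compute_fn."""
    timings = []
    for smi in smiles_list:
        start = time.perf_counter()
        try:
            compute_fn(smi, **kwargs)
            ok = True
        except RuntimeError:
            # The call failed (e.g. timed out); its elapsed time is still recorded
            ok = False
        timings.append((smi, time.perf_counter() - start, ok))
    return timings
```

Running this on a few of the larger molecules with a generous timeout, e.g. `time_calls(sample, from_smiles, timeout=300)`, gives a rough idea of what value to use for the full file.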

RajaramWalavalkar · 2021-02-18T17:58:19Z

Hello,
I obtained 1875 descriptors from the padelpy package using SMILES as input, but I am from a non-chemistry background. Could you please tell me what exactly those descriptors signify?
Some column headings, like nAtoms, nAromatic atoms, nAromatic bonds, etc., are easily understood, but I do not understand what column names such as ATS, AATS, ATSC, MATS, GATS, ... signify.
Can you help me with this?

tjkessler · 2021-03-02T17:44:22Z

@RajaramWalavalkar,

Each descriptor gives a numerical representation of some physical, chemical, or electromechanical aspect of a given compound. For example, "nN" is the number of nitrogen atoms present in the compound, "nC" is the number of carbon atoms present, etc.

Some of the descriptors are somewhat ambiguous - the ATS descriptors are a measurement of autocorrelation between neighboring atoms with respect to a certain weighting, such as mass and charge. More detailed descriptions for each descriptor can be found in a spreadsheet at http://www.yapcwsoft.com/dd/padeldescriptor/ by clicking the "1875" link towards the top of the page.
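
To give a feel for what an autocorrelation descriptor measures, here is a small sketch of the commonly cited Moreau-Broto form: for a lag k, sum the products w_i * w_j over all pairs of atoms that are exactly k bonds apart. This is an illustration only; the weights below are made up, and PaDEL-Descriptor's exact weighting and scaling may differ from this textbook form.

```python
from collections import deque
from typing import Dict, List


def topological_distances(adjacency: Dict[int, List[int]], start: int) -> Dict[int, int]:
    """Breadth-first search: number of bonds from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def autocorrelation(adjacency: Dict[int, List[int]], weights: Dict[int, float], lag: int) -> float:
    """Sum w_i * w_j over unordered atom pairs exactly `lag` bonds apart (lag >= 1)."""
    total = 0.0
    for i in adjacency:
        for j, d in topological_distances(adjacency, i).items():
            if d == lag and i < j:  # i < j counts each pair once
                total += weights[i] * weights[j]
    return total


# Heavy-atom graph of propane, C-C-C, with made-up weights for illustration
propane = {0: [1], 1: [0, 2], 2: [1]}
weights = {0: 1.0, 1: 2.0, 2: 3.0}
print(autocorrelation(propane, weights, 1))  # 1*2 + 2*3 = 8.0
print(autocorrelation(propane, weights, 2))  # 1*3 = 3.0
```

Replacing the made-up weights with an atomic property such as mass or charge gives one number per lag, which is roughly how a family of columns with a shared prefix and different lag numbers arises.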

Best,
Travis

rishabhiiitd071 · 2022-01-17T14:22:07Z

Hey, I have tried increasing the timeout to 30 but am still facing the same error:
raise RuntimeError('PaDEL-Descriptor encountered an error: {}'.format(
RuntimeError: PaDEL-Descriptor encountered an error: PaDEL-Descriptor timed out during subprocess call
Edit: I have 200 SMILES in a file, I am reading it and doing the following:
for index, i in enumerate(reader): 
    print(index) 
    descriptors = from_smiles(i[0], timeout=60) 
After 5 molecules it throws the above timeout error.

System specifications: 8 vCPUs, 56GB RAM

I think I have found the answer: the molecules are very long, so they take more time to process. My suggestion is to increase the time limit to a larger number, as the average time taken for me was greater than 30 seconds.

Could you specify how far one should increase the timeout? I am getting the same error for 172 SMILES, with a timeout of 60.

Luizerko · 2022-09-19T13:19:09Z

Hello, guys. I had the same problem and I was not able to fully solve it, but here are my two attempts.

First, I increased the number of chunks, reducing the number of compounds passed in each call. I used comp_subset_len = 10 compounds per request. Although making more requests is a little less efficient, my script was able to calculate descriptors for more compounds.

Second, I used the VERY BAD strategy of try/except inside a while True loop. Let i be the chunk number and 60 be the initial timeout value.

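timeout = 60  # initial timeout in seconds
while True:
    try:
        # smiles_list is the full list of SMILES; i is the chunk number
        descriptors_dict = from_smiles(
            smiles_list[i*comp_subset_len:(i+1)*comp_subset_len],
            timeout=timeout)
        break
    except RuntimeError:
        # the call timed out; retry the same chunk with twice the timeout
        timeout = timeout*2
        print('Doubling timeout')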

That strategy did not work, even after the timeout reached 240, so I decided to simply skip some SMILES. It was not worth spending so much time on a few molecules. Maybe the code will work if one is patient enough.
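
A bounded variant of the two attempts above might look like the sketch below: each chunk is retried with a doubling timeout up to a cap, and if it still fails, its compounds are tried one at a time so that only the problematic SMILES are skipped. The function names are hypothetical, and the sketch assumes `compute_fn` accepts a list of SMILES plus a `timeout` keyword and returns one result per SMILES in the same order.

```python
from typing import Callable, List, Optional


def compute_chunk(
    chunk: List[str],
    compute_fn: Callable,
    timeout: int = 60,
    max_timeout: int = 240,
) -> List[Optional[dict]]:
    """Try a chunk with a doubling timeout, then fall back to one compound at a time."""
    t = timeout
    while t <= max_timeout:
        try:
            return compute_fn(chunk, timeout=t)
        except RuntimeError:
            t *= 2  # retry the whole chunk with twice the timeout
    # Whole-chunk attempts are exhausted: isolate the compounds that fail
    results: List[Optional[dict]] = []
    for smi in chunk:
        try:
            results.append(compute_fn([smi], timeout=max_timeout)[0])
        except RuntimeError:
            results.append(None)  # placeholder keeps results aligned with the chunk
    return results


def compute_in_chunks(
    smiles_list: List[str],
    compute_fn: Callable,
    chunk_len: int = 10,
    **kwargs,
) -> List[Optional[dict]]:
    """Process smiles_list in chunks of chunk_len, returning one entry per SMILES."""
    results: List[Optional[dict]] = []
    for start in range(0, len(smiles_list), chunk_len):
        chunk = smiles_list[start:start + chunk_len]
        results.extend(compute_chunk(chunk, compute_fn, **kwargs))
    return results
```

Entries that come back as `None` mark the SMILES that were skipped, so they can be inspected or retried separately.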

One other suggestion: it would be very nice of the developers if they could share a link to a CSV with some descriptors. I would guess that they have probably tested the library for a bunch of compounds and that they have some files with a lot of descriptors. If this is not the case or sharing such a file will not be possible, forget about it.

Last but not least, thank you very much for the library :)

tjkessler closed this as completed Jan 1, 2020

sophiameerkat mentioned this issue Apr 8, 2020

Problem when calculate descriptors from smiles by using csv #11

Closed
