run_dock is running, but stops output into the output.db file #22
Comments
Update on the problem: it seems that some molecules cause a docking problem, so nothing is written into the .db file, while docking of the next molecule in the .smi file goes on. And after all molecules are docked and run_dock starts on the next smi file, no .sdf file is written, since the .db file is incomplete. This is annoying, as you have to check again and again and rerun run_dock to get past the problem molecules. |
This is strange behavior, because if docking of a molecule fails, this molecule should simply be skipped (the fields
|
Just a note. If a molecule fails during the first run of the script, the script will try to dock it again in the second run. Normally we have not observed many failed docking attempts across millions of compounds. Maybe your compounds are somewhat special. Maybe there are issues in molecule preparation by |
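The retry-on-rerun behavior described above can be sketched as follows. This is a minimal illustration only: the table layout (mol_id, docking_score) is hypothetical, chosen to mirror the idea of EasyDock's output.db, not its actual schema.

```python
import sqlite3

# Hypothetical schema for illustration; EasyDock's real output.db differs.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE mols (mol_id TEXT PRIMARY KEY, docking_score REAL)')
con.executemany('INSERT INTO mols VALUES (?, ?)',
                [('mol1', -7.2), ('mol2', None), ('mol3', None)])

# On a rerun, only molecules without a stored score are docked again;
# mol1 already has a score and is skipped, mol2 and mol3 are retried.
to_retry = [row[0] for row in
            con.execute('SELECT mol_id FROM mols WHERE docking_score IS NULL')]
print(to_retry)
```

This is why a molecule that failed in the first run gets re-attempted in the second: its row still has no result.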
Yes, if an unsuccessful docking can be skipped and the next successfully docked molecule can be written into the sdf, that will be quite fine. The attachment contains another run using the same smi files and settings; the output message is included and contains some error messages
|
Yes, just like this!
I split the overall smi file into files of 20 molecules each. Now I get 1588 sdf files and 1906 db files.
|
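Splitting a large .smi file into fixed-size chunks, as described above, takes only a short helper. A sketch (the function name `split_smi` is my own, not from EasyDock; it assumes one molecule per line):

```python
import os

def split_smi(path, chunk_size=20):
    """Split a .smi file into numbered chunks of chunk_size lines each."""
    base, _ = os.path.splitext(path)
    with open(path) as f:
        lines = f.readlines()
    out_files = []
    for n, start in enumerate(range(0, len(lines), chunk_size)):
        out = f'{base}_{n:04d}.smi'
        with open(out, 'w') as f:
            f.writelines(lines[start:start + chunk_size])
        out_files.append(out)
    return out_files
```

For 10k molecules and chunk_size=20 this produces 500 files, so a failure in one chunk does not block the rest.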
Hi! Perhaps it might be a gnina issue? I have tried running the smi files with the vina program (I can't reproduce it with gnina because I am on macOS M1), and I did not encounter the error. All of the compounds so far can be docked and the sdf output is there for the dk_* files. I have also adjusted the config file to accommodate the vina program and my limited CPU count (in case it plays a part in the error):
Output for your reference (please ignore the |
Thanks a lot! I also tried it and it works well. The results are similar to the ones you provided. Great work! Currently, I have not added protonation and just use the --no_protonation option. |
We also encountered this issue with the latest version of gnina. @Samuel-gwb and @Feriolet, could you share your errors? However, the error is not well reproducible. The same set of 12000 molecules may be docked without errors for some protein conformations, but for others gives an error once, twice, or more.
Overall, the behavior where docking continues after the core dump but the database is not updated is strange. If a molecule causes an error, it should be skipped, but it somehow blocks the database update. This is an EasyDock issue in its own right and should be fixed. |
I can't provide the gnina error as I am not using Ubuntu right now to test run_dock. However, if I interpret your comment correctly, the EasyDock issue is that output.db is not updated correctly for some molecules, even if they are docked successfully after some molecules are skipped. I have found a solution to that issue: #### Using autodock vina
import random

def mol_dock(mol, config):
    # Load config and create temp files (omitted)
    # Reproduce a random error on certain molecules
    try:
        randomnumber = random.randint(0, 100)  # randint needs lower and upper bounds
        if randomnumber > 10:
            print(randomnumber)
            raise ValueError
        else:
            print('success', randomnumber)
            with open(ligand_fname, 'wt') as f:
                f.write(ligand_pdbqt)
    # Adding an except clause
    except:
        pass
    finally:
        pass  # close and unlink temp files (omitted)

For some reason, adding the except clause solves the issue of output.db not updating. |
Thank you for the suggestion, but this did not help. Below is the current version of the function I use for testing. I added explicit catching of exceptions from the subprocess and treat them explicitly, and afterwards catch all other exceptions. The issue here is that the error occurs in the subprocess and not in the Python code.

def mol_dock(mol, config):
    """
    :param mol: RDKit Mol of a ligand with title
    :param config: yml-file with docking settings
    :return:
    """
    output = None
    config = __parse_config(config)
    mol_id = mol.GetProp('_Name')
    boron_replacement = config["cnn_scoring"] in [None, "none"]
    ligand_pdbqt = ligand_preparation(mol, boron_replacement=boron_replacement)
    if ligand_pdbqt is None:
        return mol_id, None
    output_fd, output_fname = tempfile.mkstemp(suffix='_output.pdbqt', text=True)
    ligand_fd, ligand_fname = tempfile.mkstemp(suffix='_ligand.pdbqt', text=True)
    try:
        with open(ligand_fname, 'wt') as f:
            f.write(ligand_pdbqt)
        cmd = f'{config["script_file"]} --receptor {config["protein"]} --ligand {ligand_fname} --out {output_fname} ' \
              f'--config {config["protein_setup"]} --exhaustiveness {config["exhaustiveness"]} ' \
              f'--seed {config["seed"]} --scoring {config["scoring"]} ' \
              f'--cpu {config["ncpu"]} --addH {config["addH"]} --cnn_scoring {config["cnn_scoring"]} ' \
              f'--cnn {config["cnn"]} --num_modes {config["n_poses"]}'
        start_time = timeit.default_timer()
        subprocess.run(cmd, shell=True, check=True)  # this will trigger CalledProcessError and skip the next lines
        dock_time = round(timeit.default_timer() - start_time, 1)
        score, pdbqt_out = __get_pdbqt_and_score(output_fname)
        mol_block = pdbqt2molblock(pdbqt_out.split('MODEL')[1], mol, mol_id)
        output = {'docking_score': score,
                  'pdb_block': pdbqt_out,
                  'mol_block': mol_block,
                  'dock_time': dock_time}
    except subprocess.CalledProcessError as e:
        sys.stderr.write(f'Error caused by docking of {mol_id}\n')
        sys.stderr.write(str(e))
        sys.stderr.write('STDERR output:\n')
        sys.stderr.write(e.stderr or '')  # e.stderr is None unless stderr was captured
        sys.stderr.flush()
        output = None
    except:
        pass
    finally:
        os.close(output_fd)
        os.close(ligand_fd)
        os.unlink(ligand_fname)
        os.unlink(output_fname)
    return mol_id, output

Now I think that the issue may be in releasing file resources in the finally block. However, I do not understand how this could block the database update with results returned from other processes running in parallel. |
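One hypothesis consistent with the symptoms in this thread: a subprocess core dump that also takes down the Python worker process can stall the parent's result collection, which a try/except in the worker cannot catch. This minimal sketch (not EasyDock code) shows the difference between a caught exception and a hard worker death, using concurrent.futures:

```python
import os
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

def dock_one(i):
    """Stand-in for mol_dock: task 3 kills its worker outright (like a core dump)."""
    if i == 3:
        os._exit(1)  # the process dies without raising a Python exception
    return i, round(i * 0.1, 1)

def run_demo():
    results = {}
    try:
        # fork context keeps this sketch self-contained on POSIX systems
        ctx = multiprocessing.get_context('fork')
        with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
            for mol_id, score in pool.map(dock_one, range(6)):
                results[mol_id] = score
        return results, False
    except BrokenProcessPool:
        # the pool is now unusable: results still pending are lost, which
        # resembles docking "continuing" while the database stops updating
        return results, True
```

With `multiprocessing.Pool` (which EasyDock may use instead) the same worker death can make the parent hang on result collection rather than raise, which would match the frozen database update.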
It works! So, the solution could be |
I pushed a commit fixing this freezing of the database update. Please check it with your files and settings; you can get the latest version from the repo. Now problematic molecules should be skipped as expected. This fix will be included in the next release soon. It is important to use the latest version also because of the fixed extraction of docking scores from gnina outputs: there was a bug in the main repo, and the scores extracted previously were smina-like scores. |
I tried gnina with the updated code and still met an error. This time, the command stopped after printing the error. Before, it went on docking the next molecule but did not update the .db file.
Can you try redoing it with the updated repo? The error you sent corresponds to the old code:

def __get_pdbqt_and_score(ligand_out_fname):
    with open(ligand_out_fname) as f:
        pdbqt_out = f.read()
    match = re.search(r'REMARK CNNaffinity\s+([\d.]+)', pdbqt_out)
    if match:
        score = round(float(match.group(1)), 3)
    else:
        match = re.search(r'REMARK minimizedAffinity\s+(-?[\d.]+)', pdbqt_out)
        score = round(float(match.group(1)), 3)
    return score, pdbqt_out |
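For reference, the two regexes above behave like this on sample REMARK lines (the sample text below is illustrative, mimicking the format the function parses, not real gnina output):

```python
import re

# Illustrative REMARK lines in the format the function expects
sample = "REMARK CNNaffinity 5.4321\nREMARK minimizedAffinity -7.8912\n"

# CNNaffinity is tried first; its pattern has no sign, since the value is positive
match = re.search(r'REMARK CNNaffinity\s+([\d.]+)', sample)
cnn_score = round(float(match.group(1)), 3)

# minimizedAffinity is the fallback; its pattern allows a leading minus sign
match = re.search(r'REMARK minimizedAffinity\s+(-?[\d.]+)', sample)
vina_score = round(float(match.group(1)), 3)

print(cnn_score, vina_score)
```

Note that when neither pattern matches (e.g. gnina aborted mid-write), `match` is None and `match.group(1)` raises AttributeError, which is one way a bad docking run can surface as a Python error.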
I checked. gnina_dock.py does contain exactly the code above. The attachments contain the input files and gnina_dock.py (named gnina_dock_py.txt). Until just now, I had not run "pip install easydock" after git pull / update. After pip install, gnina docking goes on when meeting problem molecules, while the .db file is not updated, which seems similar to the problem from several days ago. |
To confirm, there is only one easydock version on your computer, right? I want to rule out the possibility that the old easydock version, rather than the new one, is being used and causing the not-updated .db file.
Yes, only one environment. -rw-rw-r-- 1 gwb gwb 17399 Mar 1 10:15 database.py Then "pip install easydock" in the main directory. |
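One way to check which copy of easydock the interpreter actually imports is importlib; a sketch (works for any installed package, and `module_location` is my own helper, not part of EasyDock):

```python
import importlib.util

def module_location(name):
    """Return the file path `name` would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# e.g. module_location('easydock') should point into site-packages after
# "pip install easydock", not into the git checkout
```

If the path points at the git checkout rather than site-packages, the freshly installed version is not the one being run.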
That's odd. I have tried to run it on my end and it does not produce any issue. The only issue I encountered is when I use more memory than I have. Here is the result on my end. I used the binary version of gnina. Even when I encountered the STDERR output:
terminate called after throwing an instance of 'std::runtime_error'
what(): out of memory
*** Aborted at 1710213751 (unix time) try "date -d @1710213751" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGABRT (@0x3ed0013c651) received by PID 1295953 (TID 0x7fbdda1c7000) from PID 1295953; stack trace: ***
@ 0x7fbde4569420 (unknown)
@ 0x7fbde3fbc00b gsignal
@ 0x7fbde3f9b859 abort
@ 0x7fbde437bee6 (unknown)
@ 0x7fbde438df8c (unknown)
@ 0x7fbde438dff7 std::terminate()
@ 0x7fbde438e258 __cxa_throw
@ 0x75cde2 caffe::SyncedMemory::mutable_gpu_data()
@ 0x5c1e92 caffe::Blob<>::mutable_gpu_data()
@ 0x78f93a caffe::BatchNormLayer<>::Forward_gpu()
@ 0x727541 caffe::Net<>::ForwardFromTo()
@ 0x727666 caffe::Net<>::Forward()
@ 0x50b950 CNNScorer::score()
@ 0x49927e get_cnn_info()
@ 0x4a0435 do_search()
@ 0x4a23bd main_procedure()
@ 0x4a2cfe threads_at_work()
@ 0x4ba7e3 boost::_bi::list6<>::operator()<>()
@ 0x81f24e thread_proxy
@ 0x7fbde455d609 start_thread
@ 0x7fbde4098353 clone
@ 0x0 (unknown)
Aborted (core dumped) |
Oh. |
Hi @Feriolet , |
Thank you very much! We finally overcame this issue and I hope it will not appear again :)
When I try to dock a .smi file containing 10k molecules against a receptor, the size of output.db increases continually.
However, after some time (tens of minutes), the increase stops, while run_dock still gives good output.
Then I stop run_dock and just re-run the task. The size of output.db increases again.
This process repeats over and over.
How can I solve this problem and continue docking until the last molecule?