Added function to get the number of stereoisomers #217

zhu0619 · 2023-11-17T23:56:00Z

Changelogs

Added datamol.isomers._enumerate.count_stereoisomers
Added unit tests for counting stereoisomers, covering both the undefined-only case and all possible stereoisomers.

In the step Chem.FindPotentialStereoBonds(mol, cleanIt=clean_it), existing stereo information on bonds is cleared when cleanIt=True.
Therefore, cleanIt should be disabled when enumerating or counting only undefined stereochemistry on molecules whose bonds already carry defined stereo information.

See example below:

Reproduce the error

import datamol as dm
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import (
    EnumerateStereoisomers,
    GetStereoisomerCount,
    StereoEnumerationOptions,
)

n_variants = 20
undefined_only = True  # <- only undefined stereochemistry should be enumerated
rationalise = True
timeout_seconds = None
clean_it = True  # <- clears the defined stereo information on the double bond

stereo_opts = StereoEnumerationOptions(
    tryEmbedding=rationalise,
    onlyUnassigned=undefined_only,
    unique=True,
)

# Raw string so the backslash in the SMILES is not read as an escape sequence.
mol = dm.to_mol(r"Br/C=C\Br")
Chem.AssignStereochemistry(mol, force=False, flagPossibleStereoCenters=True, cleanIt=clean_it)  # type: ignore
Chem.FindPotentialStereoBonds(mol, cleanIt=clean_it)  # type: ignore

# The input bond is fully defined, yet it is enumerated because clean_it reset it.
dm.to_image(list(EnumerateStereoisomers(mol, options=stereo_opts)))
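The effect of cleanIt described above can also be checked directly on the bond, without drawing anything. This is a minimal sketch that uses only RDKit; the helper name double_bond_stereo is hypothetical and not part of datamol or this PR.

```python
from rdkit import Chem


def double_bond_stereo(smiles: str, clean_it: bool) -> Chem.BondStereo:
    """Return the stereo label of the first double bond after FindPotentialStereoBonds.

    Hypothetical helper for illustration only.
    """
    mol = Chem.MolFromSmiles(smiles)
    Chem.FindPotentialStereoBonds(mol, cleanIt=clean_it)
    # Pick the first double bond in the molecule.
    bond = next(b for b in mol.GetBonds() if b.GetBondType() == Chem.BondType.DOUBLE)
    return bond.GetStereo()


# Same fully defined input, with and without cleaning.
kept = double_bond_stereo(r"Br/C=C\Br", clean_it=False)
cleared = double_bond_stereo(r"Br/C=C\Br", clean_it=True)
print("cleanIt=False:", kept)
print("cleanIt=True: ", cleared)
```

With cleanIt=False the bond should keep the label parsed from the SMILES, while with cleanIt=True it should be reset to an "any" label, which is why it is then treated as undefined by onlyUnassigned=True.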

codecov · 2023-11-17T23:58:08Z

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (9e94d02) 91.96% compared to head (e812492) 91.93%.

Files	Patch %	Lines
datamol/isomers/_enumerate.py	90.90%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #217      +/-   ##
==========================================
- Coverage   91.96%   91.93%   -0.03%     
==========================================
  Files          46       46              
  Lines        3832     3843      +11     
==========================================
+ Hits         3524     3533       +9     
- Misses        308      310       +2

Flag	Coverage Δ
unittests	`91.93% <91.66%> (-0.03%)`	⬇️


hadim

Thanks Lu.

It looks good to me after fixing the docstring.

Question: my understanding is that GetStereoisomerCount does exactly the same as enumerate_stereoisomers with n_variants=<MAX> and simply calls len() on the output. Am I correct here? Maybe check what the rdkit code is doing under the hood. Not really a big deal for me, but I wanted to flag it in case you think count() should instead reuse enumerate().

hadim · 2023-11-20T14:06:12Z

datamol/isomers/_enumerate.py

+        rationalise: If we should try to build and rationalise the molecule to ensure it
+            can exist.
+        clean_it: A flag for assigning stereochemistry. If True, it will remove previous stereochemistry
+            markings on the bonds.


I think the CI is failing because of the malformed docstring. You can run mkdocs serve locally to reproduce the error, which should help with fixing the docstring.

zhu0619 · 2023-11-20T15:06:09Z

[GetStereoisomerCount](https://github.com/rdkit/rdkit/blob/2a68050ed07a3b27cabf33d535f0c46117135209/rdkit/Chem/EnumerateStereoisomers.py#L136C24-L136C24) computes an estimated number based on the stereo bonds. So in some cases the count from GetStereoisomerCount is larger than the number of isomers actually enumerated.

Initially, I was using the output of enumerate_stereoisomers, but the computation takes too long, especially for large datasets, even with parallelization.

I will also add an option to count the isomers using enumerate_stereoisomers if the user needs more accurate counts.
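To illustrate the difference between the estimate and an actual enumeration, here is a small sketch that uses RDKit directly. The helper compare_counts and the choice of tartaric acid, which has a meso form, are assumptions made for illustration and are not taken from the PR; embedding is disabled to keep the run fast.

```python
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import (
    EnumerateStereoisomers,
    GetStereoisomerCount,
    StereoEnumerationOptions,
)


def compare_counts(smiles: str) -> tuple[int, int]:
    """Return (estimated count, enumerated count) for one molecule.

    Hypothetical helper for illustration only.
    """
    mol = Chem.MolFromSmiles(smiles)
    opts = StereoEnumerationOptions(
        tryEmbedding=False,   # skip 3D embedding checks
        onlyUnassigned=True,  # only consider undefined stereochemistry
        unique=True,          # drop duplicate isomers during enumeration
    )
    estimate = GetStereoisomerCount(mol, options=opts)
    enumerated = len(list(EnumerateStereoisomers(mol, options=opts)))
    return estimate, enumerated


# Tartaric acid with both stereocenters left undefined.
estimate, enumerated = compare_counts("OC(=O)C(O)C(O)C(=O)O")
print(f"estimate={estimate}, enumerated={enumerated}")
```

The estimate is cheap because it does not build any isomers, whereas the enumeration generates each isomer and removes duplicates, so it can only return the same number or fewer.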

hadim · 2023-11-20T15:13:11Z

OK, so it seems like GetStereoisomerCount is doing something slightly different and also seems faster. All good then, thank you Lu!

zhu0619 added 3 commits November 17, 2023 17:34

add count_stereoisomers

86c00e9

add test

66c73b0

update init

9a09a91

zhu0619 requested a review from hadim as a code owner November 17, 2023 23:56

wip

8ce287a

zhu0619 added feature fix labels Nov 17, 2023

format

4126ac2

wip

f183b3c

zhu0619 mentioned this pull request Nov 18, 2023

Improve stereoisomer detection polaris-hub/polaris#58

Merged

hadim reviewed Nov 20, 2023

zhu0619 added 2 commits November 20, 2023 11:09

add precise option

fe15876

wip

e812492

zhu0619 merged commit c23d273 into main Nov 20, 2023
15 checks passed

hadim deleted the feat/isomers branch November 24, 2023 15:11
