Improving the RDKitConverter caching system #2942
Codecov Report
@@ Coverage Diff @@
## develop #2942 +/- ##
========================================
Coverage 93.55% 93.56%
========================================
Files 176 176
Lines 22837 22837
Branches 3194 3195 +1
========================================
+ Hits 21366 21368 +2
+ Misses 1421 1418 -3
- Partials 50 51 +1
ping @IAlibay @richardjgowers
I like the use of the lru cache from the stdlib. Peripheral comments inline.
    conversions in memory. Using ``maxsize=None`` will remove all limits
    to the cache size, i.e. everything is cached.
    """
    global atomgroup_to_mol
Justified use of `global` ;-)
This is probably not thread-safe – not a big deal, though, and I don't have a better idea.
(Although, we don't really encourage use of threads for parallelization; multiprocessing should do just fine.)
Overall lgtm! Just a few comments, mainly to do with tests & docs.
Thanks @cbouy, couple of near-final comments with very minimal changes (one of which probably can be just ignored). The main discussion point remains this implicit hydrogens thing.
I updated my first post with the new changes.
All tests are passing 💃 Anything else?
Apologies for taking so long here, I'll re-review over the weekend but I think we should be good.
@cbouy if you want to update this against the current develop, it'll finally be on my list for the next thing I review.
Okay I think I finally managed to run a proper |
    @pytest.mark.parametrize("smi", ["[H]", "C", "O", "[He]"])
    def test_single_atom_mol(self, smi):
        u = mda.Universe.from_smiles(smi, addHs=False,
                                     generate_coordinates=False)
        mol = u.atoms.convert_to("RDKIT")
Sorry, I think I'm just being silly and forgetting a very obvious thing. Could you remind me why these are all being switched away from `convert_to`?
`convert_to` doesn't pass arguments to the underlying converter; it was in a PR at some point though (#2882).
Doesn't this behaviour contradict the docstring? I.e. ":func:`set_converter_cache_size`. However, `ag.convert_to("RDKIT")` followed by `ag.convert_to("RDKIT", NoImplicit=False)` will not use the"
Or was the argument that we would merge #2882 before this PR?
The point is that the converter modules weren't really documented to be instantiated like `c = mda.coordinates.RDKit.RDKitConverter(); c.convert(...)`; they are usually used through the `convert_to` AtomGroup method.
So yeah, I assumed #2882 would be merged before v2.0 comes out.
alright, let's see if we can revive #2882 then
Switched back to using `convert_to` now that it's merged!
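For readers following the docstring question above, here is a minimal sketch of why a call with a different keyword argument does not reuse a cached result. It uses only `functools.lru_cache` from the standard library; the `convert` function and the tuple standing in for an AtomGroup are hypothetical placeholders, not the MDAnalysis implementation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def convert(atoms, NoImplicit=True):
    # Hypothetical stand-in for the converter: every cache miss
    # builds and returns a brand new object.
    return object()


atoms = ("C", "O")  # hashable placeholder for an AtomGroup

first = convert(atoms)                     # miss: result is built and stored
again = convert(atoms)                     # hit: same arguments, same key
other = convert(atoms, NoImplicit=False)   # miss: keyword changes the key

info = convert.cache_info()
print(info.hits, info.misses)
print(first is again, first is other)
```

The cache key is built from the positional and keyword arguments of each call, so the two `convert_to` variants quoted in the docstring would plausibly map to two separate entries.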
@IAlibay sorry, don't have time today to review, so I'll leave it to you.
I think this PR is complete, but I want to hold off on merging before we have a clearer idea of what's going on with #2882.
I don't know why I completely forgot 🙀 We need a changelog entry (and can you also add in an entry for #2926?).
I'm adding the missing changelog now. For the changelog of this PR though, do I mention the fixes/changes I made, or just the enhancements (`set_converter_cache_size(maxsize)` and the `force` parameter)? The RDKit converter isn't released yet, so it's a bit weird fixing something that isn't officially out...
If there was an issue for the fixes then I'd still add a CHANGELOG entry even though it's a bit weird. However, for people living on the edge (using develop) it's still helpful.
Sorry for the delayed response here @cbouy. I'd add the following entries: enhancements:
changes:
Fixes:
edit: once #2882 is done if you can add these and then update against develop I'll merge this. |
RDKIT crashes are starting to happen too frequently for py3.6 + numpy 1.16 (see: #3287). I'm not sure if this is somehow linked to the new converter API, so I've updated this PR against the current develop to see if it fixes things. @cbouy please do double check that I've not accidentally broken things! edit: best way to check that this is fixed is just by re-running CI I guess -- number of successful CI runs: 3 (that should be enough)
Thanks @cbouy lgtm!
The current "homemade" caching system in the RDKit converter can only store the most recent conversion.
This new version uses `functools.lru_cache`, which allows users to select how many molecules should be cached, and improves readability/maintainability IMO.

Also, the new caching system retrieves the converted items from the hash of all the arguments passed to the decorated `atomgroup_to_mol` function, instead of the id of the atomgroup and the arguments, which makes more sense. I didn't know what a hash was until recently so please forgive me for the rookie mistake :D

Now if you successively run `u.atoms.convert_to("RDKIT")`, it will benefit from the caching system.

I needed to convert two different atomgroups (protein and ligand) while iterating over a trajectory, and the previous system would just rebuild the whole topology (which takes quite some time for a protein) for each molecule at every frame, which is why I think this is necessary. Now it works like a breeze.
Changes made in this Pull Request:
- `functools.lru_cache`
- `set_converter_cache_size(maxsize)` function to modify how many items are retained in the cache
- `atomgroup_to_mol` outside of the RDKitConverter class (it's not really needed there anyway), otherwise I need to define hash and eq dunders for the caching to work
- `AttributeError`. The error is not raised when `NoImplicit=False`
- `force` parameter to the RDKitConverter to ignore the above `AttributeError` and continue the conversion, which is mostly useful for inorganic molecules, CO2 and so on.

PR Checklist