Add an AnalysisCollection class #4017
base: develop
8c2c77d to e4491a7
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #4017       +/-   ##
============================================
- Coverage    93.59%   93.17%    -0.43%
============================================
  Files          168       12      -156
  Lines        21104     1069    -20035
  Branches      3919        0     -3919
============================================
- Hits         19752      996    -18756
+ Misses         894       73      -821
+ Partials       458        0      -458
924d05c to a6d9ef2
bfb6ab8 to a0acb3e
I like this. Can I be cheeky and inquire what sort of speed up you get running something like 2 separate rdfs in tandem?
package/MDAnalysis/analysis/base.py
analysis_object.times[i] = ts.time
analysis_object._single_frame()

if reset_timestep:
Ts is never unassigned, so I'm not sure this is necessary here
But, if an instance is changing the ts object we have to restore it from the stored one.
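The point about restoring the timestep can be illustrated with a small sketch. It does not use MDAnalysis: `ToyTimestep`, `shift_in_place`, `read_only` and `run_frame` are hypothetical stand-ins, and the restore logic is only an assumption about how a stored copy might be used, not the code from this PR.

```python
import copy


class ToyTimestep:
    """Hypothetical stand-in for a timestep object holding positions."""

    def __init__(self, positions):
        self.positions = positions

    def copy(self):
        return copy.deepcopy(self)


def shift_in_place(ts):
    """Hypothetical analysis step that modifies the frame it is given."""
    for k in range(len(ts.positions)):
        ts.positions[k] += 10.0
    return sum(ts.positions)


def read_only(ts):
    """Hypothetical analysis step that only reads the frame."""
    return sum(ts.positions)


def run_frame(ts, steps, reset_timestep=True):
    """Run every step on one frame, optionally restoring it in between."""
    ts_original = ts.copy()  # stored copy taken before any step runs
    out = []
    for step in steps:
        out.append(step(ts))
        if reset_timestep:
            # Restore the data from the stored copy for the next step.
            ts.positions = copy.deepcopy(ts_original.positions)
    return out


with_reset = run_frame(ToyTimestep([1.0, 2.0]), [shift_in_place, read_only])
without_reset = run_frame(
    ToyTimestep([1.0, 2.0]), [shift_in_place, read_only], reset_timestep=False
)
print(with_reset, without_reset)
```

In this toy setup the second step only sees the unmodified frame when the reset is enabled, which is the situation the reply above describes.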
package/MDAnalysis/analysis/base.py
ts_original = ts.copy()

for analysis_object in self._analysis_objects:
analysis_object._frame_index = i
This is messy but understandable. In hindsight, maybe passing these variables through the single frame method would have been cleaner.
Yes, maybe, but changing this might break a lot. I can add a comment so that these setters become clearer.
Using the code given in the class's example, I got a speedup of ~30% compared to running each class individually. Quite good for such a simple change 🙂
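One plausible source of such a speedup, which is an assumption and not something measured in this thread, is that each frame is read only once when the analyses share a pass over the trajectory. The sketch below counts frame reads with a hypothetical `CountingTrajectory`; it says nothing about actual timings.

```python
class CountingTrajectory:
    """Hypothetical trajectory that counts how often a frame is read."""

    def __init__(self, frames):
        self._frames = frames
        self.reads = 0

    def __iter__(self):
        for frame in self._frames:
            self.reads += 1
            yield frame


def run_individually(trajectory, analyses):
    # One full pass over the trajectory per analysis.
    for analyse in analyses:
        for frame in trajectory:
            analyse(frame)


def run_together(trajectory, analyses):
    # A single pass; every analysis sees each frame as it is read.
    for frame in trajectory:
        for analyse in analyses:
            analyse(frame)


frames = [[float(i), float(i + 1)] for i in range(100)]
sums, maxima = [], []
analyses = [lambda f: sums.append(sum(f)), lambda f: maxima.append(max(f))]

separate = CountingTrajectory(frames)
run_individually(separate, analyses)

shared = CountingTrajectory(frames)
run_together(shared, analyses)

print(separate.reads, shared.reads)
```

How much of the read savings shows up as wall-clock time depends on how expensive frame reading is relative to the per-frame analysis work.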
0867afd to 3c5de33
@richardjgowers I made you the responsible adult in the room ;-) — please shepherd the PR to completion |
Up for discussion, would it be a good idea to inherit for analysis in collection:
print(analysis.results) Besides, it feels a bit weird to me that |
I do not like turning the core of AnalysisBase inside out and into AnalysisCollection (i.e. run()).
It's semantically confusing and it also makes it even harder to understand how to write your own analysis.
I think the logic needs to be rethought.
package/MDAnalysis/analysis/base.py
return self


class AnalysisBase(AnalysisCollection):
I agree with @yuxuanzhuang : it feels very weird and circular to have AnalysisBase inherited from AnalysisCollection. It just does not make sense semantically.
Can this be changed?
The only reason I did this was to avoid copying the code of the run method between AnalysisBase and AnalysisCollection. We can remove the inheritance by explicitly leaving the run method in AnalysisBase untouched. This would be fine for me!
We could either duplicate the logic, which I don't like, or create an external class that both the Base and the Collection use.
Would it be a good idea to create an external |
I am fine with creating these two classes. To avoid code duplication the Also, should they rather be private classes? I think users will not use them because these runners will require an already prepared analysis class. |
I think this solution is the cleanest and doesn't require even more classes which I don't like the idea of. IMO there is already enough of a barrier to entry without every new analysis having to handle 2x new API points. Personally I would just have an |
Sorry should have left a review.
Yes, we can do an ABC class if the others also agree. |
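One of the options discussed above, sharing the frame loop without making either class inherit from the other, could look roughly like the sketch below. All names (`_run_frames`, `ToyBase`, `ToyCollection`, `FrameCount`) are hypothetical and the code is independent of MDAnalysis; it only illustrates the shape of that design, not what was eventually implemented.

```python
def _run_frames(trajectory, analysis_instances):
    """Hypothetical private helper that owns the frame loop."""
    for analysis in analysis_instances:
        analysis.n_frames = len(trajectory)
        analysis._prepare()
    for i, ts in enumerate(trajectory):
        for analysis in analysis_instances:
            analysis._frame_index = i
            analysis._ts = ts
            analysis._single_frame()
    for analysis in analysis_instances:
        analysis._conclude()


class ToyBase:
    """Stand-in for a single analysis; does not inherit from the collection."""

    def run(self, trajectory):
        _run_frames(trajectory, (self,))
        return self


class ToyCollection:
    """Stand-in for the collection; does not inherit from the base."""

    def __init__(self, *analysis_instances):
        self._analysis_instances = analysis_instances

    def run(self, trajectory):
        _run_frames(trajectory, self._analysis_instances)
        return self


class FrameCount(ToyBase):
    """Hypothetical user analysis: only the three hooks are written."""

    def _prepare(self):
        self.results = {"frames": 0}

    def _single_frame(self):
        self.results["frames"] += 1

    def _conclude(self):
        pass


trajectory = [[0.0], [1.0], [2.0]]
alone = FrameCount().run(trajectory)
first, second = FrameCount(), FrameCount()
ToyCollection(first, second).run(trajectory)
print(alone.results, first.results, second.results)
```

In this arrangement a new analysis still implements only `_prepare`, `_single_frame` and `_conclude`, so the number of API points for analysis authors does not grow.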
Looking at it, it initially seems strange that Single inherits from Collection, but for the code it does make sense.
@PicoCentauri have you thought about how results could be accessed from the AnalysisCollection object? Or is the intent to always access results from the individual classes again?
package/MDAnalysis/analysis/base.py
reset_timestep : bool, optional
Reset the timestep object after each ``analysis_object``.
Setting this to ``False`` can be useful if an ``analysis_object``
is performing a trajectory manipulation which is also useful for the
subsequent ``analysis_instances`` e.g. unwrapping of molecules.
On the basis that there should be one way to do anything, I don't like the suggestion that you could use an analysis class to transform data for downstream classes (use a transformation instead). Instead, make reset_timestep=True the only option and remove this kwarg.
Yes, I agree and I will change this.
package/MDAnalysis/analysis/base.py
# Ensure compatibility with API of version 0.15.0
if not hasattr(self, "_analysis_instances"):
self._analysis_instances = (self,)
This is confusing. Instead, this is making a single analysis class work as a subclass of a Collection, right?
No, this is just a leftover from the code before. We can also remove it I think, if we do not support 0.15.0 anymore.
package/MDAnalysis/analysis/base.py
@@ -316,6 +489,7 @@ def __init__(self, trajectory, verbose=False, **kwargs):
self._trajectory = trajectory
self._verbose = verbose
self.results = Results()
super(AnalysisBase, self).__init__(self)
This ugly super call signature was Python 2.x; I think we can use super().__init__(self) now.
Agree
package/MDAnalysis/analysis/base.py
@@ -220,7 +224,176 @@ def __setstate__(self, state):
self.data = state


class AnalysisBase(object):
class AnalysisCollection(object):
I don't think we need to inherit from object any more either (it's implicit)
agree
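The two style points above (the zero-argument `super()` call and the implicit `object` base) can be checked with a short, self-contained sketch. The class names are made up for illustration and have nothing to do with the MDAnalysis classes.

```python
class LegacyParent(object):  # explicit base, as needed for new-style classes in Python 2
    def __init__(self):
        self.initialised = True


class LegacyChild(LegacyParent):
    def __init__(self):
        # Python 2 compatible spelling: class and instance passed explicitly.
        super(LegacyChild, self).__init__()


class ModernParent:  # in Python 3 every class inherits from object implicitly
    def __init__(self):
        self.initialised = True


class ModernChild(ModernParent):
    def __init__(self):
        # Zero-argument form, available in Python 3.
        super().__init__()


print(LegacyParent.__bases__, ModernParent.__bases__)
print(LegacyChild().initialised, ModernChild().initialised)
```

Both spellings resolve to the same parent initialiser here, so the change suggested in the review is purely cosmetic.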
package/CHANGELOG
@@ -26,6 +25,8 @@ Fixes
(Issue #3336)

Enhancements
* Add an `AnalayisCollection` class to perform multiple analysis on the same
typo on Analysis
cheers
3c5de33 to 3ba9d23
Thanks to all of you for your comments and suggestions and sorry for my late answer. I agree that the current design is a bit confusing. I thought about the A cleaner way for reading the code would be the idea that @yuxuanzhuang suggested that we provide a (private ?) Regarding some questions and comments:
This was never the idea. There will only be one API point for new Analysis classes.
The latter. I think accessing the results from the individual instances is already handy. An additional way would be just confusing and blowing up the code. |
@orbeckst @hmacdope I would still like to get this feature into main. Before I rebase again I would like to ask for your opinion again. I think the biggest issue to be discussed is the inheritance of |
@PicoCentauri sorry for the following short comment... I've run out of "open source time" for the day: There's a longish discussion on parallel analysis in #4158 and it would be good to hear your opinion how the proposed feature could/should work with an AnalysisBase that would try to parallelize with split-apply-combine (see PMDA https://doi.org/10.25080/Majora-7ddc1dd1-013 for the simple idea). |
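For readers unfamiliar with the split-apply-combine idea mentioned above, here is a minimal, library-independent sketch. The helper names `split`, `apply_chunk` and `combine` are hypothetical, the "analysis" is just a sum over toy frames, and a thread pool is used only to keep the example short; it is not how PMDA or #4158 implement it.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n_frames, n_parts):
    """Split frame indices into n_parts contiguous, nearly equal chunks."""
    base, extra = divmod(n_frames, n_parts)
    chunks, start = [], 0
    for part in range(n_parts):
        stop = start + base + (1 if part < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


def apply_chunk(frames, chunk):
    """Run the per-frame work on one chunk and return a partial result."""
    return sum(sum(frames[i]) for i in chunk)


def combine(partials):
    """Merge the partial results into the final result."""
    return sum(partials)


frames = [[float(i), float(i) + 0.5] for i in range(10)]
chunks = split(len(frames), 3)

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(lambda chunk: apply_chunk(frames, chunk), chunks))

total = combine(partials)
serial_total = sum(sum(frame) for frame in frames)
print(total, serial_total)
```

A collection of analyses would need each member to provide its own combine step for this pattern to apply, which is presumably part of what the question above is asking about.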
Hi @PicoCentauri , thanks so much for the PR, it's a neat idea! Regarding of being compatible with #4158, I feel like you'd need to change the And I'd probably indeed subclass |
Thanks @orbeckst and @marinegor for the comments.
Changing to
I am not sure If I get your comment correctly. You would subclass |
@PicoCentauri I indeed mean "subclass But if you subclass |
Hello @PicoCentauri! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2024-06-05 16:10:06 UTC |
Linter Bot Results:Hi @PicoCentauri! Thanks for making this PR. We linted your code and found the following: Some issues were found with the formatting of your code.
Please have a look at the Please note: The |
c03b386 to 04e4967
04e4967 to 9ae6a51
Sorry for the lack of updates for a while. I addressed your concerns and moved the implementation of the |
hey @PicoCentauri , the parallelization PR that we've mentioned earlier has been merged into develop (#4162), and your Could you perhaps have a look at your code after you merge If this doesn't work, perhaps I'd suggest implementing |
Fixes #3569
The AnalysisCollection follows the discussion of #3569, resetting the timestep object for each analysis by default. If requested, the user can set the reset_timestep variable to False, allowing the timestep object to be altered.
Within this PR I moved the run method from AnalysisBase to AnalysisCollection to avoid a lot of code duplication. GitHub's changes interface might make this a bit difficult to follow. The core change is in the run method only: instead of running the _prepare, _single_frame and _conclude methods only once, it loops over all provided analysis instances.
Changes made in this Pull Request:
AnalysisCollection class to perform multiple analyses on the same trajectory
assert_equal in test_base.py
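The loop described in this PR description can be sketched in plain Python. This is a toy model under stated assumptions, not the code from the PR: `FrameSum`, `FrameMax` and `ToyCollection` are hypothetical, frames are plain lists of floats, and only the hook names `_prepare`, `_single_frame` and `_conclude` are taken from the description.

```python
class FrameSum:
    """Hypothetical analysis: accumulates the sum of all frame values."""

    def _prepare(self):
        self.results = {"total": 0.0}

    def _single_frame(self):
        self.results["total"] += sum(self._ts)

    def _conclude(self):
        self.results["mean_per_frame"] = self.results["total"] / self.n_frames


class FrameMax:
    """Hypothetical analysis: tracks the largest value seen in any frame."""

    def _prepare(self):
        self.results = {"max": float("-inf")}

    def _single_frame(self):
        self.results["max"] = max(self.results["max"], max(self._ts))

    def _conclude(self):
        pass


class ToyCollection:
    """Toy collection that drives several analyses in one trajectory pass."""

    def __init__(self, *analysis_instances):
        self._analysis_instances = analysis_instances

    def run(self, trajectory):
        for analysis in self._analysis_instances:
            analysis.n_frames = len(trajectory)
            analysis._prepare()
        for i, ts in enumerate(trajectory):
            for analysis in self._analysis_instances:
                analysis._frame_index = i
                analysis._ts = ts
                analysis._single_frame()
        for analysis in self._analysis_instances:
            analysis._conclude()
        return self


trajectory = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
frame_sum, frame_max = FrameSum(), FrameMax()
ToyCollection(frame_sum, frame_max).run(trajectory)

# Results are read from the individual instances, as discussed in the thread.
print(frame_sum.results)
print(frame_max.results)
```

The toy version omits the timestep reset, progress reporting and frame slicing that a real implementation would need.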
PR Checklist
Disclaimer: This PR was written with inspiration of OpenAI's ChatGPT.