The future of BespokeFit #224
Comments
This follows the definition, but our community does not follow the definition. In principle there should be …

Where possible I like to split things into …

Mocking is powerful, but in my experience makes it harder to follow what's going on, or what's intended to go on. I'm ambivalent on documenting self-contained unit tests (usually something that's hard to read should be split up, or it's not really a unit test), but my experience trying to understand tests with several mocked components would be improved with some more documentation or comments.

As a developer, I'd like to see better support for temporary files so I can, for example, tinker with the optimization step without needing to re-run the fragmentation and QC steps.

I also wonder if the current workflow (collect inputs, fragment, QC, optimize) will change at some point in the future (i.e. a neural net mimics QC well enough that fragmentation is not necessary, or an extra step needs to be added to this workflow for reasons not currently imaginable to us).
Hi @Yoshanuikabundi, thanks for spending the time looking into this! I'll try to respond to the bits I can help with, but overall I agree that a rework would make development easier, as a few PRs like chained tasks (#167) have been stuck for a while. You are correct that this was due to a time crunch, which meant the science part of the workflow I developed and the backend stuff Simon made were quickly pushed together (hence the random name changes between bespoke and beflow)!
These mocked tests are something Simon was using a lot, as many functions in bespokefit take a lot of time or rely on calling out to QCArchive, which can be unreliable. I mostly understand them now and can write some docstrings to explain at least what I think is going on, if that would help? I also agree that the use of mocked tests is inconsistent, and it would be great to come up with a pattern they should follow that we could apply to them all!
This has also confused me, and the issue is made more complex by the flexibility I have tried to offer in the workflow. The way I have it is that logic should go into its own module where possible, as is the case for the optimizers such as ForceBalance, which have the …

However, the fragmentation stage is more complicated than this: as @mattwthompson points out above, what if the user does not want to use fragmentation? So we made this stage optional, allowing users to set it to …

Maybe now that we have more time we could come up with a better method, maybe some schema … Stages like the …
Yes, changing these seems fine to me if it helps make it easier to understand and use. I think aiming to simplify the construction of the …

I think one big issue is how interlinked the settings are, for example the …

Overall, we tried a few iterations of the layout and this one worked, but it's probably not optimal, and I would be interested to hear thoughts on how we could simplify things and make it easier to extend, as I have wanted to add other optimisers for a while now but know it's a big task... I agree with the rest of your points and would love to see it become easier to add new features, as I have plenty of ideas (some even made it to PRs, e.g. #167), but it takes far too long to get them over the line.
An example of why we might want to avoid mocking
Thanks for your inputs! Sorry it took me so long to notice you'd replied.
I'm into that! I guess I blur the line between regression tests and the others a bit; if there's a bug that's missed by the tests, that seems to me to also be a bug in the existing tests, so the appropriate regression test is just a unit (or integration I guess) test of that behavior.
Yeah if we can find a way to test BespokeFit without mocking, I would love that - but it probably means essentially writing a test suite for the inputs we provide to other software. That would probably pay off, but it seems like an enormous amount of work given our current approach of "forward as many options on to our dependencies as possible".
I'd like to at least see a description of the behavior the test is intended to check. If that description fits into the name of the function, that's fine, but I think that either means that it's a good unit test with a focused goal (yay!) or that the description is vague and the test covers a lot of functionality (like …).
I guess we'll just have to rename the configuration options then! Big breaking change coming up, I sense.
THAT WOULD BE AWESOME!!!
This sounds like a problem with …
But the API doesn't actually support this at the moment, right? My gut thinking is that we shouldn't compromise making the codebase easily understandable for a feature that we might develop in the future.
I'm not sure there's an API to actually make use of it, and I might be misunderstanding things... but does #76 support essentially arbitrary stage orderings? Each task could be its own QCGen stage. Then we'd get all of that for free right? And #167 would just be adding an optional second QCGen stage.
Ok cool, this is great info. In that case I think we should make (most of?) the rest of the API private, and we can just create a user-friendly, front-facing Pydantic model with the design we want that gets converted to the existing (private) configuration code behind the scenes. I guess that's basically what …
Yeah agreed, as I mention with …

Everything basically comes back around to targets, right? You could basically express the whole configuration as just a list of targets, where each target defines what part of the molecule it works on, how to fragment it, what QCGen to do on it, and how to optimize it. When the workflow factory is applied to a particular molecule, this all could get "compiled" into the stage model we have at the moment. And we could have other configuration classes that inherit from it, like say …

Or something. It feels like the sort of thing we would want to experiment with for a bit, which I now realise we can do by adding additional experimental configuration classes. So I guess I'll get around to experimenting with that and finding out how hard it is at some point in the future!
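To make the idea concrete, here's a rough sketch of what such a targets-first configuration could look like. Every name below (`Target`, `WorkflowConfig`, the field names) is made up for illustration; nothing here is the current API.

```python
from typing import List
from pydantic import BaseModel, Field


class Target(BaseModel):
    """One self-contained fitting target (hypothetical model, not the current API)."""

    smirks: str = Field(..., description="The part of the molecule this target works on.")
    fragment: bool = Field(True, description="Whether to fragment around this substructure first.")
    qc_spec: str = Field("default", description="Which QC generation recipe to run for reference data.")
    optimizer: str = Field("ForceBalance", description="How to fit the parameters against that data.")


class WorkflowConfig(BaseModel):
    """The whole user-facing configuration: just a list of targets."""

    targets: List[Target]


# Applying the factory to a molecule would then "compile" this into the existing stage schema.
config = WorkflowConfig(targets=[Target(smirks="[*:1]~[*:2]-[*:3]~[*:4]")])
print(config)
```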
Thanks for sharing this @Yoshanuikabundi! I'm new to BespokeFit and still learning the API. Your analysis provides some nice context by discussing strengths, opportunities for improvement, and potential future directions. I'm chiming in because I'm starting work on a master's program capstone project focused on integrating re-parameterization into the Materials Project HMD workflow stack. My advisor has decided that BespokeFit is a good option and we'd like to pursue incorporating it into the workflow. There are a couple of approaches we are considering. The first is to see if we can use the BespokeFit API directly or create a wrapper to encapsulate its functionality. The second is to explore whether BespokeFit could be configured to interface with pymatgen and atomate for the QC generation step of the workflow. You mentioned there are other places in the codebase that could be refactored. Has there been any thought on creating interfaces for QC generation?
BespokeFit itself calls QCEngine to handle the heavy lifting of QC calculations. If you are looking to interface with QC directly, you may want to look into what QCEngine handles and use it directly; it would make more sense to wrap BespokeFit if you have a whole end-to-end pipeline in mind.
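For reference, calling QCEngine directly is quite compact. This is a generic QCEngine/QCElemental example rather than anything BespokeFit-specific, and it assumes a QC program such as Psi4 is installed locally:

```python
import qcelemental as qcel
import qcengine as qcng

# A single water molecule, defined inline for the example.
molecule = qcel.models.Molecule.from_data(
    """
    O  0.000  0.000  0.000
    H  0.000  0.757  0.587
    H  0.000 -0.757  0.587
    """
)

# Describe the calculation: driver, method, and basis.
task = qcel.models.AtomicInput(
    molecule=molecule,
    driver="energy",
    model={"method": "b3lyp", "basis": "6-31g"},
)

# Hand the task to whichever backend is available (Psi4 assumed here).
result = qcng.compute(task, program="psi4")
print(result.return_result)
```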
I'm opening this issue to lay out what I see as the challenges for BespokeFit and hopefully get some advice on clearing them up and planning the next several releases.
I think the next release is uncontroversial, as we've already had one crack at it: 0.2.0, which introduces support for versions of the Toolkit since 0.11. I think we want to release this as soon as we think it works, which is presumably either when ForceBalance makes a release fixing OpenFF interop, or when OpenFF ForceBalance makes its first release. I'd like to discuss what happens after that.
Testing - Making BespokeFit easier to develop
BespokeFit is an impressive piece of code, but I think we owe some substantial technical debt on it. Significant PRs are commonly bogged down with test failures that we can't diagnose and end up working around, fits fail even when unit tests pass, and changes that feel simple end up complicated. I've discussed integration tests elsewhere (#218), which are relatively simple to implement and will provide a huge improvement. However, if we want to be able to continue to confidently release new features for BespokeFit, I think we also need to rethink how we're doing unit tests.
Documenting existing tests
Only about a quarter to a third of our tests have docstrings. BespokeFit's tests are pretty difficult to interpret by just reading the code. There's an average of about two comment lines every three tests, tests often include unfamiliar (at least to me) things like HTTP post requests, and they require a lot of knowledge about the structure of the software which is also undocumented - more on that later. It'd be great to get at least single-line docstrings for all tests, and detailed docstrings for mock-ups and fixtures that are used repeatedly in tests.
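As a sketch of the bar I have in mind (the test name and behaviour here are invented, purely to show the level of documentation being asked for):

```python
def test_fragmentation_is_skipped_when_no_engine_is_set():
    """The fragmentation stage should pass molecules through untouched when no engine is configured."""
    ...
```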
Enhanced mock-ups for external functions
Unit tests for BespokeFit are complicated by the need to call out to other software, which must then do complex and time-consuming calculations before returning. A unit test (as I understand it) is supposed to be a quick-but-comprehensive check of a small, isolated part of the codebase; if all the unit tests pass, then all the underlying functionality works, and so the code in aggregate should work as specified by the unit tests. When a single run of a function can take minutes, it becomes impossible to test things comprehensively in a practicably short time period. And even if it were fast, our unit test basically becomes an integration test for someone else's project.
At the moment, this is worked around by mocking all the expensive calculations. Instead of calling out to ForceBalance, for instance, we monkey patch `subprocess.run()` to return `None`. I think this is the right approach, but it could be improved by investing some time in making mock-ups that confirm that the inputs they are provided are what we expect, and then produce credible output. Some of our existing mock-ups do this, but most seem to do only one, the other, or neither. I think this might lead our tests to tend to answer the question, "does this invocation of this function work", rather than "do all invocations of this function have the expected behavior".
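As a sketch of what I mean, here's a pytest-style mock-up that both checks its inputs and returns credible output. `run_forcebalance` is a hypothetical stand-in for the code under test, not the actual BespokeFit wrapper:

```python
import subprocess


def run_forcebalance(input_path: str) -> subprocess.CompletedProcess:
    """Hypothetical stand-in for the code under test: shells out to ForceBalance."""
    return subprocess.run(["ForceBalance", input_path], check=True, capture_output=True)


def test_run_forcebalance_invokes_expected_command(monkeypatch):
    """The wrapper should call ForceBalance on the provided input file."""
    seen_commands = []

    def fake_run(cmd, *args, **kwargs):
        # 1. Confirm the inputs we were handed are what we expect ...
        seen_commands.append(cmd)
        # 2. ... and return credible output, rather than just None.
        return subprocess.CompletedProcess(cmd, returncode=0, stdout=b"Optimization Converged", stderr=b"")

    monkeypatch.setattr(subprocess, "run", fake_run)

    result = run_forcebalance("optimize.in")

    assert seen_commands == [["ForceBalance", "optimize.in"]]
    assert result.returncode == 0
```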
Refactoring
I have a number of ideas for refactors to set up clearer responsibilities for different parts of the code base and generally flatten the call graph.
There might be good reasons that the code base is the way it is, but I think it might just be the result of a time crunch to get the code ready for publication. If any of this sounds like a bad idea, please let me know!
Moving logic out of workers
This might be me just not understanding the logic behind how the code is laid out, but I don't understand when code should be put in a worker (e.g. `openff/bespokefit/executor/services/optimizer/worker.py`) vs when it should be put in its own module (e.g. `openff/bespokefit/optimizers`). It's clear to me that the worker provides an opportunity for code that runs regardless of which optimizer is being used, but the optimizers module also provides this opportunity, both in the base model and in plain functions in the `base` module.

What is clear to me is that when something goes wrong, or when I want to add a feature, or I want to understand how a configuration option works, at the moment I have to check at least two places: the worker and the model. I remember this meant an awful lot of jumping from source file to source file and two or three prototype fixes in completely different parts of the codebase when I wrote #193.
I'd like to move as much logic out of the workers as possible. Ideally, I'd like the different workers to share most of their code, and just consist of shared logic to: …
And for all other logic to happen within the step module's compute function. This'll flatten the module hierarchy a lot, and move all optimizer logic to the `optimizers` module, fragmentation logic to the `fragmentation` module, and so forth. It'll hopefully also simplify testing, as the workers can be tested in aggregate and the logic for each step will be co-located.

I think similar refactors may be possible in other parts of the codebase, but I haven't found any yet!
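A rough sketch of that shape, with entirely made-up names (nothing here is the existing executor API), just to show how thin the shared worker code could be:

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry: each stage module exposes one compute function that owns
# all of that stage's logic.
STAGE_COMPUTE_FUNCTIONS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "fragmentation": lambda schema: {"stage": "fragmentation", "status": "success"},
    "qc-generation": lambda schema: {"stage": "qc-generation", "status": "success"},
    "optimization": lambda schema: {"stage": "optimization", "status": "success"},
}


def worker(task_json: str) -> str:
    """Shared worker logic: decode the task, dispatch to the stage module, encode the result."""
    task = json.loads(task_json)                       # deserialize the incoming task
    compute = STAGE_COMPUTE_FUNCTIONS[task["stage"]]   # look up the stage's compute function
    result = compute(task["schema"])                   # everything stage-specific happens there
    return json.dumps(result)                          # serialize the result for the executor


print(worker(json.dumps({"stage": "optimization", "schema": {}})))
```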
Simplification of configuration schemas
`BespokeWorkflowFactory` is a complex class whose philosophy seems to have wandered over development. It is complex by nature and by design, and that's fine, but I think we can unify the philosophy to make it a little less complicated.

At the moment, I want to make the following changes in this and other schemas:

- `BespokeWorkflowFactory` is a factory for `BespokeOptimizationSchema`. I'd like to make it a factory for `BespokeWorkflow`. This would unify naming in the codebase with that in the documentation.
- Rename `smirk_settings` and the associated class name, `parameter_hyperparameters` -> `target_hyperparameters`, `target_torsion_smirks` -> `target_schema.torsion_smirks`, etc.
- Replace `None`s that are set to a default value on workflow creation with that default value (e.g. `BaseTargetSchema.calculation_specification`) - this is essential for good API documentation.
- Restrict `BaseTargetSchema.reference_data`, which changes a lot at the moment, to be either `None` or some actual data - `calculation_specification` is right there!
- Make `default_qc_specs` take a single item rather than a list.

In general, I would like optional fields to behave like `BaseTargetSchema.reference_data`: initializing to `None` doesn't count as setting, so new information can be added to a schema after initialization - this should be the only use of optional fields in user-facing schemas.
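To illustrate the `None`-default point with made-up field names and types (this is not the real `BaseTargetSchema`):

```python
from typing import Optional
from pydantic import BaseModel, Field


class TargetSchemaBefore(BaseModel):
    # Defaults to None and is only filled in at workflow creation time, so the
    # rendered API docs never show what you actually get.
    calculation_specification: Optional[str] = None


class TargetSchemaAfter(BaseModel):
    # The real default lives on the model itself, so it shows up in the docs ...
    calculation_specification: str = Field(
        "default-qc-spec", description="The QC specification used to generate reference data."
    )
    # ... while reference_data stays Optional because it genuinely starts out unset
    # and is filled in after initialization.
    reference_data: Optional[str] = None
```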
New user-facing documentation
Docstrings for all modules, classes and functions would be great, as would descriptions for all environment variables.
Also, does anyone know why the environment variables are prefaced with `BEFLOW_` and not `BESPOKEFIT_` or something?
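Whatever the answer, renaming the prefix should be mechanical if the settings live in a pydantic `BaseSettings` model. A generic sketch (not BespokeFit's actual settings class) of how such a prefix is usually wired up:

```python
from pydantic import BaseSettings  # pydantic v1; in v2 this moved to the pydantic-settings package


class ExecutorSettings(BaseSettings):
    """Illustrative settings model: every field is read from a prefixed environment variable."""

    api_port: int = 8000  # populated from BEFLOW_API_PORT if that variable is set

    class Config:
        env_prefix = "BEFLOW_"  # switching to BESPOKEFIT_ would only mean changing this string
```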
Better debugging
Debugging BespokeFit is a pain. I have two changes in mind that should help this: …
How to stage all this
I think 0.2.0 comes first. Then I think we at least need integration tests that check scientific validity before we can attempt major refactors, and improved debugging would be a big help. But apart from that I have no concrete plans yet.
I also have no plans for new features. But I think the above will make BespokeFit much easier to use and develop and provide a good base for new features. At the moment, the idea of adding features is... intimidating.
Dear reader, whether you got this far or you just skipped to the end, I would very much appreciate your thoughts!