Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved dataset control #327

Merged

Conversation

chrisiacovella
Copy link
Member

@chrisiacovella chrisiacovella commented Dec 3, 2024

Pull Request Summary

This provides changes aimed at improving the control over datasets. Specifically, the switch to .toml files meant we could not easily toggle which properties to load from the hdf5 files nor which properties to assign to the key internal variables (energy, force, dipole_moment, etc.).

This updates the pydantic dataset model whereby users now additionally define properties of interest (as a list) and then the association of these properties to the internal variable names (i.e used by the Properties class). The pydantic model validates that anything in the Properties association also exists in the properties_of_interest list. We do not validate that the properties_of_interest are actually in the hdf5 file via the pydantic model; this is handled when we assign those parameters while calling the DataModule class. Since this is one of the earliest steps in the process, failure to properly define these should lead to a rapid failure. this saves considerable extra code in the pydantic model, and having to make separate models for each dataset.

Key changes

Notable points that this PR has either accomplished or will accomplish.

  • Allow better control over datasets via .toml files
  • Fix buffer issue when logging to wandb
  • Fix logic whereby using dipole moment in the loss required total charge as well (since dipole moment requires charges. This is fixed such that charges are calculated if dipole moment is requested, even if total charge is not in the loss).
  • bug fix to spice1_openff curation (wrong tag for scf_dipole).
  • change backend for matplotlib to prevent issues when using debugger

Associated Issue(s)

Pull Request Checklist

  • Issue(s) raised/addressed and linked
  • Includes appropriate unit test(s)
  • Appropriate docstring(s) added/updated
  • Appropriate .rst doc file(s) added/updated
  • PR is ready for review

@codecov-commenter
Copy link

codecov-commenter commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 92.42424% with 5 lines in your changes missing coverage. Please review.

Project coverage is 78.83%. Comparing base (f8c7834) to head (3ac9e6c).

Additional details and impacted files

modelforge/dataset/parameters.py Show resolved Hide resolved
modelforge/dataset/parameters.py Show resolved Hide resolved
modelforge/tests/data/config.toml Show resolved Hide resolved
modelforge/tests/data/config.toml Show resolved Hide resolved
modelforge/train/training.py Show resolved Hide resolved
modelforge/train/training.py Outdated Show resolved Hide resolved
modelforge/train/training.py Show resolved Hide resolved
@chrisiacovella chrisiacovella merged commit f8503c2 into choderalab:main Dec 5, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants