Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralize config defaults, add safety for Config updates #68

Merged
merged 2 commits into from
Sep 20, 2024

Conversation

mtauraso
Copy link
Collaborator

@mtauraso mtauraso commented Sep 19, 2024

  • All configuration dicts are now ConfigDict. A ConfigDict is a version of dict that raises RuntimeError quite a lot when you try to mutate it or when you access values that weren't set at construction time.
  • There is a new validate step for runtime config that ensures every config key that appears at runtime has a default defined in fibad_default_config.toml.
  • Usages of .get() on config dictionaries have been removed, so all defaults now live in fibad_default_config.toml. Config access is entirely via getitem / [].
  • The sum total of the above means that we expect any new config which is missing a default value to generate a runtime error as early as possible if it is referenced anywhere in code, or in a user-supplied configuration.
  • Several config values have changed representation, because "not specified" was their default. toml does not support "None" so all of these values have been set to false in fibad_default_config.toml. Accesses to these configs have been updated to be aware that the default value of 'false' either triggers modal behavior (data_loader/crop_to, data_loader/filters) or triggers a RuntimeError to prompt the user to provide it (download/username, download/password).
  • Except where necessary for model/dataloader class loading, passsing of individual config sections has been removed. For the most part all parts of the code now have access to the entire runtime configuration.
  • The data_loader/data and download/cutout_dir configs have been combined into general/data_dir, to avoid an invalid configuration state where the downloader and data loader are looking at different directories for data.

- All configuration dicts are now ConfigDict. A ConfigDict is a version of
  dict that raises RuntimeError quite a lot when you try to mutate it
  or when you access values that weren't set at construction time.
- There is a new validate step for runtime config that ensures
  every config key that appears at runtime has a default defined in
  fibad_default_config.toml.
- Usages of .get() on config dictionaries have been removed, so all
  defaults now live in fibad_default_config.toml. Config access is entirely
  via __getitem__ / [].
- The sum total of the above means that we expect any new config which is
  missing a default value to generate a runtime error as early as possible
  if it is referenced anywhere in code, or in a user-supplied configuration.
- Several config values have changed representation, because
  "not specified" was their default. toml does not support "None" so
  all of these values have been set to false in fibad_default_config.toml
  accesses to these configs have been updated to be aware that the default
  value of 'false' either triggers modal behavior (data_loader/crop_to,
  data_loader/filters) or triggers a RuntimeError to prompt the user to
  provide it (download/username, download/password).
- Except where necessary for model/dataloader class loading, passsing
  of individual config sections has been removed. For the most part
  all parts of the code have access to all configs.
- The data_loader/data and download/cutout_dir configs have been combined
  into general/data_dir, to avoid an invalid configuration state where
  the downloader and data loader are looking at different directories
  for data.
@mtauraso mtauraso self-assigned this Sep 19, 2024
This was linked to issues Sep 19, 2024
@mtauraso mtauraso requested a review from drewoldag September 19, 2024 19:33
Copy link

codecov bot commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 36.03604% with 71 lines in your changes missing coverage. Please review.

Project coverage is 37.59%. Comparing base (a723812) to head (aad0186).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/fibad/config_utils.py 54.09% 28 Missing ⚠️
src/fibad/download.py 0.00% 23 Missing ⚠️
src/fibad/fibad.py 16.66% 5 Missing ⚠️
src/fibad/models/example_autoencoder.py 20.00% 4 Missing ⚠️
src/fibad/train.py 0.00% 4 Missing ⚠️
src/fibad/data_loaders/hsc_data_loader.py 25.00% 3 Missing ⚠️
src/fibad/models/example_cnn_classifier.py 33.33% 2 Missing ⚠️
src/fibad/data_loaders/data_loader_registry.py 0.00% 1 Missing ⚠️
...rc/fibad/data_loaders/example_cifar_data_loader.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #68      +/-   ##
==========================================
- Coverage   37.73%   37.59%   -0.15%     
==========================================
  Files          17       17              
  Lines         734      798      +64     
==========================================
+ Hits          277      300      +23     
- Misses        457      498      +41     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Sep 19, 2024

Before [a723812] After [369f35d] Ratio Benchmark (Parameter)
2.98±1s 1.83±0.8s ~0.61 benchmarks.time_computation
3.87k 1.69k 0.44 benchmarks.mem_list

Click here to view all benchmarks.

iterator = iter(iterable)
while batch := tuple(itertools.islice(iterator, n)):
yield batch

logger.info("Dividing cutouts among threads...")
num_threads = num_threads if len(self.rects) > num_threads else 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth adding a little log message here in the case that num_threads is greater than self.rects? i.e. let the user know that even though they asked for N threads, we're only using 1. I could see how the following log might be confusing otherwise.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that seems worth an info log at least.

msg += f"All configuration keys and sections must be defined in {DEFAULT_CONFIG_FILEPATH}"
raise RuntimeError(msg)

if isinstance(runtime_config[key], dict):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be worth validating that if runtime_config[key] is a dict, that default_config[key] is as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is reasonable paranoia.

# percolates down to nested ConfigDicts and prevents __setitem__ and other mutations of dictionary
# values? i.e. a method to make a config dictionary fully immutable (or very difficult/annoying to
# mutuate) before we pass control to possibly external module code that is relying on the dictionary
# to be static througout the run.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems like a really good idea. It will be interesting to see, when we have an actual user(s) interacting with Fibad if there is a desire to modify the merged config prior to running train or predict. But I agree, that during runtime, the config should be/becomes immutable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I do feel like its complicated enough that we should hold off for now.

Copy link
Collaborator

@drewoldag drewoldag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

@mtauraso mtauraso merged commit a8ef50f into main Sep 20, 2024
7 checks passed
@mtauraso mtauraso deleted the issue/50/config-defaults branch September 20, 2024 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pass the whole config everywhere Centralize config defaults
2 participants