-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Centralize config defaults, add safety for Config updates #68
Conversation
mtauraso
commented
Sep 19, 2024
•
edited
Loading
edited
- All configuration dicts are now ConfigDict. A ConfigDict is a version of dict that raises RuntimeError quite a lot when you try to mutate it or when you access values that weren't set at construction time.
- There is a new validate step for runtime config that ensures every config key that appears at runtime has a default defined in fibad_default_config.toml.
- Usages of .get() on config dictionaries have been removed, so all defaults now live in fibad_default_config.toml. Config access is entirely via getitem / [].
- The sum total of the above means that we expect any new config which is missing a default value to generate a runtime error as early as possible if it is referenced anywhere in code, or in a user-supplied configuration.
- Several config values have changed representation, because "not specified" was their default. toml does not support "None" so all of these values have been set to false in fibad_default_config.toml. Accesses to these configs have been updated to be aware that the default value of 'false' either triggers modal behavior (data_loader/crop_to, data_loader/filters) or triggers a RuntimeError to prompt the user to provide it (download/username, download/password).
- Except where necessary for model/dataloader class loading, passsing of individual config sections has been removed. For the most part all parts of the code now have access to the entire runtime configuration.
- The data_loader/data and download/cutout_dir configs have been combined into general/data_dir, to avoid an invalid configuration state where the downloader and data loader are looking at different directories for data.
- All configuration dicts are now ConfigDict. A ConfigDict is a version of dict that raises RuntimeError quite a lot when you try to mutate it or when you access values that weren't set at construction time. - There is a new validate step for runtime config that ensures every config key that appears at runtime has a default defined in fibad_default_config.toml. - Usages of .get() on config dictionaries have been removed, so all defaults now live in fibad_default_config.toml. Config access is entirely via __getitem__ / []. - The sum total of the above means that we expect any new config which is missing a default value to generate a runtime error as early as possible if it is referenced anywhere in code, or in a user-supplied configuration. - Several config values have changed representation, because "not specified" was their default. toml does not support "None" so all of these values have been set to false in fibad_default_config.toml accesses to these configs have been updated to be aware that the default value of 'false' either triggers modal behavior (data_loader/crop_to, data_loader/filters) or triggers a RuntimeError to prompt the user to provide it (download/username, download/password). - Except where necessary for model/dataloader class loading, passsing of individual config sections has been removed. For the most part all parts of the code have access to all configs. - The data_loader/data and download/cutout_dir configs have been combined into general/data_dir, to avoid an invalid configuration state where the downloader and data loader are looking at different directories for data.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #68 +/- ##
==========================================
- Coverage 37.73% 37.59% -0.15%
==========================================
Files 17 17
Lines 734 798 +64
==========================================
+ Hits 277 300 +23
- Misses 457 498 +41 ☔ View full report in Codecov by Sentry. |
src/fibad/download.py
Outdated
iterator = iter(iterable) | ||
while batch := tuple(itertools.islice(iterator, n)): | ||
yield batch | ||
|
||
logger.info("Dividing cutouts among threads...") | ||
num_threads = num_threads if len(self.rects) > num_threads else 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth adding a little log message here in the case that num_threads is greater than self.rects? i.e. let the user know that even though they asked for N threads, we're only using 1. I could see how the following log might be confusing otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that seems worth an info log at least.
msg += f"All configuration keys and sections must be defined in {DEFAULT_CONFIG_FILEPATH}" | ||
raise RuntimeError(msg) | ||
|
||
if isinstance(runtime_config[key], dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would be worth validating that if runtime_config[key] is a dict, that default_config[key] is as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is reasonable paranoia.
# percolates down to nested ConfigDicts and prevents __setitem__ and other mutations of dictionary | ||
# values? i.e. a method to make a config dictionary fully immutable (or very difficult/annoying to | ||
# mutuate) before we pass control to possibly external module code that is relying on the dictionary | ||
# to be static througout the run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems like a really good idea. It will be interesting to see, when we have an actual user(s) interacting with Fibad if there is a desire to modify the merged config prior to running train or predict. But I agree, that during runtime, the config should be/becomes immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I do feel like its complicated enough that we should hold off for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.