Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp job_id, take 2 #2994

Merged
merged 1 commit into from
May 9, 2024
Merged

Revamp job_id, take 2 #2994

merged 1 commit into from
May 9, 2024

Conversation

charleskawczynski
Copy link
Member

@charleskawczynski charleskawczynski commented May 8, 2024

Attempt to close #2651.

This PR:

  • Merges config/default_configs/default_edmf_config.yml into config/default_configs/default_config.yml, so that there is a single default (and still over-ridable) config yaml.
  • Removes job_id from the toml files
  • job_id was added to the parsed args, so that we can fully control the output folder in the buildkite pipeline. The attempt to guess the job ID in Revamp job IDs #2986 still required additional special casing (the same config_id and resources could be used for a driver example and a performance script, resulting in job_id collisions).
  • Added config_files to AtmosConfig. config_files is now required for all jobs, and has the same fallback as before. The main difference is that we can inspect (::AtmosConfig).config_files to see where the parameters came from.
  • I've made config_path and default_config_file constants, so that they can be reused in a few places
  • configs_per_job_id was renamed to configs_per_config_id, and the values of that returned dict is now a NamedTuple: (; config, config_file), so that, again, we can trace where parameters are coming from

To make a config, the driver now uses:

    (; config_file, job_id) = CA.commandline_kwargs()
    config = CA.AtmosConfig(config_file; job_id)

I think that explicitly specifying the config improves the clarity. If those arguments are not passed, we have a fallback to use config_file::String = default_config_file and job_id = config_id_from_config_file(config_file).

This is technically breaking because CA.AtmosConfig() no longer does arg parsing (and depending on what was available would use given or the default config), which I think was very confusing. Now, that logic is isolated to (; config_file, job_id) = CA.commandline_kwargs(), which simply has defaults.

I've not yet removed the duplicate yaml files, but I think this should make doing so much easier.

Supersedes #2986, #2981

@charleskawczynski charleskawczynski force-pushed the ck/revamp_job_ids_take2 branch 5 times, most recently from 1815d8b to 299418b Compare May 9, 2024 13:58
@charleskawczynski charleskawczynski force-pushed the ck/revamp_job_ids_take2 branch 2 times, most recently from f60dc57 to 5a3b129 Compare May 9, 2024 14:58
@charleskawczynski charleskawczynski marked this pull request as ready for review May 9, 2024 14:58
@charleskawczynski charleskawczynski force-pushed the ck/revamp_job_ids_take2 branch 2 times, most recently from f55d244 to 3a6f82d Compare May 9, 2024 17:01
Copy link
Member

@nefrathenrici nefrathenrici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helps to clean up the config issues, separating job_id from the config is much nicer.
I left some nitpicking comments

src/solver/types.jl Show resolved Hide resolved
src/solver/type_getters.jl Outdated Show resolved Hide resolved
config/perf_configs/checkbounds.yml Outdated Show resolved Hide resolved
@charleskawczynski charleskawczynski force-pushed the ck/revamp_job_ids_take2 branch 2 times, most recently from 328568a to 23591a7 Compare May 9, 2024 20:19
Merge core and edmf default configs

Cleanup flame script

Use --config_file kwarg in more pipeline jobs

configs_per_job_id -> configs_per_config_id

Use job_id

Add commandline convenience funcs

Fixes

config_file -> config_files

Update news

Temporarily disable generate_output_path

Improve log / printing

Doc fixes
@charleskawczynski
Copy link
Member Author

CI passed in https://buildkite.com/clima/climaatmos-ci/builds/18763. A gpu node is not working on central, so I'm going to manually merge.

Thanks for taking a look, @nefrathenrici.

@charleskawczynski charleskawczynski merged commit e0fe1cd into main May 9, 2024
7 of 8 checks passed
@charleskawczynski charleskawczynski deleted the ck/revamp_job_ids_take2 branch May 9, 2024 20:52
@szy21
Copy link
Member

szy21 commented May 9, 2024

Should we change the pipeline in longruns, longruns_gpu, and gpu_pipeline as well?

@charleskawczynski
Copy link
Member Author

Yes, I'll open a followup PR.

@Sbozzolo
Copy link
Member

While a good step forward, I feel that the problems I raised in the review of the first iteration (#2986) were not fully addressed.

  • Importantly, this PR adds another layer of complexity and indirection to set up a simulation. For example, one has to know that the config_id is obtained from the name of a yml file, and renaming files will break workflows that rely on config_id. Then, one has to know what a config_id does.
  • The PR seem to also introduce a notion of config_id along with job_id without defining the respective purposes.
  • The PR also mixes command line arguments with yml configurations, so that it is no longer possible to fully specify a simulation just using yml files (or, this is done implicitely through the file name).
  • Conversely, the PR solidifies the idea that config files are needed to use ClimaAtmos, which, in my opinion, goes against the idea that ClimaAtmos is "a library for building atmospheric circulation models" (one should be able to set up a config how they see fit, e.g., with dictionaries instead of files).
  • The PR exposes job_id so that we can control the output_dir for our CI, so why not acting directly with the output_dir instead of setting it in such a roundabout way?

Fundamentally, I think that our work on Atmos interface should move us in a direction where things are less entangled and where it is easier to understand what is happening.

@charleskawczynski
Copy link
Member Author

Hi @Sbozzolo,

  • Importantly, this PR adds another layer of complexity and indirection to set up a simulation. For example, one has to know that the config_id is obtained from the name of a yml file, and renaming files will break workflows that rely on config_id. Then, one has to know what a config_id does.

I don't think an (optional) extra function call is adding too much complexity to set up a simulation. Also, users don't need to know what the config_id, I simply changed some job_ids to the name config_id, as they seemed more appropriate in certain contexts. Feel free to open an issue/PR if you'd like to bikeshed this more. Also, I don't think users need to know what config_id does, as it's not specified in the config file.

  • The PR seem to also introduce a notion of config_id along with job_id without defining the respective purposes.

Sure, we can add a doc string to config_id, to explain where it's used.

  • The PR also mixes command line arguments with yml configurations, so that it is no longer possible to fully specify a simulation just using yml files (or, this is done implicitely through the file name).
  • Conversely, the PR solidifies the idea that config files are needed to use ClimaAtmos, which, in my opinion, goes against the idea that ClimaAtmos is "a library for building atmospheric circulation models" (one should be able to set up a config how they see fit, e.g., with dictionaries instead of files).

I think you may be misunderstanding how things worked, and how things work with this PR. If you'd like, we can chat offline about this. The bottom line is:

  • This PR decouples command line arguments with yaml configurations, which I mentioned in the PR title
  • Previously, AtmosConfig would parse the command line and grab default yaml configurations
  • Now, AtmosConfig takes an optional yaml config, if not it uses the default (as it did before), but it no longer parses CL args.
  • Now, all CL parsing occurs in commandline_kwargs
  • This PR does not solidify the idea that config files are needed in Atmos anymore than the main branch did. Also, there technical has never been such a barrier, since AtmosConfig does not have an inner constructor that requires config files to be passed. Users can always bypass that, it's just that the convenience functions we offer are much more convenient to use compared to users manually specifying everything in the config file.
  • The PR exposes job_id so that we can control the output_dir for our CI, so why not acting directly with the output_dir instead of setting it in such a roundabout way?

This PR did not make changes w.r.t. the relation between job_id and output_dir, we've always had job_ids, they were simply specified differently. Please keep review comments relevant to the PR.

Fundamentally, I think that our work on Atmos interface should move us in a direction where things are less entangled and where it is easier to understand what is happening.

That's fair, and I'm open to further refactoring but we should open concrete issues about what the requirements are, and design decisions we want to make.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CPU + GPU runs can't share yml files
4 participants