Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove GPU options from the Python workflow and move them to XML files for CESM #4690

Merged
merged 13 commits into from
Oct 7, 2024

Conversation

sjsprecious
Copy link
Collaborator

@sjsprecious sjsprecious commented Oct 4, 2024

As discussed in #4687, it is better remove the GPU options from the Python workflow in CIME and use XML files instead to configure them for CESM.

This PR does this and it works for CESM CPU & GPU runs according to my tests on Derecho.

The general build procedure for building a CESM GPU run on Derecho now looks like (the other machines can modify the settings for their specific hardware accordingly):

1. create a new case as normal
2. go to the case directory
3. ./xmlchange GPU_TYPE=a100
4. ./xmlchange OPENACC_GPU_OFFLOAD=TRUE
5. ./xmlchange OVERSUBSCRIBE_GPU=TRUE
6. ./xmlchange NGPUS_PER_NODE=4
7. ./case.setup
8. ./case.build

It should work together with the following PRs:

@jedwards4b
Copy link
Contributor

pylint errors to correct:

************* Module CIME.XML.env_mach_pes
CIME/XML/env_mach_pes.py:47:8: W0613: Unused argument 'oversubscribe_gpu' (unused-argument)
************* Module CIME.case.case_setup
CIME/case/case_setup.py:415:24: W1309: Using an f-string that does not have any interpolated variables (f-string-without-interpolation)
CIME/case/case_setup.py:427:20: W1309: Using an f-string that does not have any interpolated variables (f-string-without-interpolation)

@sjsprecious
Copy link
Collaborator Author

Thanks @jedwards4b . I have fixed the pylint error. But it looks like there are still two errors left:

Trim Trailing Whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted CIME/case/case_setup.py

Could you tell me how to fix these errors?

@jedwards4b
Copy link
Contributor

The easiest way is to install pre-commit and use it locally.
You can install it from pip, or use mine: /glade/u/home/jedwards/.local/bin/pre-commit
from your cime directory: pre-commit run
This should reformat case_setup.py then you can commit again.

@sjsprecious
Copy link
Collaborator Author

Thanks @jedwards4b . I ran the pre-commit run from my cime directory and here is what I saw:

[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
check xml............................................(no files to check)Skipped
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
debug statements (python)............................(no files to check)Skipped
black................................................(no files to check)Skipped
pylint...............................................(no files to check)Skipped

It seems that nothing is fixed somehow. I am using pre-commit 4.0.0. Should I try another specific version instead?

@jasonb5
Copy link
Collaborator

jasonb5 commented Oct 7, 2024

@sjsprecious Run pre-commit run -a, this should check all the files and fix any issues.

@sjsprecious
Copy link
Collaborator Author

Thanks @jasonb5 . I have fixed them with Jim's help.

@jedwards4b jedwards4b merged commit 62a9b17 into ESMCI:master Oct 7, 2024
7 checks passed
@sjsprecious sjsprecious deleted the use_xml_for_gpu_config branch October 7, 2024 19:16
@jedwards4b
Copy link
Contributor

@sjsprecious I noticed that you only tested this for e3sm - the cesm version of the config_machines xsd file has not been updated to reflect the change.

@sjsprecious
Copy link
Collaborator Author

Thanks @jedwards4b for catching this bug. I will work on a PR to fix it for CESM shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants