Commit
* merge gefs-hrrr model
* update gefs-hrrr
* Update gefs_hrrr.py
* Update gefs_hrrr.yaml edit stats path
* Update train.py
* Update gefs_hrrr.py
* Add unit test for GEFS Corrdiff regression loss and lead-time aware songunet
* Format so init prob_channel scalar factor by channel length
* Add docstrings and license for dataloader, formating in general
* Delete examples/generative/corrdiff/stats.json
* Update loss.py
* Update song_unet.py
* Update utils.py
* Update README.md
* Update README.md
* Update CHANGELOG.md
* Fixing generalization to unit test case for corrdiff utils
* Update unit test for fixing diffusion step signature
* Reformat with black

---------

Co-authored-by: Tao Ge <[email protected]>
Co-authored-by: Tao Ge <[email protected]>
1 parent f46e25f commit 297297e
Showing 30 changed files with 1,993 additions and 112 deletions.
36 changes: 36 additions & 0 deletions
examples/generative/corrdiff/conf/config_generate_gefs_hrrr.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES. | ||
# SPDX-FileCopyrightText: All rights reserved. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
hydra: | ||
  job: | ||
    chdir: true | ||
    name: gefs_hrrr_generation | ||
  run: | ||
    dir: output/${hydra:job.name} | ||
|
||
# Get defaults | ||
defaults: | ||
|
||
# Dataset | ||
- dataset/gefs_hrrr | ||
|
||
# Sampler | ||
- sampler/stochastic | ||
#- sampler/deterministic | ||
|
||
# Generation | ||
- generation/patched_based_gefs_hrrr | ||
#- generation/patched_based |
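
The `defaults` list above uses Hydra config groups: each entry of the form `group/option` selects a YAML file under the config directory. The sketch below only illustrates that naming convention; `defaults_to_paths` is a hypothetical helper, not part of Hydra or Modulus, and the `conf_dir` value is taken from the file paths shown in this commit.

```python
from pathlib import PurePosixPath


def defaults_to_paths(defaults, conf_dir="examples/generative/corrdiff/conf"):
    """Map config-group entries such as 'dataset/gefs_hrrr' to the YAML file
    that would be looked up under conf_dir (hypothetical helper)."""
    return [str(PurePosixPath(conf_dir) / f"{entry}.yaml") for entry in defaults]


# Active (uncommented) entries from config_generate_gefs_hrrr.yaml
paths = defaults_to_paths(
    [
        "dataset/gefs_hrrr",
        "sampler/stochastic",
        "generation/patched_based_gefs_hrrr",
    ]
)
for path in paths:
    print(path)
```

Switching to the deterministic sampler or the generic patch-based generation config would then amount to swapping which of the commented entries is active.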
34 changes: 34 additions & 0 deletions
examples/generative/corrdiff/conf/config_training_gefs_diffusion.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES. | ||
# SPDX-FileCopyrightText: All rights reserved. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
hydra: | ||
  job: | ||
    chdir: true | ||
    name: gefs_hrrr_diffusion | ||
  run: | ||
    dir: ./outputs/${hydra:job.name} | ||
|
||
# Get defaults | ||
defaults: | ||
|
||
# Dataset | ||
- dataset/gefs_hrrr | ||
|
||
# Model | ||
- model/corrdiff_patched_diffusion_gefs_hrrr | ||
|
||
# Training | ||
- training/corrdiff_patched_diffusion_gefs_hrrr |
34 changes: 34 additions & 0 deletions
examples/generative/corrdiff/conf/config_training_gefs_regression.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES. | ||
# SPDX-FileCopyrightText: All rights reserved. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
hydra: | ||
  job: | ||
    chdir: true | ||
    name: gefs_hrrr_regression | ||
  run: | ||
    dir: ./outputs/${hydra:job.name} | ||
|
||
# Get defaults | ||
defaults: | ||
|
||
# Dataset | ||
- dataset/gefs_hrrr | ||
|
||
# Model | ||
- model/corrdiff_regression_gefs_hrrr | ||
|
||
# Training | ||
- training/corrdiff_regression_gefs_hrrr |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES. | ||
# SPDX-FileCopyrightText: All rights reserved. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
type: gefs_hrrr | ||
data_path: /data | ||
stats_path: modulus/examples/generative/corrdiff/stats.json | ||
output_variables: ["u10m", "v10m", "t2m", "precip", "cat_snow", "cat_ice", "cat_freez", "cat_rain", "cat_none"] | ||
prob_variables: ["cat_snow", "cat_ice", "cat_freez", "cat_rain"] | ||
input_surface_variables: ["u10m", "v10m", "t2m", "q2m", "sp", "msl", "precipitable_water"] | ||
input_isobaric_variables: ['u1000', 'u925', 'u850', 'u700', 'u500', 'u250', 'v1000', 'v925', 'v850', 'v700', 'v500', 'v250', 'z1000', 'z925', 'z850', 'z700', 'z500', 'z200', 't1000', 't925', 't850', 't700', 't500', 't100', 'r1000', 'r925', 'r850', 'r700', 'r500', 'r100'] | ||
ds_factor: 4 | ||
train: False | ||
hrrr_window: [[1,1057], [4,1796]] # need dims to be divisible by 16 [[0,1024], [0,1024]] |
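
The comment on `hrrr_window` says the cropped dimensions must be divisible by 16. A minimal check of that constraint, assuming each pair is a half-open `[start, end)` index range (the config does not state this, nor which axis comes first); `window_shape` and `is_divisible` are hypothetical helpers:

```python
def window_shape(hrrr_window):
    """Return the extent of each axis, assuming half-open [start, end) bounds."""
    return tuple(end - start for start, end in hrrr_window)


def is_divisible(shape, multiple=16):
    """True if every axis extent is a multiple of `multiple`."""
    return all(extent % multiple == 0 for extent in shape)


shape = window_shape([[1, 1057], [4, 1796]])
print(shape, is_divisible(shape))  # (1056, 1792) True
```

Under that assumption the window is 1056 by 1792 grid points, that is 66 and 112 multiples of 16.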
68 changes: 68 additions & 0 deletions
examples/generative/corrdiff/conf/generation/patched_based_gefs_hrrr.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES. | ||
# SPDX-FileCopyrightText: All rights reserved. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
num_ensembles: 1 | ||
# Number of ensembles to generate per input | ||
seed_batch_size: 1 | ||
# Size of the batched inference | ||
inference_mode: all | ||
# Choose between "all" (regression + diffusion), "regression" or "diffusion" | ||
patch_size: 448 | ||
patch_shape_x: 448 | ||
patch_shape_y: 448 | ||
# Patch size. Patch-based sampling will be utilized if these dimensions differ from | ||
# img_shape_x and img_shape_y | ||
overlap_pixels: 4 | ||
# Number of overlapping pixels between adjacent patches | ||
boundary_pixels: 2 | ||
# Number of boundary pixels to be cropped out. 2 is recommended to address the boundary | ||
# artifact. | ||
hr_mean_conditioning: true | ||
gridtype: learnable | ||
N_grid_channels: 100 | ||
sample_res: full | ||
# Sampling resolution | ||
times_range: null | ||
times: | ||
- "2024011212f00" | ||
- "2024011212f03" | ||
- "2024011212f06" | ||
- "2024011212f09" | ||
- "2024011212f12" | ||
- "2024011212f15" | ||
- "2024011212f18" | ||
- "2024011212f21" | ||
- "2024011212f24" | ||
|
||
has_lead_time: true | ||
|
||
perf: | ||
  force_fp16: false | ||
  # Whether to force fp16 precision for the model. If false, it'll use the precision | ||
  # specified upon training. | ||
  use_torch_compile: false | ||
  # whether to use torch.compile on the diffusion model | ||
  # this will make the first time stamp generation very slow due to compilation overheads | ||
  # but will significantly speed up subsequent inference runs | ||
  num_writer_workers: 1 | ||
  # number of workers to use for writing file | ||
  # To support multiple workers a threadsafe version of the netCDF library must be used | ||
|
||
io: | ||
  res_ckpt_filename: EDMPrecondSRV2_updated.0.5821440.mdlus | ||
  # Checkpoint filename for the diffusion model | ||
  reg_ckpt_filename: UNet_updated.0.1960960.mdlus | ||
  # Checkpoint filename for the mean predictor model |
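
The `times` entries above look like a forecast initialization time followed by a lead time. A small parsing sketch, assuming the label format is `YYYYMMDDHH` + `f` + lead hours (inferred from the values, not confirmed by the commit); `parse_time_label` is a hypothetical helper, not the dataloader's own parser:

```python
from datetime import datetime, timedelta


def parse_time_label(label):
    """Split a label such as '2024011212f03' into (init time, lead hours, valid time).

    Assumes 'YYYYMMDDHH' + 'f' + lead time in hours.
    """
    init_str, lead_str = label.split("f")
    init_time = datetime.strptime(init_str, "%Y%m%d%H")
    lead_hours = int(lead_str)
    return init_time, lead_hours, init_time + timedelta(hours=lead_hours)


init_time, lead_hours, valid_time = parse_time_label("2024011212f24")
print(init_time, lead_hours, valid_time)
```

Read this way, the nine configured entries cover lead times 0 to 24 hours in 3-hour steps from a single initialization, which matches `has_lead_time: true`.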
24 changes: 24 additions & 0 deletions
examples/generative/corrdiff/conf/model/corrdiff_patched_diffusion_gefs_hrrr.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES. | ||
# SPDX-FileCopyrightText: All rights reserved. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
name: lt_aware_patched_diffusion | ||
# Name of the preconditioner | ||
hr_mean_conditioning: True | ||
# High-res mean (regression's output) as additional condition | ||
scale_cond_input: True | ||
# If true, also scales the input conditioning | ||
# For backward compatibility, this is true by default | ||
# We recommend setting this to false for new training runs |
21 changes: 21 additions & 0 deletions
examples/generative/corrdiff/conf/model/corrdiff_regression_gefs_hrrr.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES. | ||
# SPDX-FileCopyrightText: All rights reserved. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
name: lt_aware_ce_regression | ||
# Name of the preconditioner | ||
hr_mean_conditioning: False | ||
# High-res mean (regression's output) as additional condition | ||
|
60 changes: 60 additions & 0 deletions
examples/generative/corrdiff/conf/training/corrdiff_patched_diffusion_gefs_hrrr.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES. | ||
# SPDX-FileCopyrightText: All rights reserved. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
# Hyperparameters | ||
hp: | ||
  training_duration: 200000 | ||
  # Training duration based on the number of processed images, measured in kilo images (thousands of images) | ||
  total_batch_size: 1 | ||
  # Total batch size | ||
  batch_size_per_gpu: 1 | ||
  # Batch size per GPU | ||
  lr: 0.0002 | ||
  # Learning rate | ||
  grad_clip_threshold: 1e6 | ||
  # no gradient clipping for default non-patch-based training | ||
  lr_decay: 0.7 | ||
  # LR decay rate | ||
  patch_shape_x: 448 | ||
  patch_shape_y: 448 | ||
  # Patch size. Patch training is used if these dimensions differ from img_shape_x and img_shape_y | ||
  patch_num: 4 | ||
  # Number of patches from a single sample. Total number of patches is patch_num * batch_size_global | ||
  lr_rampup: 1000000 | ||
  # Rampup for learning rate, in number of samples | ||
|
||
# Performance | ||
perf: | ||
  fp_optimizations: amp-bf16 | ||
  # Floating point mode, one of ["fp32", "fp16", "amp-fp16", "amp-bf16"] | ||
  # "amp-{fp16,bf16}" activates Automatic Mixed Precision (AMP) with {float16,bfloat16} | ||
  dataloader_workers: 4 | ||
  # DataLoader worker processes | ||
  songunet_checkpoint_level: 1 # 0 means no checkpointing | ||
  # Gradient checkpointing level, value is number of layers to checkpoint | ||
|
||
# I/O | ||
io: | ||
  regression_checkpoint_path: /lustre/fsw/portfolios/coreai/projects/coreai_climate_earth2/tge/gefs_regression/checkpoints_lt_aware_ce_regression/UNet.0.15.mdlus | ||
  # Where to load the regression checkpoint | ||
  print_progress_freq: 1 | ||
  # How often to print progress | ||
  save_checkpoint_freq: 5 | ||
  # How often to save the checkpoints, measured in number of processed samples | ||
  validation_freq: 1 | ||
  # how often to record the validation loss, measured in number of processed samples | ||
  validation_steps: 1000 | ||
  # how many loss evaluations are used to compute the validation loss per checkpoint |
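
The `hp` comments relate `patch_num` to the global batch size. The sketch below works through that arithmetic for the values in this file. It assumes `batch_size_global` in the comment refers to `total_batch_size`, and it uses one common convention for gradient accumulation (total batch size divided by per-GPU batch size times number of GPUs); both helpers are hypothetical and may not match what `train.py` does.

```python
def patches_per_step(patch_num, total_batch_size):
    """Total patches per optimizer step: patch_num * global batch size."""
    return patch_num * total_batch_size


def accumulation_rounds(total_batch_size, batch_size_per_gpu, world_size):
    """Gradient accumulation rounds under a common convention (assumption)."""
    per_pass = batch_size_per_gpu * world_size
    if total_batch_size % per_pass != 0:
        raise ValueError(
            "total_batch_size must be a multiple of batch_size_per_gpu * world_size"
        )
    return total_batch_size // per_pass


# Values from corrdiff_patched_diffusion_gefs_hrrr.yaml; world_size=1 is assumed.
n_patches = patches_per_step(patch_num=4, total_batch_size=1)
n_rounds = accumulation_rounds(total_batch_size=1, batch_size_per_gpu=1, world_size=1)
print(n_patches, n_rounds)  # 4 1
```

With these settings each step would see four 448 by 448 patches drawn from a single sample on one GPU.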