Simplify checkpointer and make it work for large models #628

Merged
merged 17 commits into from
Feb 20, 2020
1 change: 1 addition & 0 deletions docs/make.jl
@@ -91,6 +91,7 @@ makedocs(
"model_setup/turbulent_diffusivity_closures_and_les_models.md",
"Diagnostics" => "model_setup/diagnostics.md",
"Output writers" => "model_setup/output_writers.md",
"Checkpointing" => "model_setup/checkpointing.md",
"Time stepping" => "model_setup/time_stepping.md",
"Setting initial conditions" =>
"model_setup/setting_initial_conditions.md"
7 changes: 7 additions & 0 deletions docs/src/library.md
@@ -152,6 +152,13 @@ Pages = [
]
```

## Simulations
```@autodocs
Modules = [Oceananigans.Simulations]
Private = false
Pages = ["Simulations.jl"]
```

## Turbulence closures
```@autodocs
Modules = [Oceananigans.TurbulenceClosures]
31 changes: 31 additions & 0 deletions docs/src/model_setup/checkpointing.md
@@ -0,0 +1,31 @@
# Checkpointing
A checkpointer can be used to serialize the entire model state to a file from which the model can be restored at any
time. This is useful if you'd like to periodically checkpoint when running long simulations in case of crashes or
cluster time limits, but also if you'd like to restore from a checkpoint and try out multiple scenarios.

For example, to checkpoint the model state to disk every 1,000,000 seconds of simulation time, writing files of
the form `model_checkpoint_iteration12500.jld2` where `12500` is the iteration number (filled in automatically):
```@example
model = Model(grid=RegularCartesianGrid(size=(16, 16, 16), length=(1, 1, 1)))
model.output_writers[:checkpointer] = Checkpointer(model; interval=1e6, prefix="model_checkpoint")
```
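
As a rough illustration of the naming scheme, the sketch below builds a checkpoint filename from a prefix and an
iteration number. `checkpoint_filename` is a hypothetical helper written for this example, not an Oceananigans
function; it assumes only the `<prefix>_iteration<N>.jld2` pattern described above.

```julia
# Hypothetical helper (not part of Oceananigans): join a prefix and an
# iteration number into a name of the form `<prefix>_iteration<N>.jld2`.
checkpoint_filename(prefix, iteration) = string(prefix, "_iteration", iteration, ".jld2")

println(checkpoint_filename("model_checkpoint", 12500))
# prints model_checkpoint_iteration12500.jld2
```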

The default options should provide checkpoint files that are easy to restore from in most cases. For more advanced
options and features, see [`Checkpointer`](@ref).

## Restoring from a checkpoint file
To restore the model from a checkpoint file, for example `model_checkpoint_iteration12345.jld2`, simply call
```
model = restore_from_checkpoint("model_checkpoint_iteration12345.jld2")
```
which will create a new model object that is identical to the one that was serialized to disk. You can continue time
stepping after restoring from a checkpoint.

You can pass additional parameters to the `Model` constructor. See [`restore_from_checkpoint`](@ref) for more
information.

## Restoring with functions
JLD2 cannot serialize functions to disk, so if you used forcing functions, boundary conditions containing functions,
or any other model property that references functions, those functions will not be written to the checkpoint file.
When restoring from a checkpoint file, any model property that contained functions must be manually restored via
keyword arguments to [`restore_from_checkpoint`](@ref).
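
The idea can be sketched in plain Julia without Oceananigans or JLD2. Everything below is a hypothetical stand-in
(`state`, `serializable`, and `restore` are names made up for this example): function-valued properties are dropped
before saving, and the caller supplies them again by keyword when restoring. It is not how `restore_from_checkpoint`
is implemented.

```julia
# Hypothetical model state; `forcing` holds a function and so cannot be saved.
state = (Δt = 6.0, iteration = 12500, forcing = (x, t) -> sin(t))

# Keep only the properties that are not functions, mimicking what reaches disk.
serializable = Dict{Symbol,Any}(k => v for (k, v) in pairs(state) if !(v isa Function))

# On restore, properties that held functions are passed back in as keyword arguments.
restore(saved; kwargs...) = merge(saved, Dict{Symbol,Any}(kwargs))

restored = restore(serializable; forcing = (x, t) -> sin(t))

println(haskey(serializable, :forcing))  # false: the function was not saved
println(haskey(restored, :forcing))      # true: it was supplied again on restore
```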
17 changes: 10 additions & 7 deletions docs/src/model_setup/diagnostics.md
@@ -4,12 +4,12 @@ interest you may want to save to disk, such as the horizontal average of the tem
produce a time series of salinity. They also include utilities for diagnosing model health, such as the CFL number or
to check for NaNs.

Diagnostics are stored as a list of diagnostics in `model.diagnostics`. Diagnostics can be specified at model creation
time or be specified at any later time and appended (or assigned with a key value pair) to `model.diagnostics`.
Diagnostics are stored as a list of diagnostics in `simulation.diagnostics`. Diagnostics can be specified at model creation
time or be specified at any later time and appended (or assigned with a key value pair) to `simulation.diagnostics`.

Most diagnostics can be run at specified frequencies (e.g. every 25 time steps) or specified intervals (e.g. every
15 minutes of simulation time). If you'd like to run a diagnostic on demand then do not specify a frequency or interval
(and do not add it to `model.diagnostics`).
(and do not add it to `simulation.diagnostics`).
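
To make the distinction between a frequency and an interval concrete, here is a minimal sketch in plain Julia. The
helpers `due_by_frequency` and `due_by_interval` are hypothetical and only illustrate the two scheduling rules; they
are not taken from the Oceananigans source.

```julia
# Hypothetical helpers for illustration only; not part of Oceananigans.

# Frequency-based: run every `frequency` iterations (e.g. every 25 time steps).
due_by_frequency(iteration, frequency) = iteration % frequency == 0

# Interval-based: run once at least `interval` seconds of simulation time
# have passed since the last run (e.g. every 15 minutes = 900 seconds).
due_by_interval(time, last_run_time, interval) = time - last_run_time >= interval

println(due_by_frequency(50, 25))            # true
println(due_by_interval(600.0, 0.0, 900.0))  # false
```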

We describe the `HorizontalAverage` diagnostic in detail below but see the API documentation for other diagnostics such
as [`TimeSeries`](@ref), [`FieldMaximum`](@ref), [`CFL`](@ref), and [`NaNChecker`](@ref).
@@ -18,26 +18,29 @@ as [`TimeSeries`](@ref), [`FieldMaximum`](@ref), [`CFL`](@ref), and [`NaNChecker
You can create a `HorizontalAverage` diagnostic by passing a field to the constructor, e.g.
```@example
model = Model(grid=RegularCartesianGrid(size=(16, 16, 16), length=(1, 1, 1)))
simulation = Simulation(model, Δt=6, stop_iteration=10)
T_avg = HorizontalAverage(model.tracers.T)
push!(model.diagnostics, T_avg)
push!(simulation.diagnostics, T_avg)
```
which can then be called on-demand via `T_avg(model)` to return the horizontally averaged temperature. When running on
the GPU you may want it to return an `Array` instead of a `CuArray` in case you want to save the horizontal average to
disk in which case you'd want to construct it like
```@example
model = Model(grid=RegularCartesianGrid(size=(16, 16, 16), length=(1, 1, 1)))
T_avg = HorizontalAverage(model.tracers.T; return_type=Array)
push!(model.diagnostics, T_avg)
simulation = Simulation(model, Δt=6, stop_iteration=10)
T_avg = HorizontalAverage(model.tracers.T, return_type=Array)
push!(simulation.diagnostics, T_avg)
```

You can also pass an abstract operator to take the horizontal average of any diagnosed quantity. For example, to
compute the horizontal average of the vertical component of vorticity:
```@example
model = Model(grid=RegularCartesianGrid(size=(16, 16, 16), length=(1, 1, 1)))
simulation = Simulation(model, Δt=6, stop_iteration=10)
u, v, w = model.velocities
ζ = ∂x(v) - ∂y(u)
ζ_avg = HorizontalAverage(ζ)
model.diagnostics[:vorticity_profile] = ζ_avg
simulation.diagnostics[:vorticity_profile] = ζ_avg
```

See [`HorizontalAverage`](@ref) for more details and options.
56 changes: 14 additions & 42 deletions docs/src/model_setup/output_writers.md
@@ -3,9 +3,9 @@ Saving model data to disk can be done in a flexible manner using output writers.
implemented are a NetCDF output writer (relying on [NCDatasets.jl](https://github.com/Alexander-Barth/NCDatasets.jl))
and a JLD2 output writer (relying on [JLD2.jl](https://github.com/JuliaIO/JLD2.jl)).

Output writers are stored as a list of output writers in `model.output_writers`. Output writers can be specified at
model creation time or be specified at any later time and appended (or assigned with a key value pair) to
`model.output_writers`.
Output writers are stored as a list of output writers in `simulation.output_writers`. Output writers can be specified
at model creation time or be specified at any later time and appended (or assigned with a key value pair) to
`simulation.output_writers`.

## NetCDF output writer
Model data can be saved to NetCDF files along with associated metadata. The NetCDF output writer is generally used by
@@ -16,6 +16,7 @@ slices) along with output attributes
```@example
Nx = Ny = Nz = 16
model = Model(grid=RegularCartesianGrid(size=(Nx, Ny, Nz), length=(1, 1, 1)))
simulation = Simulation(model, Δt=12, stop_time=3600)

fields = Dict(
"u" => model.velocities.u,
@@ -27,12 +28,14 @@ output_attributes = Dict(
"T" => Dict("longname" => "Temperature", "units" => "C")
)

model.output_writers[:field_writer] = NetCDFOutputWriter(model, fields; filename="output_fields.nc",
interval=6hour, output_attributes=output_attributes)
simulation.output_writers[:field_writer] =
NetCDFOutputWriter(model, fields; filename="output_fields.nc",
interval=6hour, output_attributes=output_attributes)

model.output_writers[:surface_slice_writer] = NetCDFOutputWriter(model, fields; filename="output_surface_xy_slice.nc",
interval=5minute, output_attributes=output_attributes,
zC=Nz, zF=Nz)
simulation.output_writers[:surface_slice_writer] =
NetCDFOutputWriter(model, fields; filename="output_surface_xy_slice.nc",
interval=5minute, output_attributes=output_attributes,
zC=Nz, zF=Nz)
```

See [`NetCDFOutputWriter`](@ref) for more details and options.
@@ -47,6 +50,7 @@ of the function will be saved to the JLD2 file. For example, to write out 3D fie
of T every 1 hour of simulation time to a file called `some_data.jld2`
```@example
model = Model(grid=RegularCartesianGrid(size=(16, 16, 16), length=(1, 1, 1)))
simulation = Simulation(model, Δt=12, stop_time=3600)

function init_save_some_metadata(file, model)
file["author"] = "Chim Riggles"
@@ -62,41 +66,9 @@ outputs = Dict(
:T_avg => model -> T_avg(model)
)

jld2_writer = JLD2OutputWriter(model, outputs; init=init_save_some_metadata, interval=1hour, prefix="some_data")
jld2_writer = JLD2OutputWriter(model, outputs, init=init_save_some_metadata, interval=1hour, prefix="some_data")

push!(model.output_writers, jld2_writer)
push!(simulation.output_writers, jld2_writer)
```

See [`JLD2OutputWriter`](@ref) for more details and options.

## Checkpointer
A checkpointer can be used to serialize the entire model state to a file from which the model can be restored at any
time. This is useful if you'd like to periodically checkpoint when running long simulations in case of crashes or
cluster time limits, but also if you'd like to restore from a checkpoint and try out multiple scenarios.

For example, to periodically checkpoint the model state to disk every 1,000,000 seconds of simulation time to files of
the form `model_checkpoint_xxx.jld2` where `xxx` is the iteration number (automatically filled in)
```@example
model = Model(grid=RegularCartesianGrid(size=(16, 16, 16), length=(1, 1, 1)))
model.output_writers[:checkpointer] = Checkpointer(model; interval=1e6, prefix="model_checkpoint")
```

The default options should provide checkpoint files that are easy to restore from in most cases. For more advanced
options and features, see [`Checkpointer`](@ref).

### Restoring from a checkpoint file
To restore the model from a checkpoint file, for example `model_checkpoint_12345.jld2`, simply call
```
model = restore_from_checkpoint("model_checkpoint_12345.jld2")
```
which will create a new model object that is identical to the one that was serialized to disk. You can continue time
stepping after restoring from a checkpoint.

You can pass additional parameters to the `Model` constructor. See [`restore_from_checkpoint`](@ref) for more
information.

### Restoring with functions
JLD2 cannot serialize functions to disk. so if you used forcing functions, boundary conditions containing functions, or
the model included references to functions then they will not be serialized to the checkpoint file. When restoring from
a checkpoint file, any model property that contained functions must be manually restored via keyword arguments to
[`restore_from_checkpoint`](@ref).
2 changes: 1 addition & 1 deletion src/Fields/Fields.jl
@@ -6,7 +6,7 @@ export
interior, interiorparent,
xnode, ynode, znode, location,
set!,
VelocityFields, TracerFields, tracernames, PressureFields, Tendencies
VelocityFields, TracerFields, tracernames, PressureFields, TendencyFields

include("field.jl")
include("set!.jl")
4 changes: 2 additions & 2 deletions src/Fields/field_tuples.jl
@@ -47,13 +47,13 @@ function PressureFields(arch, grid; pHY′=zeros(arch, grid), pNHS=zeros(arch, g
end

"""
Tendencies(arch, grid, tracer_names; kwargs...)
TendencyFields(arch, grid, tracer_names; kwargs...)

Return a NamedTuple with tendencies for all solution fields (velocity fields and
tracer fields), initialized on the architecture `arch` and `grid`. Optional `kwargs`
can be specified to assign data arrays to each tendency field.
"""
function Tendencies(arch, grid, tracer_names; kwargs...)
function TendencyFields(arch, grid, tracer_names; kwargs...)
velocities = (
u = :u ∈ keys(kwargs) ? XFaceField(arch, grid, kwargs[:u]) : XFaceField(arch, grid),
v = :v ∈ keys(kwargs) ? YFaceField(arch, grid, kwargs[:v]) : YFaceField(arch, grid),
13 changes: 6 additions & 7 deletions src/Models/incompressible_model.jl
@@ -77,19 +77,18 @@ function IncompressibleModel(;
boundary_conditions = SolutionBoundaryConditions(grid),
parameters = nothing,
velocities = VelocityFields(architecture, grid),
tracer_fields = TracerFields(architecture, grid, tracers),
pressures = PressureFields(architecture, grid),
diffusivities = TurbulentDiffusivities(architecture, grid, tracernames(tracers), closure),
timestepper = :AdamsBashforth,
timestepper_method = :AdamsBashforth,
timestepper = TimeStepper(timestepper_method, float_type, architecture, grid, tracernames(tracers)),
pressure_solver = PressureSolver(architecture, grid, PressureBoundaryConditions(boundary_conditions))
)

if architecture == GPU()
!has_cuda() && throw(ArgumentError("Cannot create a GPU model. No CUDA-enabled GPU was detected!"))
if architecture == GPU() && !has_cuda()
throw(ArgumentError("Cannot create a GPU model. No CUDA-enabled GPU was detected!"))
end

timestepper = TimeStepper(timestepper, float_type, architecture, grid, tracernames(tracers))

tracers = TracerFields(architecture, grid, tracers)
validate_buoyancy(buoyancy, tracernames(tracers))

# Regularize forcing, boundary conditions, and closure for given tracer fields
@@ -98,6 +97,6 @@ function IncompressibleModel(;
boundary_conditions = ModelBoundaryConditions(tracernames(tracers), diffusivities, boundary_conditions)

return IncompressibleModel(architecture, grid, clock, buoyancy, coriolis, surface_waves,
velocities, tracers, pressures, forcing, closure, boundary_conditions,
velocities, tracer_fields, pressures, forcing, closure, boundary_conditions,
timestepper, pressure_solver, diffusivities, parameters)
end
8 changes: 4 additions & 4 deletions src/Oceananigans.jl
@@ -101,15 +101,15 @@ abstract type AbstractPoissonSolver end
"""
AbstractDiagnostic

Abstract supertype for types that compute diagnostic information from the current model
state.
Abstract supertype for diagnostics that compute information from the current
model state.
"""
abstract type AbstractDiagnostic end

"""
AbstractOutputWriter

Abstract supertype for types that perform input and output.
Abstract supertype for output writers that write data to disk.
"""
abstract type AbstractOutputWriter end

@@ -153,9 +153,9 @@ include("BoundaryConditions/BoundaryConditions.jl")
include("Solvers/Solvers.jl")
include("Forcing/Forcing.jl")
include("Models/Models.jl")
include("TimeSteppers/TimeSteppers.jl")
include("Diagnostics/Diagnostics.jl")
include("OutputWriters/OutputWriters.jl")
include("TimeSteppers/TimeSteppers.jl")
include("Simulations.jl")
include("AbstractOperations/AbstractOperations.jl")

2 changes: 2 additions & 0 deletions src/OutputWriters/OutputWriters.jl
@@ -8,9 +8,11 @@ export

using Oceananigans
using Oceananigans.Grids
using Oceananigans.Fields
using Oceananigans.Architectures

using Oceananigans: AbstractOutputWriter, @hascuda
using Oceananigans.Fields: OffsetArray

@hascuda using CUDAnative, CuArrays
