Correct AveragedTimeInterval to use actuations #3721

Open
wants to merge 26 commits into base: main

Conversation

@liuchihl

This is a cleaned-up version of the closed PR #3717.

@hdrake (Contributor) commented Aug 21, 2024

@liuchihl, thanks for cleaning up these changes by separating them from the background flux PR—it's much clearer now.

Consolidating @glwagner and @navidcy's earlier comments, it seems there are three things that need to be done before this can be merged:

  1. Review the existing OutputWriter tests and verify that they still pass with the new implementation in this PR
  2. Create a new, more rigorous test that is capable of flagging the bizarre behavior you found in your issue but that (hopefully) now passes thanks to the changes in this branch.
  3. Add some warnings to let users know that TimeInterval and AveragedTimeInterval (and probably other diagnostic schedules) are currently broken and give incorrect results after picking up from a checkpoint whenever the checkpoint interval is not an integer multiple of the scheduled time interval.

@glwagner (Member) commented Aug 22, 2024

Add some warnings to let users know that TimeInterval and AveragedTimeInterval (and probably other diagnostic schedules) are currently broken and give incorrect results after picking up from a checkpoint whenever the checkpoint interval is not an integer multiple of the scheduled time interval.

Can we make it so this warning gets thrown only if one is using a Checkpointer? I think the majority of simulations do not use a Checkpointer so the warning would be irrelevant in most cases. Maybe we should put the warning within the checkpointer constructor.

Checkpointing certainly needs love. I think it's only used for barebones stuff right now, not complicated simulations. To be fully featured we have to somehow have a system for checkpointing all callbacks. It's not just AveragedTimeInterval that would have a problem.
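As a rough sketch of the constructor-level warning being suggested here (hypothetical function and argument names, not actual Oceananigans API), the check could look something like:

    # Hypothetical sketch: warn at Checkpointer construction time when the checkpoint
    # interval is not an integer multiple of an output writer's scheduled interval.
    function warn_if_incommensurate(checkpoint_interval, output_interval; rtol=1e-10)
        ratio = checkpoint_interval / output_interval
        if !isapprox(ratio, round(ratio); rtol=rtol)
            @warn "Checkpoint interval ($checkpoint_interval) is not an integer multiple of " *
                  "the output interval ($output_interval); TimeInterval and AveragedTimeInterval " *
                  "schedules may give incorrect results after picking up from a checkpoint."
        end
        return nothing
    end

    warn_if_incommensurate(0.5, 0.3)   # warns
    warn_if_incommensurate(0.6, 0.3)   # silent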

@hdrake (Contributor) commented Aug 22, 2024

I think the majority of simulations do not use a Checkpointer so the warning would be irrelevant in most cases.

I don't get this. How are people not using a Checkpointer? Is no one else limited by HPC wall times or running long simulations? It seems like one of the most fundamental capabilities of any time-stepped numerical model.

But yes, this warning should only be issued if both an OutputWriter and a Checkpointer are being used and if the checkpointer interval is not an integer multiple of the OutputWriter interval.

@glwagner (Member) commented Aug 22, 2024

I think the majority of simulations do not use a Checkpointer so the warning would be irrelevant in most cases.

I don't get this. How are people not using a Checkpointer? Is no one else limited by HPC wall times or running long simulations? It seems like one of the most fundamental capabilities of any time-stepped numerical model.

But yes, this warning should only be issued if a Checkpointer is used (and maybe also when a simulation picks up from an existing Checkpoint).

I think for sophisticated research checkpointing is common, but for simpler classroom and LES applications the checkpointer is used less. After all, probably most simulations are actually run in our examples on CI -- and there are no examples with a checkpointer! (It would be nice to change that)

I can't speak for others, but for boundary layer parameterization work the LES typically run in less than 24 hours of wall time. We also only utilize very simple diagnostics, like the horizontally-averaged solution at the final time step. So in those rare cases where we need a checkpointer (which I have used a handful of times), barebones checkpointing is sufficient.

Of course we are currently working on building an OMIP simulation and that will require much longer runs, so we will definitely need more sophisticated checkpointing very soon.

@simone-silvestri and @tomchor might have more to add. Or @sandreza, what do you use for the neverworld work?

I'm not saying we don't want to develop this, I'm just providing some context about why this hasn't been resolved / developed yet.

In an ideal world the simulations would run fast enough that we wouldn't need checkpointing, after all 😄

@tomchor (Collaborator) commented Aug 22, 2024

I think for sophisticated research checkpointing is common, but for simpler classroom and LES applications the checkpointer is used less. After all, probably most simulations are actually run in our examples on CI -- and there are no examples with a checkpointer! (It would be nice to change that)

Agreed!

I can't speak for others, but for boundary layer parameterization work the LES typically run in less than 24 hours of wall time. We also only utilize very simple diagnostics, like the horizontally-averaged solution at the final time step. So in those rare cases where we need a checkpointer (which I have used a handful of times), barebones checkpointing is sufficient.

Of course we are currently working on building an OMIP simulation and that will require much longer runs, so we will definitely need more sophisticated checkpointing very soon.

@simone-silvestri and @tomchor might have more to add. Or @sandreza, what do you use for the neverworld work?

For context, 100% of my simulations have used checkpoints. As far as I know, 100% of the simulations from others in my group also use checkpoints. The only exceptions for my case are very early scripts still in the development phase, and still with very coarse grids. As soon as I try to get more serious with it, I need checkpoints. So this is a PR I'm very much looking forward to seeing merged ;)

@liuchihl (Author)

@hdrake @glwagner
The original windowed_time_average successfully passes the NetCDF OutputWriter test. However, when running the existing MWE, test_netcdf_time_averaging, setting Δt to 0.01 with an averaging window of 3Δt produces a discontinuity similar to what was observed in #3670 (comment), as shown in the figure below. The vertical lines indicate the start and end of the windows.
[image]

With this new PR, the same test now yields a smooth solution:
[image]

While the discontinuity caused by rounding errors has been resolved, not all cases with different Δt pass the test, e.g.,

    for (n, t) in enumerate(single_ds["time"][2:end])
        averaging_times = [t - n*Δt for n in 0:stride:window_size-1 if t - n*Δt >= 0]
        @test all(isapprox.(single_ds["c1"][:, n+1], c̄1(averaging_times), rtol=1e-5))
    end

Here is an example of a case that does not pass the test:

using Oceananigans
using Plots
using NCDatasets
using Test

    arch = CPU()
    topo = (Periodic, Periodic, Periodic)
    domain = (x=(0, 1), y=(0, 1), z=(0, 1))
    grid = RectilinearGrid(arch, topology=topo, size=(4, 4, 4); domain...)

    λ1(x, y, z) = x + (1 - y)^2 + tanh(z)
    λ2(x, y, z) = x + (1 - y)^2 + tanh(4z)

    Fc1(x, y, z, t, c1) = - λ1(x, y, z) * c1
    Fc2(x, y, z, t, c2) = - λ2(x, y, z) * c2
    
    c1_forcing = Forcing(Fc1, field_dependencies=:c1)
    c2_forcing = Forcing(Fc2, field_dependencies=:c2)

    model = NonhydrostaticModel(; grid,
                                timestepper = :RungeKutta3,
                                tracers = (:c1, :c2),
                                forcing = (c1=c1_forcing, c2=c2_forcing))

    set!(model, c1=1, c2=1)

    Δt = .01 #1/64 # Nice floating-point number
    simulation = Simulation(model, Δt=Δt, stop_time=150Δt)

    ∫c1_dxdy = Field(Average(model.tracers.c1, dims=(1, 2)))
    ∫c2_dxdy = Field(Average(model.tracers.c2, dims=(1, 2)))
        
    nc_outputs = Dict("c1" => ∫c1_dxdy, "c2" => ∫c2_dxdy)
    nc_dimensions = Dict("c1" => ("zC",), "c2" => ("zC",))

    single_time_average_nc_filepath = "single_decay_windowed_time_average_test.nc"
    window_nΔt = 3
    window = window_nΔt*Δt
    interval_nΔt = 5
    interval = interval_nΔt*Δt
    stride = 1

    single_nc_output = Dict("c1" => ∫c1_dxdy)
    single_nc_dimension = Dict("c1" => ("zC",))

    simulation.output_writers[:single_output_time_average] =
        NetCDFOutputWriter(model, single_nc_output,
                           array_type = Array{Float64},
                           verbose = true,
                           filename = single_time_average_nc_filepath,
                           schedule = AveragedTimeInterval(interval, window = window, stride = stride),
                           dimensions = single_nc_dimension,
                           overwrite_existing = true)
    run!(simulation)

    ##### For each λ, horizontal average should evaluate to
    #####
    #####     c̄(z, t) = ∫₀¹ ∫₀¹ exp{- λ(x, y, z) * t} dx dy
    #####             = 1 / (Nx*Ny) * Σᵢ₌₁ᴺˣ Σⱼ₌₁ᴺʸ exp{- λ(i, j, k) * t}
    #####
    ##### which we can compute analytically.

    # ds = NCDataset(horizontal_average_nc_filepath)

    Nx, Ny, Nz = size(grid)
    xs, ys, zs = nodes(model.tracers.c1)

    c̄1(z, t) = 1 / (Nx * Ny) * sum(exp(-λ1(x, y, z) * t) for x in xs for y in ys)
    c̄2(z, t) = 1 / (Nx * Ny) * sum(exp(-λ2(x, y, z) * t) for x in xs for y in ys)

    rtol = 1e-5 # need custom rtol for isapprox because roundoff errors accumulate (?)

    # Compute time averages...
    c̄1(ts) = 1/length(ts) * sum(c̄1.(zs, t) for t in ts)
    c̄2(ts) = 1/length(ts) * sum(c̄2.(zs, t) for t in ts)

    #####
    ##### Test strided windowed time average against analytic solution
    ##### for *single* NetCDF output
    #####
    
    single_ds = NCDataset(single_time_average_nc_filepath)

    attribute_names = ("schedule", "interval", "output time interval",
                       "time_averaging_window", "time averaging window",
                       "time_averaging_stride", "time averaging stride")

    for name in attribute_names
        @test haskey(single_ds.attrib, name) && !isnothing(single_ds.attrib[name])
    end

    window_size = Int(window/Δt)

    @info "    Testing time-averaging of a single NetCDF output [$(typeof(arch))]..."

    for (n, t) in enumerate(single_ds["time"][2:end])
        averaging_times = [t - n*Δt for n in 0:stride:window_size-1 if t - n*Δt >= 0]
        # @info n,t,averaging_times, c̄1(averaging_times).-single_ds["c1"][:, n+1]
        @test all(isapprox.(single_ds["c1"][:, n+1], c̄1(averaging_times), rtol=rtol))
    end 

I believe there might be some minor issues in our new PR that still need to be addressed.

@hdrake (Contributor) commented Sep 26, 2024

@liuchihl, did you find any cases where the windowed_time_average in this PR fails the test with window_nΔt == interval_nΔt? Maybe we have an indexing error when we are waiting for the window to start and we actually average over a slightly wrong period? It would also be good to confirm that the problem is with windowed_time_average, and not with the analytical solution that we're comparing to!

@liuchihl (Author)

did you find any cases where the windowed_time_average in this PR fails the test with window_nΔt == interval_nΔt?

No, the test passes if window_nΔt equals interval_nΔt. However, if they differ, even by an integer multiple, the test fails.

@hdrake (Contributor) commented Sep 26, 2024

Well that's promising at least!

@liuchihl (Author)

Ah, one caveat: this only holds when the timestep is sufficiently small.

@glwagner (Member)

That's great work! Can you summarize what you did? I'm wondering if it makes sense that this is hard or if we should actually consider a more fundamental redesign to make it more robust...

@liuchihl (Author)

Sure, the PR resolves the rounding issue caused by previous_interval_stop_time through the use of actuations (as inspired by its application here).
Here is the part of the code showing these changes.

Another important change is that

            # Save averaging start time and the initial data collection time
            wta.window_start_time = model.clock.time
            wta.window_start_iteration = model.clock.iteration
            wta.previous_collection_time = model.clock.time

            wta.schedule.collecting = false
            wta.schedule.actuations += 1

occurs only when the window ends, i.e., when end_of_window(wta.schedule, model.clock) == true. In contrast, the previous version triggered this only when the model wasn't collecting.
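To make the rounding point concrete, here is a standalone sketch in plain Julia (made-up numbers, not the code in this PR) contrasting a stop time accumulated step by step with one computed from an integer actuation count, which is the idea behind the change:

    interval = 0.01
    n = 10_000

    # Accumulating each stop time from the previous one compounds roundoff.
    accumulated_stop = foldl(+, fill(interval, n))   # strictly sequential sum

    # Computing the stop time from the integer actuation count starts fresh each time.
    actuations = n
    actuation_stop = actuations * interval

    println(accumulated_stop - 100.0)   # small but nonzero drift
    println(actuation_stop - 100.0)     # 0.0: only a single rounding of n * interval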

I'm wondering if it makes sense that this is hard or if we should actually consider a more fundamental redesign to make it more robust...

I agree that a more fundamental redesign could improve robustness in the long term. That said, the current adjustments seem to resolve the issue for now (I'll look into why certain cases aren't passing the test). We can continue to monitor its performance and consider a redesign if further issues arise.

@hdrake (Contributor) commented Sep 27, 2024

@liuchihl, can you explain a bit more about your caveat that the new method only passes the test if the timestep is small enough? With the new method, the windowed time averages still look good by eye, but quantitatively the errors are larger than the default relative tolerance of 1e-5, right? That seems like it would still be an improvement over the spurious zeros that sometimes show up with the previous method?

@liuchihl (Author)

Yes, I realize that increasing Δt leads to larger errors (especially for the first two windows; the numerical and analytical solutions match in subsequent windows), which exceed the default relative tolerance of 1e-5.

The spurious zeros are definitely incorrect, so solving that is an improvement.

@glwagner (Member)

For sure, we are happy to merge any improvement no matter how small. PRs only have to push the project forward, they don't have to be "big" or "important".

Independent of that, it'd be good to have a clear conclusion about whether a redesign really is needed too because we are in a good position to make that judgment now.

@glwagner (Member)

Should we unmark this as a draft and try to get it merged? If there is still a problem and you know how to test it, you can add a @test_broken to the test suite. Then after we merge this PR, you can open another PR which marks that as a @test and takes up where this one leaves off.

@hdrake (Contributor) commented Sep 27, 2024

@glwagner, give us a couple more days to sort this out. I'm not so sure that we need a fundamental redesign, but I think that should be revisited when we decide to take on the more challenging issue of checkpointing partially-accumulated time-averages so that windowed_time_average always works as intended when picking up from checkpoints.

@glwagner (Member)

I think that should be revisited when we decide to take on the more challenging issue of checkpointing partially-accumulated time-averages so that windowed_time_average always works as intended when picking up from checkpoints.

good point!

There is a possibility that it is not very hard. It will require reshuffling code (which I can do), but with Julia we can serialize objects to disk and then load them back seamlessly in a single line... which might be all we need. The only limitation of serialization is that we haven't yet figured out how to serialize methods (eg functions) which prevents us from serializing entire models. When functions are not involved things can be pretty simple.
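For reference, a tiny illustration of the built-in serialization being referred to (the struct and file name here are hypothetical; as noted, functions cannot be handled this way):

    using Serialization

    # Hypothetical container for the state a WindowedTimeAverage would need to resume.
    struct AveragingState
        window_start_time::Float64
        actuations::Int
        accumulated::Vector{Float64}
    end

    state = AveragingState(0.05, 7, rand(4))
    serialize("averaging_state.jls", state)         # write to disk in one line
    restored = deserialize("averaging_state.jls")   # load back in one line
    @assert restored.actuations == state.actuations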

@liuchihl (Author) commented Sep 28, 2024

Our earlier tests with a simple sine function indicate that when the checkpoint interval is an integer multiple of the AveragedTimeInterval, the results after the checkpoint seem reasonable. However, I’ve noticed this isn't the case with the following parameter settings, for instance:

    Δt = .01             # timestep
    T1 = 6Δt             # first simulation stop time
    T2 = 2T1             # second simulation stop time
    window_nΔt = 2       # window interval: 2Δt
    interval_nΔt = 2     # time average saving interval: 2Δt
    stride = 1

The averaged values are clearly off after the checkpoint (t>6Δt):
[image]

This issue does not occur only in the existing MWE (the decaying function); it also occurs in our MWE using the exact same parameters mentioned above. In the figure below, the dashed curve and the steps indicate the TimeInterval and AveragedTimeInterval outputs, respectively. It is unclear to me why a spurious zero appears in this case.
[image]

The point of these tests is to show that even when the checkpoint interval is an integer multiple of the AveragedTimeInterval, issues can still arise.

Here is the MWE with the decaying function for reference:

using Oceananigans
using Plots
using NCDatasets
using Test
if isfile("single_decay_windowed_time_average_test.nc")
    rm("single_decay_windowed_time_average_test.nc")
end
run(`sh -c "rm test_iteration*.jld2"`)


function test_simulation(stop_time, Δt, window_nΔt, interval_nΔt, stride, overwrite)

    arch = CPU()
    topo = (Periodic, Periodic, Periodic)
    domain = (x=(0, 1), y=(0, 1), z=(0, 1))
    grid = RectilinearGrid(arch, topology=topo, size=(4, 4, 4); domain...)

    λ1(x, y, z) = x + (1 - y)^2 + tanh(z)
    λ2(x, y, z) = x + (1 - y)^2 + tanh(4z)

    Fc1(x, y, z, t, c1) = - λ1(x, y, z) * c1
    Fc2(x, y, z, t, c2) = - λ2(x, y, z) * c2
    
    c1_forcing = Forcing(Fc1, field_dependencies=:c1)
    c2_forcing = Forcing(Fc2, field_dependencies=:c2)

    model = NonhydrostaticModel(; grid,
                                timestepper = :RungeKutta3,
                                tracers = (:c1, :c2),
                                forcing = (c1=c1_forcing, c2=c2_forcing))

    set!(model, c1=1, c2=1)
    simulation = Simulation(model, Δt=Δt, stop_time=stop_time)

    ∫c1_dxdy = Field(Average(model.tracers.c1, dims=(1, 2)))
    ∫c2_dxdy = Field(Average(model.tracers.c2, dims=(1, 2)))
        
    nc_outputs = Dict("c1" => ∫c1_dxdy, "c2" => ∫c2_dxdy)
    nc_dimensions = Dict("c1" => ("zC",), "c2" => ("zC",))

    single_time_average_nc_filepath = "single_decay_windowed_time_average_test.nc"
    
    window = window_nΔt*Δt
    interval = interval_nΔt*Δt

    single_nc_output = Dict("c1" => ∫c1_dxdy)
    single_nc_dimension = Dict("c1" => ("zC",))

    simulation.output_writers[:single_output_time_average] =
        NetCDFOutputWriter(model, single_nc_output,
                           array_type = Array{Float64},
                           verbose = true,
                           filename = single_time_average_nc_filepath,
                           schedule = AveragedTimeInterval(interval, window = window, stride = stride),
                           dimensions = single_nc_dimension,
                           overwrite_existing = overwrite)
    checkpointer = Checkpointer(model,
                                schedule = TimeInterval(stop_time),
                                prefix = "test",
                                cleanup = true)

    simulation.output_writers[:checkpointer] = checkpointer


    return simulation

end
    
    Δt = .01 #1/64 # Nice floating-point number
    T1 = 6Δt      # first simulation stop time (s)
    T2 = 2T1    # second simulation stop time (s)
    window_nΔt = 2
    interval_nΔt = 2
    stride = 1
    # Run a simulation that saves data to a checkpoint
    simulation = test_simulation(T1, Δt, window_nΔt, interval_nΔt, stride, true)
    run!(simulation)

    # Now try again, but picking up from the previous checkpoint
    N = iteration(simulation)
    checkpoint = "test_iteration$N.jld2"
    simulation = test_simulation(T2, Δt, window_nΔt, interval_nΔt, stride, false)
    run!(simulation, pickup=checkpoint)

    ##### For each λ, horizontal average should evaluate to
    #####
    #####     c̄(z, t) = ∫₀¹ ∫₀¹ exp{- λ(x, y, z) * t} dx dy
    #####             = 1 / (Nx*Ny) * Σᵢ₌₁ᴺˣ Σⱼ₌₁ᴺʸ exp{- λ(i, j, k) * t}
    #####
    ##### which we can compute analytically.

    arch = CPU()
    topo = (Periodic, Periodic, Periodic)
    domain = (x=(0, 1), y=(0, 1), z=(0, 1))
    grid = RectilinearGrid(arch, topology=topo, size=(4, 4, 4); domain...)

    λ1(x, y, z) = x + (1 - y)^2 + tanh(z)
    λ2(x, y, z) = x + (1 - y)^2 + tanh(4z)

    Fc1(x, y, z, t, c1) = - λ1(x, y, z) * c1
    Fc2(x, y, z, t, c2) = - λ2(x, y, z) * c2
    
    c1_forcing = Forcing(Fc1, field_dependencies=:c1)
    c2_forcing = Forcing(Fc2, field_dependencies=:c2)

    model = NonhydrostaticModel(; grid,
                                timestepper = :RungeKutta3,
                                tracers = (:c1, :c2),
                                forcing = (c1=c1_forcing, c2=c2_forcing))


    Nx, Ny, Nz = size(grid)
    xs, ys, zs = nodes(model.tracers.c1)

    c̄1(z, t) = 1 / (Nx * Ny) * sum(exp(-λ1(x, y, z) * t) for x in xs for y in ys)
    c̄2(z, t) = 1 / (Nx * Ny) * sum(exp(-λ2(x, y, z) * t) for x in xs for y in ys)

    rtol = 1e-5 # need custom rtol for isapprox because roundoff errors accumulate (?)

    # Compute time averages...
    c̄1(ts) = 1/length(ts) * sum(c̄1.(zs, t) for t in ts)
    c̄2(ts) = 1/length(ts) * sum(c̄2.(zs, t) for t in ts)

    #####
    ##### Test strided windowed time average against analytic solution
    ##### for *single* NetCDF output
    #####
    single_time_average_nc_filepath = "single_decay_windowed_time_average_test.nc"
    single_ds = NCDataset(single_time_average_nc_filepath)

    attribute_names = ("schedule", "interval", "output time interval",
                       "time_averaging_window", "time averaging window",
                       "time_averaging_stride", "time averaging stride")

    for name in attribute_names
        @test haskey(single_ds.attrib, name) && !isnothing(single_ds.attrib[name])
    end

    window_size = window_nΔt
    window = window_size*Δt

    time = single_ds["time"][:]
    data_plot = single_ds["c1"][1:4, :]
    c̄1_timeaverage = zeros(4,length(time[1:end]))
    for (n, t) in enumerate(time[1:end])
        averaging_times = [t - n*Δt for n in 0:stride:window_size-1 if t - n*Δt >= 0]
        # @info n,t,averaging_times, c̄1(averaging_times)
        c̄1_timeaverage[:,n] = c̄1(averaging_times)
        # @test all(isapprox.(single_ds["c1"][:, n+1], c̄1(averaging_times), rtol=rtol))
    end

    # Plot each of the four lines
    pl = plot()
    plot!(time, data_plot[1, :], label="1", color=:blue, legend=:topright)
    plot!(time, data_plot[2, :], label="2", color=:red)
    plot!(time, data_plot[3, :], label="3", color=:orange)
    plot!(time, data_plot[4, :], label="4", color=:green)

    plot!(time[1:end],c̄1_timeaverage[1,:], color=:black, linestyle=:dash, label="1-analytic")
    plot!(time[1:end],c̄1_timeaverage[2,:], color=:black, linestyle=:dash, label="2-analytic")
    plot!(time[1:end],c̄1_timeaverage[3,:], color=:black, linestyle=:dash, label="3-analytic")
    plot!(time[1:end],c̄1_timeaverage[4,:], color=:black, linestyle=:dash, label="4-analytic")
   
    tt = 0:window:T2
    for i in 1:length(tt)
        plot!([tt[i], tt[i]], [0, 1], color=:grey, label="")
    end
    title!(pl, string("Δt=",Δt,", average window=",window_nΔt,"Δt")) # Add the title to the plot
    ylims!(pl,(minimum(c̄1_timeaverage[4,:]),maximum(c̄1_timeaverage[4,:])))
    xlims!(pl,(0,T2))
    close(single_ds)
    display(pl)

@liuchihl (Author)

@glwagner @hdrake
With the latest commit, I believe we've addressed the remaining issues in this PR, including those that occur 1) when the saving interval differs from the time-averaging window, and 2) after resuming from a checkpoint.

A quick result showing that the numerical solution matches the analytical solution well even when window ≠ interval, picking up from a checkpoint at t = 1:
[image]

To properly handle the checkpoint pickup, we manually adjusted the actuation to match the correct value based on the pre-pickup simulation. Here's an example to illustrate what I mean.
While this resolves the checkpoint issue for now, it's more of a workaround, and a more robust high-level design is still needed. I believe some help is required with the checkpoint pickup design, but aside from that, the windowed time averaging appears to be functioning correctly now!
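The gist of the workaround, as a hedged sketch with made-up values (not the PR's actual code): after pickup, the schedule's actuation count is reset from the restored clock time so that it matches what the pre-pickup run would have reached.

    interval = 0.02          # AveragedTimeInterval interval
    restored_time = 0.12     # model.clock.time read back from the checkpoint (example value)

    # Number of complete output intervals already actuated before pickup; the small
    # epsilon guards against the ratio landing just below an integer due to roundoff.
    actuations_at_pickup = floor(Int, restored_time / interval + 1e-10)
    @show actuations_at_pickup   # 6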

@tomchor (Collaborator) commented Oct 23, 2024

@liuchihl that's awesome. Is there anything relating to AveragedTimeInterval that doesn't work after this PR?

Also, based on what you said I'm marking this PR as ready for review. (i.e. no longer a draft)

Change timestep in time-averaging-interval test to flag rounding errors
@hdrake (Contributor) commented Nov 14, 2024

To summarize the discussion above and in the related #3670: the MWE that we used to flag and debug these floating-point rounding errors in the WindowedTimeAverage scheme was actually exactly the same as the test_netcdf_spatial_average test function in test/test_netcdf_output_writer.jl, just with the value of the timestep changed from

Δt = 1/64 # Nice floating-point number

to

Δt = 0.01 # Floating point number chosen conservatively to flag rounding errors

The original choice of a "nice" floating-point number is convenient because it is not as susceptible to rounding errors, but not as useful for a test because it is overly optimistic. Instead, the test should use a pessimistic value that is more prone to rounding errors, as this protects against the edge cases that users might run into–as we did in our production runs!
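The difference between the two values is easy to demonstrate directly (plain Julia, nothing Oceananigans-specific): 1/64 is a dyadic rational and exactly representable in Float64, while 0.01 is not, so 0.01 carries a representation error before any arithmetic happens.

    # Comparisons between Float64 and Rational are exact in Julia, so they expose
    # whether the stored value really equals the intended fraction.
    println(1/64 == 1//64)    # true: 1/64 is stored exactly
    println(0.01 == 1//100)   # false: 0.01 is stored as the nearest Float64,
                              # 0.01000000000000000020816..., slightly above 1/100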

Here is the MWE that @liuchihl has modified from the existing one, but which I have removed in favor of a very small modification to the existing MWE test.
a0c725c#diff-60f3eb03acc634682fc82a022d0a0b32382d1e890be3057750545a3d0ada1c09L99

@hdrake (Contributor) commented Nov 14, 2024

@tomchor and @glwagner, I think this is ready to be merged!

We've tested this quite a bit in various MWEs and also in our complicated production runs with tidally-averaged outputs, and find the results to be much improved over the previous method.

Feedback and testing from others welcome!

@glwagner (Member) left a comment

Nice work!

@glwagner (Member)

@tomchor and @glwagner, I think this is ready to be merged!

We've tested this quite a bit in various MWEs and also in our complicated production runs with tidally-averaged outputs, and find the results to be much improved over the previous method.

Feedback and testing from others welcome!

It's definitely reassuring that you've used it a bit. My only comment is: do you feel confident that the tests are adequate to catch any future bugs that might creep in?

Also, the only test that was changed regards NetCDF. Should we also change or add a test for time-averaging with JLD2? This is in your interest; you want to ensure that people using JLD2 are having a similar experience as with NetCDF so that you can collaborate on improvements and bugfixes.

@hdrake (Contributor) commented Nov 14, 2024

Also, the only test that was changed regards NetCDF. Should we also change or add a test for time-averaging with JLD2? This is in your interest; you want to ensure that people using JLD2 are having a similar experience as with NetCDF so that you can collaborate on improvements and bugfixes.

That's a great point. Is there any reason these two test_*_output_writer.jl scripts are not currently exactly mirrored, i.e. running the same test suite?

@glwagner (Member)

Also, the only test that was changed regards NetCDF. Should we also change or add a test for time-averaging with JLD2? This is in your interest; you want to ensure that people using JLD2 are having a similar experience as with NetCDF so that you can collaborate on improvements and bugfixes.

That's a great point. Is there any reason these two test_*_output_writer.jl scripts are not currently exactly mirrored, i.e. running the same test suite?

I don't think there is any feature missing between the two (except for FieldTimeSeries, but that is not strictly an output issue), so no. It should be easy to copy/paste a test and change NetCDF to JLD2 (hopefully only with other minor changes...)

If I had to guess a reason, the interfaces have fluctuated somewhat so they were not always identical. #3793 proposes adding a wrapper which would further unify the interfaces. Furthermore, once NetCDF supports FieldTimeSeries, we should be able to write tests which are completely agnostic to the output format and then we can really simplify the testing.

@hdrake (Contributor) commented Nov 14, 2024

Understood, @glwagner. @liuchihl and I will sort out the tests related to this PR later today.

@tomchor (Collaborator) commented Nov 16, 2024

For the record, I tested this branch on my simulations (which are pretty complex and make extensive use of time averaging) and everything worked well 👍

Henri Drake added 2 commits November 16, 2024 15:15
Since the reformulation of `AveragedTimeInterval` in CliMA#3721
schedules the OutputWriter based on model clock time rather than
iteration numbers, the `stride` parameter (which is inherently
based on iteration number) is somewhat ambiguous. For instance,
since the model sometimes takes extremely short timesteps due to
floating point rounding errors, a "stride" of two can mean either
2 times the nominal timestep or just 1 times the nominal timestep.
This decoupling of iteration number and model clock time, which
would presumably be worse when using a timestep wizard that produces
highly variable timesteps, can introduce significant errors into
`AveragedTimeInterval` results when stride>1.

I have added a warning that users should be cautious about using
`AveragedTimeInterval` with `stride` set to anything but the default
value of 1.

The NetCDFOutputWriter test of `AveragedTimeInterval` has now also
been ported over as a test of the JLD2OutputWriter. The parameter
setting of `stride=2` has been changed to `stride=1`, motivated
by the discussion above.
@hdrake (Contributor) commented Nov 16, 2024

@glwagner, I just added the requested JLD2 test and slightly modified the test so that it always passes.

Even with this new reformulation, setting stride to a value greater than 1 is problematic because it is based on iteration number, whereas the whole point of this PR is to move away from saving based on iteration numbers, because they give erroneous results whenever floating point rounding errors pop up, e.g. because we occasionally use extremely short timesteps to resync the model timesteps with output writer schedules.

I've just added a warning that lets users know to treat AveragedTimeInterval results with caution when stride≠1. The important thing is that the new methodology always works when stride=1, which is not true of the iteration-based scheme currently in main.
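A small illustration of the decoupling being described (made-up numbers, not Oceananigans internals): once an occasional tiny "sync" step is taken to land on an output time, skipping by iteration no longer corresponds to skipping by a fixed amount of clock time.

    Δt_nominal = 0.01
    # The third step is shrunk so the model lands exactly on an output time.
    steps = [Δt_nominal, Δt_nominal, 1e-9, Δt_nominal, Δt_nominal]
    times = cumsum(steps)

    # stride = 2 samples every other iteration...
    sampled = times[1:2:end]
    # ...but the gaps in clock time are ~Δt, ~Δt, ~2Δt rather than a uniform 2Δt.
    println(diff(vcat(0.0, sampled)))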

Hopefully this is good to go now!

c2_forcing = Forcing(Fc2, field_dependencies=:c2)

model = NonhydrostaticModel(; grid,
timestepper = :RungeKutta3,
Review comment (Member):

This is the default time stepper so you shouldn't need to specify it

# which essentially decouples the clock time from the iteration number.
# Can add stride > 1 cases to the following line to test them.
for (stride, rtol) in zip([1], [1.e-5])
@info " Testing time-averaging of NetCDF outputs [$(typeof(arch))] with stride of $(stride) and relative tolerance of $(rtol)"
Review comment (Member):

Suggested change
@info " Testing time-averaging of NetCDF outputs [$(typeof(arch))] with stride of $(stride) and relative tolerance of $(rtol)"
@info " Testing time-averaging of JLD2 outputs [$(typeof(arch))] with stride of $(stride) and relative tolerance of $(rtol)"

Comment on lines +231 to +235
schedule = TimeInterval(10Δt),
dir = ".",
with_halos = false,
filename = horizontal_average_jld2_filepath,
overwrite_existing = true)
Review comment (Member):

indent error here

Comment on lines +245 to +249
schedule = AveragedTimeInterval(10Δt, window = window, stride = stride),
dir = ".",
with_halos = false,
filename = single_time_average_jld2_filepath,
overwrite_existing = true)
Review comment (Member):

indent error

Comment on lines +253 to +257
schedule = AveragedTimeInterval(10Δt, window = window, stride = stride),
dir = ".",
with_halos = false,
filename = multiple_time_average_jld2_filepath,
overwrite_existing = true)
Review comment (Member):

indent error


window_size = Int(window/Δt)

@info " Testing time-averaging of a single NetCDF output [$(typeof(arch))]..."
Review comment (Member):

Suggested change
@info " Testing time-averaging of a single NetCDF output [$(typeof(arch))]..."
@info " Testing time-averaging of a single JLD2 output [$(typeof(arch))]..."


c2_multiple = FieldTimeSeries(multiple_time_average_jld2_filepath, "c2")

@info " Testing time-averaging of multiple NetCDF outputs [$(typeof(arch))]..."
Review comment (Member):

Suggested change
@info " Testing time-averaging of multiple NetCDF outputs [$(typeof(arch))]..."
@info " Testing time-averaging of multiple JLD2 outputs [$(typeof(arch))]..."

for Δt in [1/64, 0.01]
# Results should be very close (rtol < 1e-5) for stride = 1.
# stride > 2 is currently not very robust and can give inconsistent
# results due to floating number errors that can result in very timesteps,
Review comment (Member):

I think a word is missing. Very [what] timesteps?

function test_jld2_time_averaging(arch)
# Test for both "nice" floating point number and one that is more susceptible
# to rounding errors
for Δt in [1/64, 0.01]
Review comment (Member):

Suggested change
for Δt in [1/64, 0.01]
for Δt in (1/64, 0.01)

just aesthetics, but a tuple is better than an array

@@ -204,6 +204,8 @@ function (wta::WindowedTimeAverage)(model)
model.clock.iteration > 0 &&
@warn "Returning a WindowedTimeAverage before the collection period is complete."

stride(wta) > 1 && @warn "WindowedTimeAverage can be erroneous when stride > 1 and either the timestep is variable or there are floating point rounding errors in times, both of which result in a decoupling of the model clock times (used in the OutputWriters) and iteration numbers (used for stride)."
Review comment (Member):

What does "or there are floating point rounding errors in times" mean?

I think with a warning like this, we need to have an issue that documents the problem and goes into a bit more detail (someone who encounters this might want to fix it)
