-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Prototype water state breakout #395
Conversation
Developed 2018-05-22
Martyn Clark suggested dividing fields into different types depending on whether they are state or diagnostic variables. Mariana Vertenstein suggested having a higher-level type that pulls together the various instances. This initial version attempts to do both of those.
@mathewvrothstein @martynpclark @mvertens I'm recording some evolving thoughts here. There are a couple more iterations (at least) that I still plan to commit. |
New types: water_bulk_only_type and water_bulk_and_tracer_type. (I don't love the names.) The advantages over the last commit are: 1. This groups together instances relating to the same tracer - e.g., the state and diagnostic variables for a given tracer (and later the fluxes for that tracer, too). 2. This potentially will simplify subroutine argument lists, by allowing us to pass a single container instance, e.g., for the water bulk variables.
Inheritance adds some complexity, so I was initially reluctant to go this route. However, I see a few places where this adds some value: 1. It reduces the number of arguments to some calls, because water_bulk_inst and water_bulk_only_inst are packaged together now into a single instance. Also, this packaging is now more apparent. 2. These communicate intention better in subroutine argument lists: subroutines can expect an argument of bulk type, rather than the generic type, ensuring that you're not passing a tracer instance into a routine that expects bulk. 3. This should make it easier to migrate variables between bulk-only and bulk-and-tracer.
Here are some diagrams of the three designs I have prototyped (in the three commits currently on this branch) as well as two other designs that I have not yet prototyped: WaterType possible designs.pdf Below, I'll refer to the different designs as (D1), (D2), etc., and the enumerated list of benefits as (1), (2), etc. The first three designs (D1-D3) correspond to the 3 commits on this branch. They start fairly simple and then get a bit more complex via extra layers – though the extra complexity could result in some extra simplicity / clarity in the science code: Benefits of (D2) over (D1):
Benefits of (D3) over (D2) - and also over (D1):
There's an important addition to (5): A related point here is that science modules don't need to know whether a given variable comes from (for example) waterstate_type or water_bulk_only_state_type. To see the code for each of these designs, you can check out the corresponding commits on this branch:
I ran designs (D1) and (D3) by @mvertens and @mathewvrothstein yesterday. @mathewvrothstein raised the very valid critique that the extra layers in (D3) - and in particular the use of type extension - might make it hard to understand. On the other hand, they both liked that (D3) allows science routines to explicitly declare that they expect the bulk water state, not just any water state (since the latter could, in principle, be some tracer state, which would not be valid for the science of that routine) - i.e., benefit (4) above. Personally, I actually feel that benefit (5) - which is connected with benefit (3) - will be even more important in terms of making the code easier to write and maintain. In an attempt to keep some of the benefits of (D3) while simplifying the design, I came up with (D5). I feel like this is a reasonable design as long as we don't break up the state/diag/flux types much more: If we go with design (D5), then we won't have containers like I also tried out a different inheritance hierarchy in (D4), but I don't like this as much as (D3) or (D5). This gives benefits (3) and (4), but not benefit (5) – and we further lose some of these benefits if we want to explicitly pass (e.g.) just the state variables into a routine, as opposed to passing the whole waterbulk_inst. My current preferences are for (3) or (5), but I can live with any of these designs – or possibly others that I haven't thought of which other people might see. |
Here are the different options for what the subroutine calls from the driver to science routines could look like under design options (D3) and (D5): ! ========================================================================
! For design (D3) we have two options for how to pass arguments from the driver to the
! science routines.
! ========================================================================
! ------------------------------------------------------------------------
! Option 1: Pass in the whole waterbulk_inst container. This keeps argument lists shorter
! and less volatile, and probably simplifies the initial refactoring. On the other hand,
! it makes it harder to track the flow of individual types (e.g., waterstate_bulk_type)
! through the system, and you can't (for example) declare that state variables are
! intent(in) but diagnostic variables are intent(inout) for a given subroutine.
! ------------------------------------------------------------------------
! Driver call
call canopy_hydrology(..., water_inst%waterbulk_inst, ...)
! Subroutine header
subroutine canopy_hydrology(..., waterbulk_inst, ...)
type(waterbulk_type), intent(inout) :: waterbulk_inst
associate( &
foo => waterbulk_inst%waterstate_bulk_inst%foo, & ! Input
bar => waterbulk_inst%waterdiag_bulk_inst%bar & ! Output
)
end associate
end subroutine canopy_hydrology
! ------------------------------------------------------------------------
! Option 2: Pass in the individual instances from waterbulk_inst. This has the opposite
! pros and cons of option 1.
! ------------------------------------------------------------------------
! Driver call
call canopy_hydrology(..., water_inst%waterbulk_inst%waterstate_bulk_inst, &
water_inst%waterbulk_inst%waterdiag_bulk_inst, ...)
! Subroutine header
subroutine canopy_hydrology(..., waterstate_bulk_inst, waterdiag_bulk_inst, ...)
type(waterstate_bulk_type), intent(in) :: waterstate_bulk_inst
type(waterdiag_bulk_type), intent(inout) :: waterdiag_bulk_inst
associate( &
foo => waterstate_bulk_inst%foo, & ! Input
bar => waterdiag_bulk_inst%bar & ! Output
)
end associate
end subroutine canopy_hydrology
! ========================================================================
! For designs (D1) and (D5), option 1 is not possible. Here is what option 2 would look
! like if we go with design (D5).
! ========================================================================
! Driver call
call canopy_hydrology(..., water_inst%waterstate_bulk_inst, &
water_inst%waterdiag_bulk_inst, ...)
! Subroutine header
! Same as above If people prefer option (2) then there is little benefit to (D3) over (D5). If people prefer option (1) then designs (D1) and (D5) are ruled out. |
Okay, I'm mostly done commenting on this for now. @martynpclark @mvertens @mathewvrothstein and others: I'd welcome any thoughts you have. |
Oh, and if you review this: I don't really recommend looking at diffs in the different commits: instead, I recommend looking at the individual commits, as I mention in #395 (comment) |
As I discussed with @mvertens yesterday: I've been struggling to see how to apply the water isotope / tracer design to more modularized fluxes, where the flux variable is contained by the science module that computes it. An example of this is the irrigation flux in IrrigationMod; there are other examples in the This could be easier to handle if we ditched the idea of handling isotopes and tracers via multiple instances, and instead went back to the idea of having an extra array dimension for the tracers. But this carries its own set of issues – and even then, this would mean that all of the water-related science classes need to know something about the different water tracers, making it harder to write (and read) the code for each science class. One possible solution is to have two types for each science module: e.g., an irrigation_type that contains things that exist for tracers, too; and an irrigation_bulk_type that contains things that are just relevant for the bulk. But this adds complexity to the code. Another solution is to not worry about the fact that some variables are going to be duplicated for tracers when they don't need to be... but I think that's messy (for restart and other reasons). @mvertens and I feel that the best solution is to move fluxes back into the centralized flux types, giving up on the idea of declaring flux variables in the science module responsible for computing the flux. I feel like there was some value in the idea of having the science modules own these fluxes, but not enough value to warrant the extra complexity it will cause in the face of water isotopes / tracers. I'd still like to keep some variables local to the science modules – especially variables that can be made private to those modules. But the fluxes were never handled in a completely (textbook) object-oriented fashion anyway, since they were accessed publicly by other modules, so I'm not too sad to move these public flux variables back out to the generic flux types. And while this has the disadvantage of losing the attachment of a variable to the module that computes that variable, it has some advantages, too, in terms of understanding the suite of water flux variables, and reducing the number of instances that need to get passed all around (e.g., we won't need to pass irrigation_inst to as many places). |
In a discussion with @mvertens and @mathewvrothstein we agreed that (D5) is probably best. I'll make another commit to this branch that prototypes that design, then @mathewvrothstein will move ahead with refactoring the code to implement it. |
This is design (D5) in the pdf in PR ESCOMP#395, which is the favorite of Mariana Vertenstein and Mat Rothstein. In discussion with Mariana and Mat, we felt that the benefits of having these containers weren't worth the extra complexity of having this extra layer. Part of my original motivation for having this extra layer was thinking that we'd eventually have a lot more objects contained in each tracer instance: all of the process-oriented objects like irrigation, infiltration excess runoff, etc. (which each compute one or more water fluxes). But as I noted in ESCOMP#395 (comment), I'm now feeling that won't be feasible, so we'll probably have no more than 5 or 6 objects for each tracer instance (state, diagnostics, water balance, fluxes, and maybe we'll split one or two of those further). So there is less of a need for this tracer type container.
@mathewvrothstein okay I have committed the prototype for (D5). This is ready for you to take over now, I think. |
@martynpclark @swensosc @dlawrenncar - We're planning to go with the design as illustrated by the latest commit in this PR (illustrated by design (D5) in the pdf linked from #395 (comment)). If you're interested in seeing what this looks like, easiest thing to do is probably to check out this branch and look at these files:
Let us know very soon if you have any problems with this, because @mathewvrothstein is planning to start implementing this soon. |
Closing this, since it has served its purpose. |
I had hoped that most variables computed by a given module could be owned by the type defined in that module - i.e., moving more towards "textbook" object-orientation. However, I realized that this won't work well for water fluxes due to the requirements of isotopes and other water tracers: We need multiple instances of each type that contains any fluxes for which we need separate instances for isotopes / tracers, but we don't want separate instances for all variables. (See some discussion in ESCOMP#395 for more details.) I'm even moving fluxes that could otherwise be private to the module in which they're set. This is partly because these private fluxes may at some point need multiple instances for isotopes / tracers (this is the case for the fluxes in GlacierSurfaceMassBalanceMod - the separate melt and freeze fluxes - which seem like they may need to be tracked separately for isotopes... actually, I could envision needing track the melt flux, qflx_glcice_melt, separately by layer for the sake of isotopes), and partly because I don't want to have to change the home of a variable just because some other module now needs it and so it now needs to be public. It's possible that not everything that is called a flux really needs to go in waterflux_type. I'm particularly thinking about partial fluxes - e.g., if we add foo + bar to get the full flux from state1 to state2, then it's possible that we only need to store that sum in waterflux_type, and could keep foo and bar in the science modules that set them. (Specific example: I'm wondering if qflx_sat_excess_surf and qflx_infl_excess should really be considered fluxes, or if these "partial fluxes" can continue to live in their defining science modules.) But I'm not sure about that (it may be that we'll need tracer versions of even these partial fluxes), and even if it's true, a downside would be that this could increase the volatility of where things are declared (e.g., if it used to be that foo was the only flux, and then we added bar later - would we then move foo out of waterflux_type to its defining science module?) So at least for now, I'll move these "partial fluxes" to waterflux_type; I can always move them back once I understand the needs for tracers / isotopes better. Another example of something that may not need to be in waterflux_type is qflx_glcice_dyn_water_flux_col - but it seems possible that we'll need separate isotope versions for this, so I'll move it for now, and we could move it back later once we understand what's needed for isotopes better. I'm leaving non-flux variables in the module that computes them, even if they're public - e.g., qinmax_col and fsat_col. I can see arguments either way for these. For now I'll keep them in place, partly because that's easier than moving them. But I can see a general principle: Fluxes should be contained in waterflux_type, but auxiliary variables (public or private) that are computed by a module should be owned by that module. One justification for this is: For non-water-flux variables that are only referenced by that module, it makes sense to make them owned by that module and private. And we don't want to have to move things around just because now someone else needs them. So where a variable lives should be based less on public/private and more on whether it's a water flux (which needs to be in the water flux type) or something else (which can follow "better" object-oriented design). For water fluxes computed by a routine, I considered passing them in directly as intent(out) arrays rather than passing the whole waterflux_type in, to make it more clear what the effect of the routine is. But the problem I see with passing computed fluxes directly as arrays is the potential for aliasing - i.e., if both waterflux_inst and waterflux_inst%foo are passed to a routine, the latter as an output, and then inside the routine, waterflux_inst%foo is used as an input, in addition to setting foo. A solution could be to pass everything as bare arrays, but this goes against what we have in place right now, and I don't love that idea because, if we have two polymorphic implementations, it makes it harder for them to have different input requirements. So for now I'm sticking with passing in the whole waterflux_type and documenting outputs in the associate block.
I had hoped that most variables computed by a given module could be owned by the type defined in that module - i.e., moving more towards "textbook" object-orientation. However, I realized that this won't work well for water fluxes due to the requirements of isotopes and other water tracers: We need multiple instances of each type that contains any fluxes for which we need separate instances for isotopes / tracers, but we don't want separate instances for all variables. Mariana Vertenstein and I considered a number of solutions to this problem, but in the end decided it's best to just move these water fluxes out of the science modules into waterflux_type, because other solutions we could think of had bigger downsides. I feel like there was some value in the idea of having the science modules own these fluxes, but not enough value to warrant the extra complexity it will cause in the face of water isotopes / tracers. I'd still like to keep some variables local to the science modules – especially variables that can be made private to those modules. But the fluxes were never handled in a completely (textbook) object-oriented fashion anyway, since they were accessed publicly by other modules, so I'm not too sad to move these public flux variables back out to the generic flux types. And while this has the disadvantage of losing the attachment of a variable to the module that computes that variable, it has some advantages, too, in terms of understanding the suite of water flux variables, and reducing the number of instances that need to get passed all around (e.g., we won't need to pass irrigation_inst to as many places). (See ESCOMP#395 (comment) for more details.) I'm even moving fluxes that could otherwise be private to the module in which they're set. This is partly because these private fluxes may at some point need multiple instances for isotopes / tracers (this is the case for the fluxes in GlacierSurfaceMassBalanceMod - the separate melt and freeze fluxes - which seem like they may need to be tracked separately for isotopes... actually, I could envision needing track the melt flux, qflx_glcice_melt, separately by layer for the sake of isotopes), and partly because I don't want to have to change the home of a variable just because some other module now needs it and so it now needs to be public. It's possible that not everything that is called a flux really needs to go in waterflux_type. I'm particularly thinking about partial fluxes - e.g., if we add foo + bar to get the full flux from state1 to state2, then it's possible that we only need to store that sum in waterflux_type, and could keep foo and bar in the science modules that set them. (Specific example: I'm wondering if qflx_sat_excess_surf and qflx_infl_excess should really be considered fluxes, or if these "partial fluxes" can continue to live in their defining science modules.) But I'm not sure about that (it may be that we'll need tracer versions of even these partial fluxes), and even if it's true, a downside would be that this could increase the volatility of where things are declared (e.g., if it used to be that foo was the only flux, and then we added bar later - would we then move foo out of waterflux_type to its defining science module?) So at least for now, I'll move these "partial fluxes" to waterflux_type; I can always move them back once I understand the needs for tracers / isotopes better. Another example of something that may not need to be in waterflux_type is qflx_glcice_dyn_water_flux_col - but it seems possible that we'll need separate isotope versions for this, so I'll move it for now, and we could move it back later once we understand what's needed for isotopes better. I'm leaving non-flux variables in the module that computes them, even if they're public - e.g., qinmax_col and fsat_col. I can see arguments either way for these. For now I'll keep them in place, partly because that's easier than moving them. But I can see a general principle: Fluxes should be contained in waterflux_type, but auxiliary variables (public or private) that are computed by a module should be owned by that module. One justification for this is: For non-water-flux variables that are only referenced by that module, it makes sense to make them owned by that module and private. And we don't want to have to move things around just because now someone else needs them. So where a variable lives should be based less on public/private and more on whether it's a water flux (which needs to be in the water flux type) or something else (which can follow "better" object-oriented design). For water fluxes computed by a routine, I considered passing them in directly as intent(out) arrays rather than passing the whole waterflux_type in, to make it more clear what the effect of the routine is. But the problem I see with passing computed fluxes directly as arrays is the potential for aliasing - i.e., if both waterflux_inst and waterflux_inst%foo are passed to a routine, the latter as an output, and then inside the routine, waterflux_inst%foo is used as an input, in addition to setting foo. A solution could be to pass everything as bare arrays, but this goes against what we have in place right now, and I don't love that idea because, if we have two polymorphic implementations, it makes it harder for them to have different input requirements. So for now I'm sticking with passing in the whole waterflux_type and documenting outputs in the associate block.
This follows the design laid out in ESCOMP#395. This will be particularly valuable when we introduce tracer instances: the logic related to number of tracers can be encapsulated in water_type, rather than infiltrating clm_instMod.
Rework water data types to accommodate isotopes and other tracers This tag reworks the various water data types to allow having multiple instances of variables that are needed for isotopes and other water tracers. Specific changes include: (1) Separated "water state" variables into state, diagnostic and balance check-related variables. This separation was not essential for the work here, but was desired by Martyn Clark and others. (2) For each of water state, diagnostic and flux variables, separated variables into those needed for both bulk and tracers vs. those only needed for bulk. This way, we can have multiple instances of the variables needed by tracers, but only a single instance of variables that only apply to bulk water. This follows the design laid out in #395. The separation was based largely on what was done in the old water isotope branch; we didn't put a lot of thought into this, because the new design allows us to easily migrate variables between bulk-only and bulk-and-tracer as needed. (3) Moved water fluxes that were defined in science modules back into waterflux_type or waterfluxbulk_type. This was needed for (2); there is more discussion on this in #395 and the log message for commit 711e5cd. (4) Introduced a top-level water_type that holds instances of all of the other water-related objects. This follows the design laid out in #395. This is particularly valuable for the tracer instances: the logic related to number of tracers can be encapsulated in water_type, rather than infiltrating clm_instMod. (5) Added placeholders for water tracer instances (6) Added infrastructure to generate history / restart field names for the tracer instances. Eventually, the isotope class can also hold information specific to each isotope. This work was a joint effort between Mathew Rothstein and myself; Mat gets much of the credit for the actual refactoring done here. Issues fixed (include CTSM Issue #): - Fixes #358 (Separate WaterStateType into multiple types) - Fixes #434 (Separate WaterFluxType into a base class and a class that just applies to bulk) - Fixes #359 (Set up infrastructure for multiple instances of WaterState and WaterFlux types) - Fixes #458 (Implement handling of history and restart variables for water tracers)
I had hoped that most variables computed by a given module could be owned by the type defined in that module - i.e., moving more towards "textbook" object-orientation. However, I realized that this won't work well for water fluxes due to the requirements of isotopes and other water tracers: We need multiple instances of each type that contains any fluxes for which we need separate instances for isotopes / tracers, but we don't want separate instances for all variables. Mariana Vertenstein and I considered a number of solutions to this problem, but in the end decided it's best to just move these water fluxes out of the science modules into waterflux_type, because other solutions we could think of had bigger downsides. I feel like there was some value in the idea of having the science modules own these fluxes, but not enough value to warrant the extra complexity it will cause in the face of water isotopes / tracers. I'd still like to keep some variables local to the science modules – especially variables that can be made private to those modules. But the fluxes were never handled in a completely (textbook) object-oriented fashion anyway, since they were accessed publicly by other modules, so I'm not too sad to move these public flux variables back out to the generic flux types. And while this has the disadvantage of losing the attachment of a variable to the module that computes that variable, it has some advantages, too, in terms of understanding the suite of water flux variables, and reducing the number of instances that need to get passed all around (e.g., we won't need to pass irrigation_inst to as many places). (See ESCOMP#395 (comment) for more details.) I'm even moving fluxes that could otherwise be private to the module in which they're set. This is partly because these private fluxes may at some point need multiple instances for isotopes / tracers (this is the case for the fluxes in GlacierSurfaceMassBalanceMod - the separate melt and freeze fluxes - which seem like they may need to be tracked separately for isotopes... actually, I could envision needing track the melt flux, qflx_glcice_melt, separately by layer for the sake of isotopes), and partly because I don't want to have to change the home of a variable just because some other module now needs it and so it now needs to be public. It's possible that not everything that is called a flux really needs to go in waterflux_type. I'm particularly thinking about partial fluxes - e.g., if we add foo + bar to get the full flux from state1 to state2, then it's possible that we only need to store that sum in waterflux_type, and could keep foo and bar in the science modules that set them. (Specific example: I'm wondering if qflx_sat_excess_surf and qflx_infl_excess should really be considered fluxes, or if these "partial fluxes" can continue to live in their defining science modules.) But I'm not sure about that (it may be that we'll need tracer versions of even these partial fluxes), and even if it's true, a downside would be that this could increase the volatility of where things are declared (e.g., if it used to be that foo was the only flux, and then we added bar later - would we then move foo out of waterflux_type to its defining science module?) So at least for now, I'll move these "partial fluxes" to waterflux_type; I can always move them back once I understand the needs for tracers / isotopes better. Another example of something that may not need to be in waterflux_type is qflx_glcice_dyn_water_flux_col - but it seems possible that we'll need separate isotope versions for this, so I'll move it for now, and we could move it back later once we understand what's needed for isotopes better. I'm leaving non-flux variables in the module that computes them, even if they're public - e.g., qinmax_col and fsat_col. I can see arguments either way for these. For now I'll keep them in place, partly because that's easier than moving them. But I can see a general principle: Fluxes should be contained in waterflux_type, but auxiliary variables (public or private) that are computed by a module should be owned by that module. One justification for this is: For non-water-flux variables that are only referenced by that module, it makes sense to make them owned by that module and private. And we don't want to have to move things around just because now someone else needs them. So where a variable lives should be based less on public/private and more on whether it's a water flux (which needs to be in the water flux type) or something else (which can follow "better" object-oriented design). For water fluxes computed by a routine, I considered passing them in directly as intent(out) arrays rather than passing the whole waterflux_type in, to make it more clear what the effect of the routine is. But the problem I see with passing computed fluxes directly as arrays is the potential for aliasing - i.e., if both waterflux_inst and waterflux_inst%foo are passed to a routine, the latter as an output, and then inside the routine, waterflux_inst%foo is used as an input, in addition to setting foo. A solution could be to pass everything as bare arrays, but this goes against what we have in place right now, and I don't love that idea because, if we have two polymorphic implementations, it makes it harder for them to have different input requirements. So for now I'm sticking with passing in the whole waterflux_type and documenting outputs in the associate block.
This follows the design laid out in ESCOMP#395. This will be particularly valuable when we introduce tracer instances: the logic related to number of tracers can be encapsulated in water_type, rather than infiltrating clm_instMod.
Rework water data types to accommodate isotopes and other tracers This tag reworks the various water data types to allow having multiple instances of variables that are needed for isotopes and other water tracers. Specific changes include: (1) Separated "water state" variables into state, diagnostic and balance check-related variables. This separation was not essential for the work here, but was desired by Martyn Clark and others. (2) For each of water state, diagnostic and flux variables, separated variables into those needed for both bulk and tracers vs. those only needed for bulk. This way, we can have multiple instances of the variables needed by tracers, but only a single instance of variables that only apply to bulk water. This follows the design laid out in ESCOMP#395. The separation was based largely on what was done in the old water isotope branch; we didn't put a lot of thought into this, because the new design allows us to easily migrate variables between bulk-only and bulk-and-tracer as needed. (3) Moved water fluxes that were defined in science modules back into waterflux_type or waterfluxbulk_type. This was needed for (2); there is more discussion on this in ESCOMP#395 and the log message for commit 711e5cd. (4) Introduced a top-level water_type that holds instances of all of the other water-related objects. This follows the design laid out in ESCOMP#395. This is particularly valuable for the tracer instances: the logic related to number of tracers can be encapsulated in water_type, rather than infiltrating clm_instMod. (5) Added placeholders for water tracer instances (6) Added infrastructure to generate history / restart field names for the tracer instances. Eventually, the isotope class can also hold information specific to each isotope. This work was a joint effort between Mathew Rothstein and myself; Mat gets much of the credit for the actual refactoring done here. Issues fixed (include CTSM Issue #): - Fixes ESCOMP#358 (Separate WaterStateType into multiple types) - Fixes ESCOMP#434 (Separate WaterFluxType into a base class and a class that just applies to bulk) - Fixes ESCOMP#359 (Set up infrastructure for multiple instances of WaterState and WaterFlux types) - Fixes ESCOMP#458 (Implement handling of history and restart variables for water tracers)
Fixes: growth and allometry
This PR is not meant to be merged: it is for discussion only.
This records some (evolving) thoughts on how to handle the separation of WaterStateType for water isotopes and water tracers in general (#358).