
Minimal carbon pool model #134

Merged: 51 commits from feature/dummy_carbon into develop on Feb 7, 2023

Conversation

jacobcook1995 (Collaborator)

Description

I've added a basic carbon pool to the soil module. It only includes 2 pools (the full model will probably include 5 pools), and a function that handles transfers between these two pools. This is all bundled together as part of a SoilCarbon class.

I was originally intending to implement an alternate equation form and a method to switch between the two, but decided that was a bit ambitious for a single pull request. Structurally, there are a couple of things that I'm not completely happy with and would want to clear up before moving on to something like that.

  1. The constant dictionaries defined at the top of the script hold sets of fitting parameters for the various equations; these are taken from fits to data carried out in the literature. I'm not sure if this is a good way to store this kind of information, but it didn't seem sensible to define these fitting parameters as independent constants either. I'm also unsure whether treating them as constants is sensible at all, as I could see us wanting to do sensitivity analysis with them down the line. (A rough sketch of the current structure is below the list.)
  2. I wrote the code with no sanity checking for variables except when the class is initialised. My reasoning is that overly frequent sanity checking would slow down execution, and that we probably don't want the simulation to halt entirely because a negative value has crept into one differential equation in one grid cell. Is this broadly sensible?
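
To illustrate point 1, a minimal sketch of the kind of structure being described. The constant dictionary and binding equation match the snippet discussed later in this review; the pool names and the SoilCarbon internals are placeholders, not the actual implementation:

import numpy as np
from numpy.typing import NDArray

# Fitting parameters from a linear regression in the literature (Mayes et al. (2012))
BINDING_WITH_PH = {"slope": -0.216, "intercept": -0.186}


class SoilCarbon:
    """Bundles the two carbon pools (placeholder names) and their transfers."""

    def __init__(self, pool_1: NDArray[np.float32], pool_2: NDArray[np.float32]) -> None:
        # Sanity checking happens only here, at initialisation (see point 2)
        if np.any(pool_1 < 0) or np.any(pool_2 < 0):
            raise ValueError("Carbon pools must be non-negative")
        self.pool_1 = pool_1
        self.pool_2 = pool_2


def calculate_binding_coefficient(pH: NDArray[np.float32]) -> NDArray[np.float32]:
    """Binding coefficient of labile carbon (m^3 kg^-1)."""
    return 10.0 ** (BINDING_WITH_PH["slope"] * pH + BINDING_WITH_PH["intercept"])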

No hurry on reviewing this, I mainly wanted to get the pull request submitted before I finished for Christmas.

Fixes #99

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@alexdewar (Collaborator) left a comment:

This looks really good and it's also a good example of how a PR should be done (i.e. you provided a good explanation, and your code has tests and lots of docstrings). Well done!

I've made some small suggestions throughout. Some slightly "bigger picture" suggestions I had were:

  1. It'd be helpful to say throughout (in docstrings etc.) what units each of your variables is in, where they represent physical values (e.g. I'd assume all your temperatures are in °C but would like to know for sure, and I don't know what the unit of soil moisture is at all).
  2. While it's great that you have tests for each of your functions, you're often only testing that they work for a single set of parameters. This is much better than having nothing at all, but I think you should go one step further and test multiple sets or, ideally, ranges of values (i.e. by using @pytest.mark.parametrize — a sketch follows this list). I was working on something similar myself today: https://github.com/ImperialCollegeLondon/FINESSE/blob/main/tests/test_tc4820.py
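
As an illustration of parametrizing, a minimal sketch. The import path and function signature assume the calculate_binding_coefficient function discussed later in this thread, and the expected values are computed from the Mayes et al. (2012) slope and intercept quoted below, so treat them as placeholders:

import numpy as np
import pytest

from virtual_rainforest.models.soil.carbon import calculate_binding_coefficient


@pytest.mark.parametrize(
    "pH,expected",
    [
        (np.array([3.0]), np.array([0.14656])),
        (np.array([4.5, 7.0]), np.array([0.06950, 0.02004])),
    ],
)
def test_calculate_binding_coefficient(pH, expected):
    # Each parameter set runs as its own test case
    assert np.allclose(calculate_binding_coefficient(pH), expected, rtol=1e-3)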

[10 inline review comments on virtual_rainforest/soil/carbon.py, since resolved]
@alexdewar (Collaborator):

I realise I forgot to answer your questions:

  1. Seems sensible to me!
  2. I would just add the sanity checks. In the grand scheme of things it shouldn't hurt performance much, and it could potentially save you a lot of debugging pain further down the line (a rough sketch of one possible check follows below).
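
For illustration only, one shape such a check could take — a hypothetical helper (not an agreed project interface) that logs a warning rather than halting the run, given the concern above about stopping the whole simulation over one bad grid cell:

import logging

import numpy as np
from numpy.typing import NDArray

LOGGER = logging.getLogger(__name__)


def check_non_negative(values: NDArray[np.float32], name: str) -> None:
    """Warn (rather than raise) if any element of a pool has gone negative."""
    if np.any(values < 0.0):
        LOGGER.warning("Negative values encountered in %s", name)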

@davidorme (Collaborator):

On question 2, I think this overlaps with #135 - I've added a sketch of how this might work across modules to that discussion.

@codecov-commenter commented Dec 16, 2022:

Codecov Report

Merging #134 (8d985d2) into develop (9659a15) will increase coverage by 0.35%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #134      +/-   ##
===========================================
+ Coverage    94.95%   95.31%   +0.35%     
===========================================
  Files           14       15       +1     
  Lines          714      768      +54     
===========================================
+ Hits           678      732      +54     
  Misses          36       36              
Impacted Files                                 Coverage Δ
virtual_rainforest/core/config.py              98.23% <ø> (ø)
virtual_rainforest/models/plants/__init__.py   100.00% <ø> (ø)
virtual_rainforest/models/soil/__init__.py     100.00% <ø> (ø)
virtual_rainforest/models/soil/model.py        94.73% <ø> (ø)
virtual_rainforest/__init__.py                 100.00% <100.00%> (ø)
virtual_rainforest/models/soil/carbon.py       100.00% <100.00%> (ø)


@davidorme (Collaborator) left a comment:

Looks good - some suggestions on the tests and some possible tweaks on the implementation, but nothing serious. I think the tests are worth updating, so requesting those changes.

[7 inline review comments on tests/test_soil_carbon.py and virtual_rainforest/models/soil/carbon.py, since resolved]
@davidorme (Collaborator) left a comment:

Looks good but a few things to look at.

[4 inline review comments on tests/test_soil_carbon.py and virtual_rainforest/models/soil/carbon.py]
@alexdewar (Collaborator) left a comment:

I've made some very minor comments, but otherwise I think this is good to go. Good job!

[5 inline review comments on tests/test_soil_carbon.py and virtual_rainforest/models/soil/carbon.py, mostly resolved]

One remaining thread is on these lines of virtual_rainforest/models/soil/carbon.py:

    labile carbon (m^3 kg^-1)
    """

    return 10.0 ** (BINDING_WITH_PH["slope"] * pH + BINDING_WITH_PH["intercept"])

Collaborator:

I can't remember if I said this in my original review, but I think BINDING_WITH_PH would be better as a dataclass.

jacobcook1995 (Collaborator, Author):

Ahh right, is that the case for everywhere that I'm using dictionaries of constants, or is this specific to BINDING_WITH_PH?

jacobcook1995 (Collaborator, Author):

And the class would be defined like this

@dataclass
class BindingWithPH:
    """From linear regression (Mayes et al. (2012))."""
    intercept: float = -0.186
    slope: float = -0.216

and accessed like this?

return 10.0 ** (BindingWithPH.slope * pH + BindingWithPH.intercept)

Collaborator:

If it is just the object.thing notation, then there are other options for dot-style dictionaries (dotmap, for example). But @alexdewar, did you mean that BindingWithPH would also provide BindingWithPH.calc_binding?

@dataclass
class BindingWithPH:
    """From linear regression (Mayes et al. (2012))."""
    intercept: float = -0.186
    slope: float = -0.216

    def calc_binding(self, pH: NDArray) -> NDArray:
        return 10.0 ** (self.slope * pH + self.intercept)

Collaborator:

More generally, I think there's an argument for the constants module providing something that can be easily serialised to JSON or YAML so that the constants can become part of the configuration if needed. core.constants might provide a default set, but people might want to play around with the settings.

I've used dataclasses for this kind of role here: https://github.com/davidorme/pyrealm/blob/master/pyrealm/param_classes.py

I will accept that the size of the dataclasses may not be ideal 😬
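
As an illustration of that idea, a minimal sketch of a constants dataclass round-tripping through JSON so that it could live in the configuration (the config values below are made up):

import json
from dataclasses import asdict, dataclass


@dataclass
class BindingWithPH:
    """From linear regression (Mayes et al. (2012))."""

    intercept: float = -0.186
    slope: float = -0.216


# Serialise the defaults so they can be written out as part of a config file
defaults_json = json.dumps(asdict(BindingWithPH()))

# ...and rebuild an instance from (possibly user-edited) configuration values
user_config = json.loads('{"intercept": -0.186, "slope": -0.25}')
binding_params = BindingWithPH(**user_config)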

Collaborator:

@jacobcook1995 Yes, that's essentially what I had in mind, but I'm open to suggestions if something like dotmap (which I hadn't heard of till now) is better.

The reason I suggested a dataclass was so you can get better type linting (i.e. mypy will be able to see what fields your BindingWithPH class has and what types they are). You could also just have them as separate variables.

[1 further inline review comment on virtual_rainforest/models/soil/carbon.py]
@jacobcook1995 (Collaborator, Author):

Made the decision to switch to dataclasses rather than dotmap, as I could work out how to add parameter-specific docstrings more easily with dataclasses. Happy for this to change down the line when we agree on a consistent approach between modules for constants, units, etc.

@davidorme (Collaborator) left a comment:

LGTM. Obvious caveats about how we are going to handle constants, and about possibly having some kind of shared utility for bounds checking and, if so, where it should live.

@davidorme (Collaborator) left a comment:

Wait. I think there's something off about using dataclasses like that.

@davidorme (Collaborator):

So I may not be understanding how dataclasses are used, but I've always seen them used by creating an instance of the dataclass. So given:

@dataclass
class BindingWithPH:
    """From linear regression (Mayes et al. (2012))."""
    intercept: float = -0.186
    slope: float = -0.216

Then a function might work like this:

def calculate_binding_coefficient(pH: NDArray[np.float32], coef: BindingWithPH = BindingWithPH()):
    ...

And then the code can use coef.slope, etc. You could then run that calculation with the default values or something else:

res = calculate_binding_coefficient(pH_array, BindingWithPH(slope=-0.23))

Here, we're using the attributes more like class attributes. The dataclass does expose those attributes but they seem to fundamentally not do what I expect. So for example:

In [40]: @dataclass
    ...: class BindingWithPH:
    ...:     """From linear regression (Mayes et al. (2012))."""
    ...:     intercept: float = -0.186
    ...:     slope: float = -0.216
    ...: 

# create a normal instance

In [41]: inst  = BindingWithPH()
In [42]: inst
Out[42]: BindingWithPH(intercept=-0.186, slope=-0.216)

# and with a different slope
In [43]: inst  = BindingWithPH(slope=3)
In [44]: inst
Out[44]: BindingWithPH(intercept=-0.186, slope=3)

# The value _can_ be accessed directly from the class, not an instance
# as if it is a class attribute

In [45]: BindingWithPH.slope
Out[45]: -0.216

# And it can be changed too

In [46]: BindingWithPH.slope = 3
In [47]: BindingWithPH.slope
Out[47]: 3

# BUT that doesn't interact with the instance __init__, which keeps
# returning the original defaults.

In [48]: inst  = BindingWithPH()
In [49]: inst
Out[49]: BindingWithPH(intercept=-0.186, slope=-0.216)
In [50]: BindingWithPH.__init__
Out[50]: <function __main__.BindingWithPH.__init__(self, intercept: float = -0.186, slope: float = -0.216) -> None>

I'm not sure what the advantage of dataclass over a namedtuple is here - we're not using instances of it.

I'm also wary of how users change these values. I'm not sure, but at the moment I think you could do it by changing the "class attribute" value as above, but you then have absolutely no control over what other functions might be accessing that altered value. If an instance is created, with defaults or altered, then the user knows what version they are passing.

This all seems really familiar for some reason!

@alexdewar
Copy link
Collaborator

Yeah, I guess a namedtuple would be another way to go. Alternatively you could just have two separate constants defined at the module level.

I hear your concerns about the values being mutable, but that is kind of what you get with Python. Module-level constants are mutable too.

@davidorme
Copy link
Collaborator

I think dataclasses are probably the way to go - see the discussion here in #162 - and I agree that mutability is just Python for you. I have used frozen dataclasses over in pyrealm, which is probably just unnecessary paranoia. Well, actually, not paranoia, because someone will do it, but stopping people from changing key constants halfway through an analysis should be a job for common sense, not code.
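
For reference, a minimal sketch of the frozen variant (illustrative only, reusing the BindingWithPH example from this thread):

from dataclasses import dataclass


@dataclass(frozen=True)
class BindingWithPH:
    """From linear regression (Mayes et al. (2012))."""

    intercept: float = -0.186
    slope: float = -0.216


params = BindingWithPH(slope=-0.23)  # overriding a default at construction still works
# params.slope = -0.3  # would raise dataclasses.FrozenInstanceError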

@jacobcook1995 (Collaborator, Author):

So the suggestion here, @davidorme, would be to keep the classes as they are, but move them to a models/soil/constants.py script so that all of the module's constants dataclasses are stored in a single location? And then these would be imported and supplied as default values to the functions that use them?

@davidorme (Collaborator):

Yes - I think so. That's very close to what you have already and seems like a sensible way to organise things anyway. Something else might come of #162 but no alarm bells yet!
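
For illustration, a rough sketch of that organisation, reusing the names from this thread (this is a sketch under those assumptions, not the final implementation):

# virtual_rainforest/models/soil/constants.py
from dataclasses import dataclass


@dataclass
class BindingWithPH:
    """From linear regression (Mayes et al. (2012))."""

    intercept: float = -0.186
    slope: float = -0.216


# virtual_rainforest/models/soil/carbon.py
import numpy as np
from numpy.typing import NDArray

from virtual_rainforest.models.soil.constants import BindingWithPH


def calculate_binding_coefficient(
    pH: NDArray[np.float32], coef: BindingWithPH = BindingWithPH()
) -> NDArray[np.float32]:
    """Binding coefficient of labile carbon (m^3 kg^-1)."""
    return 10.0 ** (coef.slope * pH + coef.intercept)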

@davidorme (Collaborator) left a comment:

LGTM!

@jacobcook1995 jacobcook1995 merged commit 7f445d9 into develop Feb 7, 2023
@jacobcook1995 jacobcook1995 deleted the feature/dummy_carbon branch February 7, 2023 09:12