86 sensitivity analysis sampling #97

Merged: 72 commits into main, Apr 1, 2024

Conversation

@barneydobson (Collaborator) commented Mar 16, 2024

Description

Adjustments to the config file, and a new module to run sensitivity analysis.

Fixes #86

Summary of changes:

  • Enable a model_number to be given to SWMManywhere.
  • Enable setting which parameters to sample and how many samples to draw (via a SALib sample magnitude); see the sketch after this list.
  • Update to a design storm (for now; see Precipitation events #55) - this accounts for the majority of lines removed.
  • Fix small errors revealed by testing.
  • The current shortest-path method appears to tolerate negative cycles (since it is heap based), so the error is downgraded to a warning.
  • Log completed metrics.
  • Improve the alignment of results in the metrics.
  • Add experimenter.py to sample, run, evaluate and save results for a sensitivity analysis defined in a config file.
  • Add submit_icl_example - an example job-array submission script for the ICL HPC.
  • Add some tests.
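
For orientation, a minimal sketch (not the PR's code) of the SALib sampling step that experimenter.py drives; the parameter names and bounds below are hypothetical.

# A minimal sketch, assuming SALib is installed; names and bounds are
# hypothetical, not taken from parameters.py.
from SALib.sample import saltelli

problem = {
    'num_vars': 2,
    'names': ['node_merge_distance', 'outfall_length'],
    'bounds': [[1.0, 50.0], [10.0, 600.0]],
}

# N is the sample "magnitude"; SALib expands it to N * (2 * num_vars + 2)
# model runs when second-order indices are requested.
N = 2 ** (problem['num_vars'] - 1)
X = saltelli.sample(problem, N, calc_second_order=True)
print(X.shape)  # (N * (2 * num_vars + 2), num_vars)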

@barneydobson linked an issue on Mar 16, 2024 that may be closed by this pull request
@barneydobson self-assigned this on Mar 16, 2024
@barneydobson added the sa_paper (Sensitivity analysis paper) label on Mar 16, 2024

@dalonsoa (Collaborator) left a comment

It looks ok, but I think it has a few aspects to polish.

Comment on lines +47 to +53
bound = [parameters[parameter]['minimum'],
         parameters[parameter]['maximum']]

names.append(parameter)
bounds.append(bound)
dists.append(parameters[parameter].get('dist', 'unif'))
groups.append(parameters[parameter]['category'])
Collaborator

Are all of these - parameter, maximum, minimum, etc. - guaranteed to exist in the parameters dictionary? If not, I would use get with a default value, or handle the possibility of it failing with a custom error message.

Collaborator Author

They should do, because of the use of pydantic in parameters.py. Admittedly there are some 'unsampleable' parameters in there, so I have added the following to the config validation:

# Check that the parameter is sample-able
required_attrs = set(['minimum', 'maximum', 'default', 'category'])
correct_attrs = required_attrs.intersection(params[param])
missing_attrs = required_attrs.difference(correct_attrs)
if any(missing_attrs):
    raise ValueError(f"{param} missing {missing_attrs} so cannot be sampled.")
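
For illustration only, a hedged sketch (hypothetical parameter entries, not the real contents of parameters.py) of how this check behaves on a flattened parameter dictionary:

# Hypothetical flattened parameter dictionary; the real entries live in parameters.py.
params = {
    'node_merge_distance': {'minimum': 1.0, 'maximum': 50.0,
                            'default': 10.0, 'category': 'topology'},
    'some_unsampleable_flag': {'default': True, 'category': 'options'},
}

param = 'some_unsampleable_flag'
required_attrs = {'minimum', 'maximum', 'default', 'category'}
missing_attrs = required_attrs - set(params[param])
if missing_attrs:
    # Raises: some_unsampleable_flag missing {'minimum', 'maximum'} ... (set order may vary)
    raise ValueError(f"{param} missing {missing_attrs} so cannot be sampled.")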


if N is None:
    N = 2 ** (problem['num_vars'] - 1)
problem_ = problem.copy()
Collaborator

Why are you creating the problem object and then copying it?

Collaborator Author

If groups is False, then I need to delete the groups key before passing to SALib, but I still want to retain the groups information. I've improved the comments to explain this.
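
For context, a minimal sketch (assumed variable names, not the PR's exact code) of the copy-then-delete pattern being described:

# groups=False means SALib should sample parameters individually, but the
# original problem dict keeps its group labels for later bookkeeping.
groups = False
N = None

problem = {
    'num_vars': 2,
    'names': ['node_merge_distance', 'outfall_length'],
    'bounds': [[1.0, 50.0], [10.0, 600.0]],
    'groups': ['topology', 'topology'],
}

if N is None:
    N = 2 ** (problem['num_vars'] - 1)

problem_ = problem.copy()       # shallow copy; 'problem' retains 'groups'
if not groups:
    del problem_['groups']      # only the copy handed to SALib loses the key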

Comment on lines 94 to 100
for x, y, z in zip(problem['groups'],
                   problem['names'],
                   params):
    X.append({'param': y,
              'value': z,
              'iter': ix,
              'group': x})
Collaborator

Before, you removed groups if sampling via groups was not required, but here you are using them again. I think the workflow of this function needs to be clarified.

@barneydobson (Collaborator Author) Mar 26, 2024

Yes, I see it was confusing - hopefully the new comments clarify it?
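
As an editorial aside, a hedged sketch (hypothetical sample values) of what this loop produces: one tidy record per parameter per sample, using the names and group labels retained in the original problem dict.

import pandas as pd

problem = {
    'names': ['node_merge_distance', 'outfall_length'],
    'groups': ['topology', 'topology'],
}
sample_rows = [[12.3, 250.0], [4.7, 90.0]]   # stand-in for SALib's sampled rows

X = []
for ix, params in enumerate(sample_rows):
    for group, name, value in zip(problem['groups'], problem['names'], params):
        X.append({'param': name, 'value': value, 'iter': ix, 'group': group})

print(pd.DataFrame(X))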

# Iterate over the samples, running the model when the jobid matches the
# processor number
for ix, params_ in gb:
    if ix % nproc != jobid:
Collaborator

I don't get this. You are generating a lot of samples but then only running the model for the ones whose index matches jobid. What's the rationale for that?

Collaborator Author

It's for easy use with a job array: jobid will be different on different processors - happy to change if there's a better way!
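
For readers unfamiliar with job arrays, a minimal sketch of the round-robin idea (environment-variable names shown for PBS Pro and Slurm; treat them as illustrative):

import os

# Each array task runs the same script; the scheduler sets a distinct index,
# e.g. PBS_ARRAY_INDEX on PBS Pro or SLURM_ARRAY_TASK_ID on Slurm.
nproc = 4                                          # number of array tasks
jobid = int(os.environ.get('PBS_ARRAY_INDEX', 0))  # this task's index

sample_indices = range(10)                         # stand-in for the samples
mine = [ix for ix in sample_indices if ix % nproc == jobid]
print(f"task {jobid} runs samples {mine}")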

Collaborator

By job array, do you mean Slurm's?

I also don't quite follow the logic here. You have N samples, and you want to submit a job for each? I am not seeing how the mod operator does this.

Based on my understanding of what you're trying to achieve, IMHO, this is better:

job_iter = tlz.partition_all(nproc, range(len(X)))
for _ in range(jobid + 1):
    job_idx = next(job_iter, None)

if job_idx is None:
    print('No job to do')
    return

config = config_base.copy()
for ix in job_idx:
    params = gb.get_group(ix)
    ...

Each Slurm job does a partition of the samples.

params = parameters.get_full_parameters_flat()
for param in config.get('parameters_to_sample', {}):
    if isinstance(param, dict):
        param = list(param.keys())[0]
Collaborator

What if there's more than one key? Are the rest of them not relevant? How do you know the first one is the one that matters?

@barneydobson (Collaborator Author) Mar 26, 2024

Is it clearer with this?

# If the parameter is a dictionary, the values are bounds, all we are 
# checking here is that the parameter exists, we only need the first 
# entry.
if isinstance(param, dict):
    if len(param) > 1:
        raise ValueError("""If providing new bounds in the config, a dict 
                            of len 1 is required, where the key is the 
                            parameter to change and the values are 
                            (new_lower_bound, new_upper_bound).""")
    param = list(param.keys())[0]
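
For illustration, a hypothetical parameters_to_sample entry (a Python representation of the config; names and bounds are made up) showing the two accepted forms:

parameters_to_sample = [
    'node_merge_distance',               # plain name: use bounds from parameters.py
    {'outfall_length': (10.0, 600.0)},   # one-key dict: (new_lower_bound, new_upper_bound)
]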

from pathlib import Path

import pandas as pd
import toolz as tlz
Collaborator

Instead of toolz, use its C version, cytoolz, since it's the same package (same developers) written in Cython. So this import becomes: import cytoolz.curried as tlz.
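
A quick illustration of the suggested drop-in swap; cytoolz mirrors the toolz API, so partition_all behaves the same:

import cytoolz.curried as tlz   # was: import toolz as tlz

chunks = list(tlz.partition_all(3, range(8)))
print(chunks)   # [(0, 1, 2), (3, 4, 5), (6, 7)]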

@barneydobson merged commit 25b6b05 into main on Apr 1, 2024 (10 checks passed)
@barneydobson deleted the 86-sensitivity-analysis-sampling branch on Apr 1, 2024 at 19:04