86 sensitivity analysis sampling #97
Conversation
…-design-parameters' into 86-sensitivity-analysis-sampling
not necessarily a deal breaker... made negative cycles a warning
- log metric completion - help merge for data align - nse handle invalid
update nx version to avoid some weighted shortest path errors
This reverts commit c11a10a.
further help aligning
It looks ok, but I think it has a few aspects to polish.
bound = [parameters[parameter]['minimum'],
         parameters[parameter]['maximum']]

names.append(parameter)
bounds.append(bound)
dists.append(parameters[parameter].get('dist', 'unif'))
groups.append(parameters[parameter]['category'])
Are all of these (`parameter`, `maximum`, `minimum`, etc.) guaranteed to exist in the parameters dictionary? If not, I would use `get` with a default value, or handle the possibility of it failing with a custom error message.
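A minimal sketch of that suggestion, using a hypothetical parameter entry (the name and attributes below are illustrative, not from the real parameters module): wrap the lookups so a missing attribute raises a clear, parameter-specific message rather than a bare KeyError.

```python
# Hypothetical parameter entry; 'minimum'/'maximum' may be absent.
parameters = {'node_merge_distance': {'minimum': 1, 'maximum': 20,
                                      'category': 'topology'}}

def bound_for(parameter):
    """Return [minimum, maximum], failing with a descriptive message."""
    attrs = parameters[parameter]
    try:
        return [attrs['minimum'], attrs['maximum']]
    except KeyError as e:
        # Re-raise with the parameter name so the failure is easy to trace
        raise KeyError(f"{parameter} lacks {e} so cannot be sampled") from e
```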
They should do, because of the use of pydantic in `parameters.py`, though admittedly there are some 'unsampleable' parameters in there, so I have added the following in config validation. Here:
# Check that the parameter is sample-able
required_attrs = set(['minimum', 'maximum', 'default', 'category'])
correct_attrs = required_attrs.intersection(params[param])
missing_attrs = required_attrs.difference(correct_attrs)
if any(missing_attrs):
raise ValueError(f"{param} missing {missing_attrs} so cannot be sampled.")
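The same check, wrapped as a function so its behaviour is easy to exercise (the parameter entries used below are hypothetical):

```python
def check_sampleable(params: dict) -> None:
    """Raise if any parameter lacks the attributes needed for sampling."""
    required_attrs = set(['minimum', 'maximum', 'default', 'category'])
    for param in params:
        # Attributes present on this parameter, and those still missing
        correct_attrs = required_attrs.intersection(params[param])
        missing_attrs = required_attrs.difference(correct_attrs)
        if any(missing_attrs):
            raise ValueError(f"{param} missing {missing_attrs} "
                             "so cannot be sampled.")
```

A fully specified parameter passes silently; an 'unsampleable' one (e.g. one without bounds) fails with the offending attributes named.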
if N is None:
    N = 2 ** (problem['num_vars'] - 1)
problem_ = problem.copy()
Why are you creating the problem object and then copying it?
If `groups` is `False`, then I need to delete the `groups` key to pass to SALib, but I still want to retain the groups information. I've improved the commenting to explain.
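A small sketch of that copy-then-delete pattern (the problem dict here is illustrative, not the real one built by `experimenter.py`):

```python
# Illustrative SALib-style problem definition, with group sampling disabled.
groups_enabled = False
problem = {'num_vars': 2,
           'names': ['n1', 'n2'],
           'bounds': [[0.0, 1.0], [0.0, 1.0]],
           'groups': ['topology', 'topology']}

# Copy so the version handed to SALib can lose its 'groups' key, while the
# original keeps the group labels for labelling results later.
problem_ = problem.copy()
if not groups_enabled:
    problem_.pop('groups', None)
```

A shallow copy is enough here because only a top-level key is removed.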
swmmanywhere/paper/experimenter.py
Outdated
for x, y, z in zip(problem['groups'],
                   problem['names'],
                   params):
    X.append({'param': y,
              'value': z,
              'iter': ix,
              'group': x})
Before, you removed `groups` if sampling via groups was not required, but here you are using them again. I think the workflow of this function needs to be clarified.
Yes I see it was confusing - hopefully the new comments clarify?
swmmanywhere/paper/experimenter.py
Outdated
# Iterate over the samples, running the model when the jobid matches the
# processor number
for ix, params_ in gb:
    if ix % nproc != jobid:
I don't get this. You are generating a lot of samples but then running only the ones whose index matches `jobid`. What's the rationale for that?
It's for easy use with a job array, jobid will be different on different processors - happy if there's a better way!
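A minimal sketch of that modulo split (the sample count here is made up): each `jobid` in a job array of size `nproc` keeps a disjoint subset of the sample indices, and together the jobs cover every sample exactly once.

```python
nproc = 4
samples = list(range(10))  # stand-in for the sensitivity samples

# Each job keeps only the indices where ix % nproc == jobid, mirroring
# the `if ix % nproc != jobid: continue` skip in the loop above.
assignment = {jobid: [ix for ix in samples if ix % nproc == jobid]
              for jobid in range(nproc)}
```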
By job array, do you mean Slurm's?
I also don't quite follow the logic here. You have N samples, and you want to submit a job for each? I'm not seeing how the mod operator does this.
Based on my understanding of what you're trying to achieve, IMHO, this is better:
job_iter = tlz.partition_all(nproc, range(len(X)))
for _ in range(jobid + 1):
job_idx = next(job_iter, None)
if job_idx is None:
print('No job to do')
return
config = config_base.copy()
for ix in job_idx:
params = gb.get_group(ix)
...
Each Slurm job does a partition of the samples.
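For comparison, a stdlib-only sketch of the chunking that `tlz.partition_all` performs (sizes are made up): contiguous chunks of up to `nproc` indices, one chunk per Slurm job, with jobs beyond the last chunk getting nothing to do.

```python
nproc = 4
X = list(range(10))  # stand-in for the samples

# Contiguous chunks of at most nproc indices, like tlz.partition_all
chunks = [tuple(range(i, min(i + nproc, len(X))))
          for i in range(0, len(X), nproc)]

def indices_for_job(jobid):
    """Indices this Slurm job should run, or None when no work is left."""
    return chunks[jobid] if jobid < len(chunks) else None
```

The modulo scheme interleaves samples across jobs, whereas this partitions them into contiguous blocks; both cover every sample exactly once.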
params = parameters.get_full_parameters_flat()
for param in config.get('parameters_to_sample', {}):
    if isinstance(param, dict):
        param = list(param.keys())[0]
What if there's more than one key? Are the rest of them not relevant? How do you know the first one is the one that matters?
Is it clear with this:
Here
# If the parameter is a dictionary, the values are bounds, all we are
# checking here is that the parameter exists, we only need the first
# entry.
if isinstance(param, dict):
if len(param) > 1:
raise ValueError("""If providing new bounds in the config, a dict
of len 1 is required, where the key is the
parameter to change and the values are
(new_lower_bound, new_upper_bound).""")
param = list(param.keys())[0]
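To make the two accepted shapes concrete, here is a hypothetical `parameters_to_sample` list run through that check (names and bounds are illustrative): a bare name samples the default bounds, while a one-key dict overrides them.

```python
# Hypothetical config entries; names and bounds are illustrative.
parameters_to_sample = [
    'node_merge_distance',            # sample with bounds from parameters.py
    {'outlet_length': (5.0, 50.0)},   # override the bounds in the config
]

names = []
for param in parameters_to_sample:
    if isinstance(param, dict):
        if len(param) > 1:
            raise ValueError("If providing new bounds in the config, a dict "
                             "of len 1 is required.")
        param = list(param.keys())[0]
    names.append(param)  # 'param' is now always a plain parameter name
```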
Co-authored-by: Diego Alonso Álvarez <[email protected]>
…/ImperialCollegeLondon/SWMManywhere into 86-sensitivity-analysis-sampling
use eps for float issues
fix typing use np.inf/isfinite
Use itertuples rather than iterrows
from pathlib import Path

import pandas as pd
import toolz as tlz
Instead of `toolz`, use its C version since it's the same package (same developers), written in Cython. So, this becomes: `import cytoolz.curried as tlz`.
Description
Adjustments to config file, and new module to run sensitivity analysis.
Fixes #86
Summary of changes:
- `model_number` to be given to SWMManywhere
- `experimenter.py` to sample, run, evaluate and save for a sensitivity analysis defined in a config file
- `submit_icl_example` - an example job array submission script for ICL HPC