What to do when no variance in weight bounds #218

Merged: 5 commits from nobnds_variance into main, Jun 19, 2024

Conversation

@barneydobson (Collaborator)

Description

Fix and test the bug.

Fixes #215

@barneydobson linked an issue Jun 13, 2024 that may be closed by this pull request
@cheginit (Collaborator)

Can you please give a bit more context for this issue? Why/when this happens?

@barneydobson (Collaborator, Author) commented Jun 13, 2024

If you are synthesising a small network, you might end up with, e.g., all edges having the same slope (this is more likely than it seems because of the method we use to penalise different slopes; see https://doi.org/10.1016/j.compenvurbsys.2019.101370). Even if that weren't the case, it is entirely plausible that other types of weights added in the future would suffer from this too. Weights are all normalised between 0 and 1, so if every edge has the same slope the normalisation will throw a divide-by-zero.

@dalonsoa (Collaborator) left a comment
I have a comment in relation to the tests, which also affects other tests. Otherwise, if this works, that's OK.

In my experience, this manipulation is tricky and might end up producing unexpected results, since dividing by something really small will result in something very big but still meaningful. It might be more robust to address the issue where the division by zero is taking place.

(Resolved review comments on tests/test_graph_utilities.py and swmmanywhere/graph_utilities.py)
@cheginit (Collaborator)

In graph theory, it's common practice to set edge weights to values higher than 1. It depends on the application, but in this case, where the weight doesn't carry a specific physical meaning and is just a cost measure in path traversal, why normalize between 0 and 1 rather than, say, 1 and 5? It doesn't even have to be a float; it can be an integer that acts as a classifier. So, as @dalonsoa said, it's better to prevent this numerical issue from occurring instead of applying ad-hoc solutions.

@barneydobson (Collaborator, Author) commented Jun 17, 2024

@cheginit I agree that it makes sense to normalise between 1 and 5, or at least >0, anyway. Though it doesn't solve this issue, because if there is no variation between the upper and lower data values we would still divide by 0 (now issue #221).

@barneydobson (Collaborator, Author)

@dalonsoa I guess you are suggesting making the check here:

```python
weight = max((d[attr] - bds[0]) / (bds[1] - bds[0]), eps)
```

Though bds[1] - bds[0] does not need to be recalculated for every edge, which is what would happen if we did this.

Actually, perhaps the tidiest alternative is to drop that weight from the graph entirely when the bounds are the same. To be honest, this seems sensible, so I have updated the code to reflect it (see the sketch below).

@cheginit (Collaborator)

Right, my main point is that normalizing weights is not as straightforward as just using a min-max normalizer. Since the weight is not a physical property, you'd have to be careful about how the normalizer interprets different values. In my methodology, I opted for a discrete normalizer, so I can clearly assign a physical meaning to each weight. For example, for slope, we can use low, mild, and steep as descriptors for physical interpretation, then bin the values based on pre-defined ranges and assign weights of 1, 2, and 3. This way, when the slope difference is on the order of 1e-3, the values fall into the same category.

@barneydobson (Collaborator, Author) commented Jun 18, 2024

> Right, my main point is that normalizing weights is not as straightforward as just using a min-max normalizer. Since the weight is not a physical property, you'd have to be careful about how the normalizer interprets different values. In my methodology, I opted for a discrete normalizer, so I can clearly assign a physical meaning to each weight. For example, for slope, we can use low, mild, and steep as descriptors for physical interpretation, then bin the values based on pre-defined ranges and assign weights of 1, 2, and 3. This way, when the slope difference is on the order of 1e-3, the values fall into the same category.

Is that not exactly what is being done in this paper, albeit with continuous numerical values rather than discrete categories? All the slopes between, say, 0.5% and 2% (I can't quite tell the exact range) are assigned a weight of 0 and thus treated equally.

Regardless, I'll copy your comment into the other issue: #221. Even using discrete weights, the code still needs to know how to handle no variation in the base data used for weight calculation, which is what this issue handles. And I think the way it's done now, following the edits to this PR (i.e., don't include that weight in the shortest-path calculation), makes sense.

@cheginit (Collaborator)

You're having an issue with normalizing data and calculating weights, and I am suggesting a more robust and physically based normalization approach than a simple min-max normalizer. If you prefer the min-max normalizer and think it gives you good results as a general solution, that's fine.

@barneydobson (Collaborator, Author)

> You're having an issue with normalizing data and calculating weights, and I am suggesting a more robust and physically based normalization approach than a simple min-max normalizer. If you prefer the min-max normalizer and think it gives you good results as a general solution, that's fine.

Don't worry, it's on the radar (#221). I just don't have time to solve everything for the paper.

@barneydobson merged commit ded19ab into main, Jun 19, 2024
8 checks passed
@barneydobson deleted the nobnds_variance branch, June 19, 2024 12:25
Closes: upper = lower bound in weights (#215)