Histograms: Fix rounding and improve format #4
Comments
Regardless of what we use as the labels, we need to compute which bin each group belongs to. This is currently done by rounding, which causes the problem. Replacing
Regarding the labels, @nunesgh I think you can use tuples for the intervals, and (this may depend on the Pandas version; I'm using an old one, 1.5.3) you don't need to convert the dictionary to a string:
>>> df = pd.DataFrame({"histogram": [ { (1, 2): 0.8, (2, 3): 0.2 }, { (1, 2): 0.3, (2, 3): 0.7 } ]})
>>> df.columns
Index(['histogram'], dtype='object')
>>> df
histogram
0 {(1, 2): 0.8, (2, 3): 0.2}
1 {(1, 2): 0.3, (2, 3): 0.7}
>>> df.loc[0, "histogram"]
{(1, 2): 0.8, (2, 3): 0.2}
>>> type(df.loc[0, "histogram"])
<class 'dict'>
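As an illustration of how such tuple-keyed dictionaries might be produced from raw probabilities, here is a minimal sketch. The helper name `interval_histogram`, the `n_bins` parameter, and the convention that each interval is open on the left and closed on the right are all assumptions made for this example; they are not taken from the package.

```python
import math
from collections import Counter


def interval_histogram(probabilities, n_bins=100):
    """Return {(lower, upper): fraction of values in that interval}.

    Hypothetical helper: bounds are expressed in units of 1/n_bins
    (percent when n_bins is 100), and each interval is assumed to be
    (lower, upper], i.e. closed on the right.
    """
    counts = Counter()
    for p in probabilities:
        if not 0 < p <= 1:
            raise ValueError(f"probability must be in (0, 1], got {p}")
        # Rounding to 9 decimals only absorbs floating-point noise in
        # p * n_bins; it does not move p into a different bin.
        upper = math.ceil(round(p * n_bins, 9))
        counts[(upper - 1, upper)] += 1
    total = len(probabilities)
    return {label: counts[label] / total for label in sorted(counts)}


hist = interval_histogram([0.004, 0.015, 0.012, 0.02, 0.011])
print(hist)  # {(0, 1): 0.2, (1, 2): 0.8}
```

Note that the value 0.004 lands in the interval (0, 1) rather than being reported as 0%.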
I still don't understand why it's necessary to round (or take the ceiling of) the probabilities instead of using the original value itself and placing it in the proper bin.
And how would we do that without computing the index of the bin? We could traverse all the bins every time and compare the probability against the intervals, but that's worse in terms of performance and it doesn't solve the floating-point error.
If there are 100 bins, where bin 1 is
You're still rounding, but down instead (or, more precisely, truncating). And it doesn't work the way we want. Say
I'm sorry, you're right. It seems that using ceiling is more appropriate. So the bin index would be
That's right! Take a look at the changes in #5.
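To make the rounding-versus-ceiling point concrete, here is a small sketch. It is not necessarily what #5 implements: the function names, the default of 100 bins, and the 9-decimal tolerance used to absorb floating-point noise are assumptions chosen for illustration.

```python
import math


def bin_by_rounding(p, n_bins=100):
    """Nearest-bin index: sends any p below half a bin width to bin 0."""
    return round(p * n_bins)


def bin_by_ceiling(p, n_bins=100):
    """Index k such that p lies in ((k - 1) / n_bins, k / n_bins]."""
    if not 0 < p <= 1:
        raise ValueError(f"probability must be in (0, 1], got {p}")
    # p * n_bins may come out slightly above an exact integer because of
    # floating-point error, so the product is cleaned up before the ceiling.
    return math.ceil(round(p * n_bins, 9))


print(bin_by_rounding(0.004))  # 0
print(bin_by_ceiling(0.004))   # 1
print(bin_by_ceiling(0.07))    # 7
```

The tolerance step matters because a bare `math.ceil(p * n_bins)` may push a value that sits exactly on a bin boundary into the next bin when the product is not exactly representable.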
Problem: The histogram provided by the package (for all attacks) rounds probabilities below 0.5% down to 0%, but for any attack the probability can never be 0. The probabilities should not be rounded.
Improvement: Each probability is rounded before the histogram is generated. The histogram could instead be given without any rounding, in the following way:
As the histogram is already a dictionary, it could use the bin intervals as its labels. For example: