Trailing decimal symbol test failing #259

Open
jagerber48 opened this issue Jul 27, 2024 · 3 comments

@jagerber48
Contributor

#167 refactors the formatting tests to resolve #162, a bug where shadowed dictionary keys caused many previously written tests to not run. When these tests ran a few failures were discovered. The only unresolved failure was on the following test (abbreviated here):

assert format(ufloat(1234.56789, 0.1), ".0f") == "1235+/-0."

This fails because the formatting gives

print(format(ufloat(1234.56789, 0.1), ".0f"))
# 1235+/-0

The test expects a trailing decimal symbol on the zero, but the actual output has none.

The RTD documentation claims that "An uncertainty which is exactly zero is always formatted as an integer". To me this implied that an uncertainty which is rounded to zero, but which was originally non-zero, would always have a trailing decimal symbol. This is the case on another test which is passing now:

assert format(ufloat(9.9, 0.1), ".0fS") == "10(0.)"

There is some discussion in the comments in the source code about this issue. The comments suggest that it is challenging to always get a trailing decimal using the built-in Python formatting.
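For context, here is a minimal sketch of the built-in behavior using plain floats only. It does not call into uncertainties. The `.0f` presentation drops the decimal point, while the `#` alternate-form flag keeps it. Whether the uncertainties formatter could rely on `#` in every code path is an open question that this sketch does not answer.

```python
# Plain floats only; nothing here uses the uncertainties package.
std_dev = 0.1

rounded = format(std_dev, ".0f")     # no decimal point when no digits follow
alternate = format(std_dev, "#.0f")  # '#' forces the decimal point to appear

print(rounded)    # 0
print(alternate)  # 0.
```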


I'm not sure what the resolution should be. Here are the options I see:

  • Do nothing. There will be an expected failing test. This seems like a bad idea.
  • Change the test expected output to exclude the trailing decimal symbol and update the documentation to indicate that lack of a trailing decimal does not guarantee the uncertainty appears as "0" because it is equal to zero. Rather, it may appear as "0" because it was rounded to zero.
  • Change the source code so the test passes. I don't know how much effort this will be. I haven't checked through the source code lately and I haven't fully processed the linked comments. It probably won't be too bad to do this by doing a late check on the uncertainty string.

Probably short term the third option will be the path.
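As a rough illustration of what a "late check on the uncertainty string" could look like, here is a sketch. The helper name `add_trailing_decimal` and its arguments are hypothetical and are not taken from the uncertainties source. It assumes the uncertainty has already been formatted and that the original standard deviation is still available.

```python
def add_trailing_decimal(std_str: str, std_dev: float) -> str:
    """Hypothetical late check, not the uncertainties implementation.

    std_str is the already formatted uncertainty; std_dev is the original
    (unrounded) standard deviation.
    """
    # The string "shows zero" if it consists only of the digit 0.
    shows_zero = std_str != "" and set(std_str) == {"0"}
    if shows_zero and std_dev != 0:
        # Non-zero uncertainty that was rounded to zero: mark it with "."
        return std_str + "."
    return std_str


print(add_trailing_decimal("0", 0.1))  # rounded to zero
print(add_trailing_decimal("0", 0.0))  # exactly zero
```

Strings that already contain a decimal symbol, such as "0." or "0.1", pass through unchanged.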


I'd like to comment on this as the author/maintainer of sciform, a package dedicated to scientific formatting of numbers. sciform tries to follow official SI/NIST/BIPM guidelines rather than existing programming conventions for formatting e.g. integers vs. floats. The first major decision this has led to is that sciform only supports significant figure rounding, and not digits-past-the-decimal formatting. Since the number of sig figs on the uncertainty must be >= 1, it is impossible for a non-zero uncertainty to get rounded down to zero, so this issue is sidestepped. This is consistent with the guideline examples, which all have non-zero uncertainty and always display the value and uncertainty out to the same least significant decimal place.
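To make the contrast concrete, here is a small sketch with plain floats. The helper `round_sig` is a hypothetical illustration and is not sciform's implementation. Rounding to a fixed number of decimals can turn a non-zero uncertainty into zero, while rounding to at least one significant figure cannot.

```python
import math


def round_sig(x: float, sig: int = 1) -> float:
    """Round x to `sig` significant figures (sig >= 1). Illustrative only."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))  # decimal place of the leading digit
    return round(x, sig - 1 - exponent)


unc = 0.1
print(round(unc, 0))      # digits-past-the-decimal rounding: 0.0
print(round_sig(unc, 1))  # significant-figure rounding: 0.1
```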

However, in practice users will pass in uncertainties that are exactly zero and, in my opinion, it is better to still format that as a value/uncertainty pair rather than just a value. In this case the uncertainty is never displayed with a hanging decimal symbol because there is no official example showing a trailing decimal symbol. I haven't found official guidance against this but there is 10.5.2 which guides against leading decimal points, i.e. "0.25" over ".25".

If a zero uncertainty is passed in then it is formatted as "0", "000", "0.000", "00.00" etc. depending on how the value is displayed and other options selected. It will always be rendered to the same least significant digit as the value, which controls the number of zeros to the right of the decimal symbol. If a certain option is selected it will be rendered to the same most significant digit as well, which controls the number of zeros to the left of the decimal symbol.
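A sketch of the "same least significant digit" rule, assuming the value has already been formatted as a plain decimal string. The helper `zero_uncertainty_like` is hypothetical and is not sciform code. It does not cover the option that pads zeros to the left of the decimal symbol.

```python
def zero_uncertainty_like(value_str: str) -> str:
    """Hypothetical helper: a zero with as many decimals as value_str."""
    # partition() returns an empty separator and fraction if there is no "."
    _, point, fraction = value_str.partition(".")
    return "0" + point + "0" * len(fraction)


print(zero_uncertainty_like("123"))      # 0
print(zero_uncertainty_like("123.456"))  # 0.000
```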

Due to popular demand, I will likely add a digits-past-the-decimal formatting style for value/uncertainty formatting in sciform. Though, for the reasons above, I will encourage the user not to use this option. In this mode, I will render "rounded-to-zero" uncertainty the same as "exactly-zero" uncertainty. I disagree with the convention that there should be an indication to the user whether the uncertainty was rounded to zero or is exactly zero. If the user chooses to use digits-past-the-decimal formatting then they expose themselves to selecting a most-significant displayed digit of the uncertainty that is greater than the most-significant digit of the uncertainty, and they opt in to losing information about the uncertainty.

It seems to me that the uncertainties convention arose out of the programming convention that a trailing decimal indicates a float whereas no decimal symbol indicates an int. However, scientific formatting is entirely unconcerned with floats and ints. Scientific formatting has a value and uncertainty, both of which are real numbers. The main formatting decision is to round both the uncertainty and value to typically the decimal place matching the most- or second-most-significant digit of the uncertainty.
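The rounding rule described here can be sketched as follows. The function `format_pair` is a hypothetical illustration, not sciform or uncertainties code. It assumes a positive uncertainty, only handles digits to the right of the units place, and ignores the edge case where rounding carries the uncertainty up into the next decade.

```python
import math


def format_pair(value: float, unc: float, sig: int = 1) -> str:
    """Show value and unc down to the place of unc's sig-th significant digit."""
    if unc <= 0:
        raise ValueError("this sketch assumes a positive uncertainty")
    top_place = math.floor(math.log10(unc))  # place of the leading digit of unc
    decimals = max(sig - 1 - top_place, 0)   # sketch: never round left of the units place
    return f"{value:.{decimals}f}+/-{unc:.{decimals}f}"


print(format_pair(1234.56789, 0.12))         # 1234.6+/-0.1
print(format_pair(1234.56789, 0.12, sig=2))  # 1234.57+/-0.12
```

Both numbers share one decimal place, so there is no float-versus-int distinction to encode in the output.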

If uncertainties moves to using sciform as a formatting back-end then the trailing decimal symbol convention will be abandoned, essentially corresponding to the first option above of changing the test expected output on this particular test.

@newville
Member

@jagerber48 A few comments.

I will not try to keep up with the length of your messages. When you write messages this long, or multiple replies to your own comments (as at #251), you should not assume that anyone is reading everything you write.

I agree that "integer 0 meaning no uncertainty, with float 0. meaning small uncertainty" is kind of weird. I think "precisely no uncertainty" would be better spelled None. That would need special handling, but also add clarity. I am OK with leaving it as it is ("0" or "0."). I am -1 on expecting to maintain the distinction between "0" and "0.".

I don't disagree with anything in sciform or the NIST/BIPM recommendations. But: those are about reporting in a publication. What this library prints out are intermediate results that can be turned into publishable results. These can certainly report more digits than are recommended, expecting the user to prepare for publication.

I would suggest that uncertainties focus more on calculations that propagate uncertainties (hard enough!), and less on formatting of values with uncertainties.

I am OK with making sciform an optional dependency and using that if available, or otherwise just print out with a "%.Ng" formatting (and not worry about leading or trailing zeros).
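A minimal sketch of the fallback half of that idea. The names `HAVE_SCIFORM` and `format_plain` are hypothetical, and the sciform branch is deliberately left out because no sciform calls are specified in this thread. The sketch only detects whether sciform is importable and shows the plain "%.Ng" rendering.

```python
import importlib.util

# True only if a package named "sciform" is importable in this environment.
HAVE_SCIFORM = importlib.util.find_spec("sciform") is not None


def format_plain(nominal: float, std_dev: float, ndigits: int = 6) -> str:
    """Hypothetical fallback: "%.Ng" for both numbers, no special zero handling."""
    spec = f"%.{ndigits}g"
    return f"{spec % nominal}+/-{spec % std_dev}"


print(format_plain(1234.56789, 0.1))  # 1234.57+/-0.1
```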

@jagerber48
Contributor Author

@newville thanks for the response.

Sorry my messages are long. I have lots of thoughts and ideas and they're pretty "specific" in my mind so I try to be clear about them. But I can appreciate how the long verbose posts are not conducive to easy collaboration. I'll work to spend more time whittling down my message to be more concise and digestible so we can move forward more easily.

Regarding your comments on the trailing decimal symbol:

It's helpful to know you're -1 on maintaining the distinction between "0" and "0.". I'm also -1 on that. Dropping the distinction will solve this issue and could generally make the formatting algorithm more clear in code and documentation.

Your point is taken that maybe uncertainties need not provide functionality for publication-ready formatting. That can be the job of something like sciform. Rather, uncertainties need only provide quick readability for the programmer. I'll take this into consideration.

> I would suggest that uncertainties focus more on calculations that propagate uncertainties (hard enough!), and less on formatting of values with uncertainties.

Agreed.

> I am OK with making sciform an optional dependency and using that if available, or otherwise just print out with a "%.Ng" formatting (and not worry about leading or trailing zeros).

This statement is a bit of a can of worms for me. There's a lot I'd like to discuss on it, but I'll pick that up at #192 as I have time. Maybe for now I'll express regret about bringing up the sciform topic in this issue and ask that the thread be re-focused on the specific question:

What should be done about "0" vs. "0." in the context of this specific test, but also generally for uncertainties formatting?

So far the vote is to stop worrying about the difference between 0 and 0., so in this case the resolution would be to rewrite the test so it passes and make sure the documentation is still sufficiently clear.

@newville
Member

@jagerber48 For formatting, I guess I would say

a) uncertainties should not have to worry about following the NIST/BIPM recommendations, but try to do a decent job at providing at least enough digits.
b) using formats like "%.df", "%.de", or "%.dg" is not perfect but ought to be okay for most things and kind of "normal Python". So, I think those are reasonable defaults for what uncertainties should do.
c) I think the choices of val(std), val+/-std, val±std, ${val}\pm{std}$, (val±std)×exp, and so forth is sort of a distraction in this code that would be better somewhere else (say, sciform).
d) I am +1 on adding sciform as an optional dependency and have uncertainties use that if available, and also recommending using it for real reporting even when not installed.

Yes, I would be +1 on dropping any distinction between "0" and "0." in the tests, docs, and code. I think that is too hard to maintain reliably, and does not really add much meaning anyway. If one wanted to report "the uncertainty is not only numerically vanishingly small, but precisely 0 because there is no uncertainty", then maybe std should be None. At least, in reporting ;).
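One way the None idea might look in a reporting helper, purely as a sketch. The function `report` is hypothetical and nothing like it is proposed as an uncertainties API here.

```python
from typing import Optional


def report(value: float, std: Optional[float]) -> str:
    """Hypothetical reporting helper: std=None means 'no uncertainty at all'."""
    if std is None:
        return f"{value:g}"
    # Any numeric std, including 0.0, is printed without a 0 vs. 0. distinction.
    return f"{value:g}+/-{std:g}"


print(report(10.0, None))  # 10
print(report(10.0, 0.0))   # 10+/-0
```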
