vendor+sql: update apd, change str representation of DECIMAL #17029
@jordanlewis plz confirm this is unlikely to cause orm breakage |
code LGTM |
Hmm, this patch makes me nervous, even though I agree with all of your rationale for making this the new default. This is exactly the kind of change that strikes me as very likely to break clients. Even though the new output format is valid as an input format for CockroachDB and Postgres, I think that there are almost certainly clients that expect the output format to be the same as Postgres's, and this will upset them. What about the pgwire binary format? Did that change, or is that the same as before? At the bare minimum we need to add tests for this to the various client tests (java_test, c_test etc). |
I'm particularly worried about cases where people use decimals with things that are always just (potentially large) integers. Whatever is consuming those results might not like |
I've added a test for Java. I am sympathetic to these possible problems, but I think we should move forward, just as we have with our other differences from Postgres. If it causes too many client issues, we can change it back or add a session setting. |
haha, you got me there. You're right. I checked with both MySQL and Postgres and both allow the scientific notation for decimal/numeric. Let's put this through, and see if users complain. LGTM |
Wait, I am still uncomfortable. I think it is a mistake to land this in 1.1 without more significant testing on our side. Thanks for adding the Java test. We need to check C (libpq), JS, Ruby, and Python too. Sorry for the high burden of proof, but I strongly believe the onus falls on us to validate this, not our users. |
How would you suggest I do the C version? Is there a standard bigdecimal library to use? There's no example code in c_test.go to use as a template like there was for java. |
apd has had various improvements and API changes:
- ToStandard is now Text('f')
- ToIntegral renamed
- Modf can now take nil, easing the implementation of trunc

Additionally, change our default decimal -> string implementation to use the scientific notation (use exponents when needed) instead of the so-called standard (a name I made up) representation. The 'standard' representation was used to match Postgres. It simply never prints exponents. This causes two problems: large exponents are converted into lots of zeros, and it is no longer possible to extract the actual precision from the number if there are zeros appended to the right, because it's not clear if those were in the original value or were appended during the string conversion. The change to scientific notation fixes both of these problems, allowing values with large exponents to be printed compactly and allowing users to always know exactly how many zeros were in their values.

There is a possibility that this change causes problems for users of decimals if their string -> decimal parser doesn't support exponents. I think we should allow users to report that problem and attempt to deal with it if it comes up. Mostly I think it'll be just fine, because pretty much everything that deals with decimals should know about exponents.

This change allows for the exception in copy_in_test.go to be removed, since trailing zeros are handled correctly now.

This also makes it easy to add session settings in the future to control decimal formatting, since all decimal formatting operations now go through the Go fmt.Formatter interface, and thus accept the standard e, f, and g verbs.
I've added a nice Python test. I also added a less nice Ruby test. The Ruby test is totally boring because, by default, the pg package just returns everything as a string. You have to manually instruct it to use a type decoder. If we do that, then it barfs on the OID of numeric (so, nothing to do with this change). It's trivial to get it working by assigning the decoder to be something that can parse those strings, so I don't think it's a useful test. A useful test would be to hook up a full Rails thing, which might have these decoders defined, and see what that does. (If someone wants that, I'll need help doing it since it's out of my familiarity.) Adding a C test is similar to the Ruby one (i.e., not useful): you just have to choose a decimal parsing library that knows how to parse exponents; there's no built-in thing (see https://stackoverflow.com/questions/32651069/c-libpq-get-float-value-from-numeric). |
For C, the most common is GMP/MPFR. Can you check (e.g. via their docs) whether they support the scientific syntax? |
Thanks for adding the tests - I'm starting to feel better about this change. I have a few more questions. I think that it might be user unfriendly to always print decimals in scientific notation even in the "normal" case of a number with no trailing zeros. Thoughts on that? Maybe we could output an ordinary value in the "normal" case? As a user, I think I would be confused and maybe even annoyed to see Also, can you elaborate on the problem with ambiguity around trailing zeroes? I don't understand where string conversion would introduce that ambiguity. |
The exponents are not always printed, only when needed. See https://github.com/cockroachdb/apd/blob/master/testdata/base.decTest for the test cases. Also, to illustrate the problem, take the numbers you had above: it's like saying "I'm 1 mile away" versus "I'm 1.00000 miles away". If users provide that precision, it should be kept and correctly displayed. |
GMP supports exponents (https://gmplib.org/manual/Assigning-Floats.html#Assigning-Floats) in their mpf_set_str function. |
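For comparison, the same kind of check can be sketched for a Go client using only the standard library. This is an illustration, not one of the tests added in this PR: `parseDecimal` is a hypothetical helper, and math/big's Rat is used here simply because its SetString accepts an optional exponent.

```go
package main

import (
	"fmt"
	"math/big"
)

// parseDecimal is a hypothetical client-side helper that parses the text
// form of a DECIMAL into an exact rational. big.Rat.SetString accepts
// plain digits as well as an optional exponent such as "E+3".
func parseDecimal(s string) (*big.Rat, error) {
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		return nil, fmt.Errorf("cannot parse %q as a decimal", s)
	}
	return r, nil
}

func main() {
	for _, s := range []string{"1000", "1E+3", "1.000E+3"} {
		r, err := parseDecimal(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		// All three inputs denote the same numeric value.
		fmt.Println(s, "=>", r.FloatString(0))
	}
}
```

Note that a rational only keeps the numeric value: the distinction between 1E+3 and 1000 is lost after parsing, so a client that cares about the stated precision needs a decimal type that stores the exponent.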
thanks. |
Got it - thanks for the tutorial. LGTM in that case. I wonder if it's worth introducing a session setting for this kind of thing at some point down the line, to permit customizable output. |
Yes, I so want decimal session settings. To control: formatting, rounding, context precision during arithmetic operations. Seems like that + distsql would make cockroach a really nice platform to do distributed scientific computing or something. |
@jess-edwards No docs need to be updated, since they don't list the text output format. |