multipleOf and floating point rounding errors #312

Closed
cederlys opened this issue Apr 26, 2017 · 15 comments

Comments

@cederlys

Is -15.9 a multiple of 5.3? The current specification of JSON schema is a bit terse:

A numeric instance is only valid if division by this keyword's value results in an integer.

What does this mean? In some programming languages, dividing a floating point number by another floating point number always results in a floating point number. In this case, it would be -3.0, which isn't an integer, so the validation would always fail.
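
The difference between "is of integer type" and "has an integral value" can be shown with a minimal Python sketch. Python is used only as an illustration, and both helper names are hypothetical; neither is taken from the specification or from any validator.

```python
def quotient_is_integer_type(instance: float, multiple_of: float) -> bool:
    """Hypothetical strict reading: the quotient must be an int object."""
    return isinstance(instance / multiple_of, int)


def quotient_has_integer_value(instance: float, multiple_of: float) -> bool:
    """Hypothetical lenient reading: the quotient's value must be integral."""
    return (instance / multiple_of).is_integer()


# -15.0 and 5.0 are exactly representable, so no rounding is involved here.
print(quotient_is_integer_type(-15.0, 5.0))    # False: the quotient is the float -3.0
print(quotient_has_integer_value(-15.0, 5.0))  # True: -3.0 has an integral value
```

Under the strict reading every float division fails, so a validator that stores numbers as floats has to test the value of the quotient, not its type.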

python-jsonschema/jsonschema#185 is a bug report about this issue in a schema validator implementation. The conclusion is that "this is just floating points. Those numbers aren't exactly representable as floats, so you're going to get False, there's nothing jsonschema can do about it, the numbers you get are not in fact multiples of each other."

I think the specification needs to be clearer. Is this supposed to be useful for numbers like 5.3 and -15.9, which often cannot be represented exactly in floating point form? If so, the specification needs to be clear that implementations that use floating point need to deal with rounding errors. In the current state, we get interoperability issues.

@Relequestual
Member

What is the spec unclear about? How the library decides to do maths is up to it. I don't know if we have any specific tests for floating point values for multipleOf. @Julian, @epoberezkin ?

@cederlys
Author

Since the spec doesn't explicitly say that multipleOf is expected to work even for numbers that cannot be represented exactly as floating point numbers, some implementors just give up and say, in essence, "you can't compare floating point numbers, because of rounding errors". This makes multipleOf an interop nightmare.

I see a few ways of handling this issue:

  • declare that multipleOf only works on integers (this would probably cause even more breakage, and is not something I recommend)
  • explicitly say that validators must deal with rounding issues, and give a few examples
  • explicitly say that the result when using floating point numbers may be unexpected unless the numbers can be represented exactly in whatever the implementation of the validator uses to store floating point numbers (I don't recommend this either)

@Julian
Member

Julian commented Apr 26, 2017

@cederlys what do you mean by "deal with" rounding errors?

Something like your last option is the current state. But it's not JSON Schema specifically that made it so: JSON does not mandate that languages parse numbers into arbitrary precision, and many languages don't have easy access to such a thing.

It's true that that makes things less portable, but I'm not sure what motivation JSON Schema would have to be more strict there -- in cases where you control all the pieces, you have a choice on whether to use arbitrary precision, as I mentioned in that ticket, and when you don't, yeah you need to deal with the fact that your schema means different things depending on how someone deals with the resulting JSON.
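
As an illustration of the "you control all the pieces" case, here is a minimal Python sketch that parses JSON numbers into decimal values and never into binary floats. `json.loads` and its `parse_float` parameter are standard library features; the helper `is_multiple_of_decimal` and the sample documents are assumptions made up for this example.

```python
import json
from decimal import Decimal


def is_multiple_of_decimal(instance: Decimal, multiple_of: Decimal) -> bool:
    """Exact decimal check: the remainder of the division must be zero."""
    return instance % multiple_of == 0


# parse_float receives each non-integer number literal as a string,
# so the value is never rounded to a binary float.
document = json.loads('{"price": 0.49}', parse_float=Decimal)
schema = json.loads('{"multipleOf": 0.01}', parse_float=Decimal)

print(is_multiple_of_decimal(document["price"], schema["multipleOf"]))  # True
```

This only helps when both the schema and the instance are parsed this way; a consumer that parses the same text into binary floats may still reach a different verdict.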

@Relequestual
Member

Maths is a fundamental issue between some languages. If you have an issue with a specific implementation of JSON Schema, the issue is with the implementation, I feel.

@cederlys
Author

One way to deal with this issue is something like this (in pseudocode):

// Return true if dividend is a multiple of divisor.
// Actually, return true if it is almost a multiple of divisor,
// to account for rounding errors.
bool is_multiple_of(float dividend, float divisor)
{
    if (dividend == 0)
        return true; // avoid division by zero when computing scaled_diff
    float quotient = dividend / divisor;
    float rounded = round_to_nearest_integer(quotient);
    // Divide by abs(dividend) so the relative error is non-negative for negative dividends.
    float scaled_diff = abs(dividend - divisor * rounded) / abs(dividend);
    return scaled_diff < epsilon;
}

The value of epsilon depends on the floating point implementation. It should be chosen so that rounding errors don't cause false failures, but it should be as small as possible to avoid false positives.

I think it would be helpful if JSON Schema explicitly states if implementations are supposed to go to this trouble, or if using floating point numbers is expected to be non-portable.
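
For reference, the pseudocode can be written as runnable Python. This is a sketch of the proposal, not something any validator is known to implement. The name `is_multiple_of` and the default `rel_tol=1e-9` are assumptions chosen for illustration, and the denominator uses the absolute value of the dividend so that negative instances are handled.

```python
def is_multiple_of(dividend: float, divisor: float, rel_tol: float = 1e-9) -> bool:
    """Return True if dividend is within rel_tol of an integer multiple of divisor."""
    if dividend == 0:
        return True  # zero is a multiple of everything; also avoids dividing by zero below
    nearest = round(dividend / divisor)
    scaled_diff = abs(dividend - divisor * nearest) / abs(dividend)
    return scaled_diff < rel_tol


print(is_multiple_of(-15.9, 5.3))   # True
print(is_multiple_of(0.49, 0.01))   # True
print(is_multiple_of(0.495, 0.01))  # False
```

The tolerance is the whole design decision here: a larger value hides more rounding error but also accepts more values that are genuinely not multiples.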

@Julian
Member

Julian commented Apr 26, 2017

That kind of thing can never work -- see the response in the bug ticket you linked, although I was quite terse there unfortunately.

How are you going to distinguish what you call "rounding errors" from the actual literal float that is not the "rounded" one you're talking about?

Are you proposing that JSON Schema mandate some level of imprecision that is different from the float specification's own? If so, can you elaborate on why that'd be a thing that's in JSON Schema's purview to want to do?
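
The objection can be made concrete with a small Python sketch. The literal 0.490000000001 is not a multiple of 0.01, which an exact decimal check confirms, yet a tolerance-based check accepts it. Both helper names and the tolerance of 1e-9 are assumptions for illustration only.

```python
import math
from decimal import Decimal


def tolerant_multiple_of(instance: float, multiple_of: float, rel_tol: float = 1e-9) -> bool:
    """Hypothetical tolerance check: the quotient must be close to an integer."""
    quotient = instance / multiple_of
    return math.isclose(quotient, round(quotient), rel_tol=rel_tol)


def exact_multiple_of(instance: str, multiple_of: str) -> bool:
    """Exact check on the decimal literals as written in the JSON text."""
    return Decimal(instance) % Decimal(multiple_of) == 0


literal = "0.490000000001"  # differs from 0.49 in the twelfth decimal place
print(exact_multiple_of(literal, "0.01"))          # False
print(tolerant_multiple_of(float(literal), 0.01))  # True: a false positive
```

Once the literal has been parsed into a float, the check cannot tell whether the small deviation came from rounding or was written by the author of the document.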

@cederlys
Author

I'm not saying that JSON Schema should require implementations to do that. I'd be just as happy if the spec had a footnote that said something like this:

Implementations of JSON Schema validators may store numbers in any form they like. If they use a binary floating point format, it may not be possible to store an exact representation of numbers such as 0.01. This means that for instance 0.99 may not be a multiple of 0.01. In practice, multipleOf works well for small to mid-size integers, and fractions that can be exactly represented in binary form (such as 0.5 and 0.25), but may produce surprising results for other numbers.
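
Which literals fall into the "exactly represented" group can be checked mechanically. The following Python sketch assumes that Python floats are IEEE 754 binary64, which holds on common platforms; the function name `is_exact_in_binary64` is hypothetical.

```python
from fractions import Fraction


def is_exact_in_binary64(literal: str) -> bool:
    """Return True if the decimal literal survives conversion to a float unchanged."""
    # Fraction(str) is the exact value as written; Fraction(float) is the exact
    # value of the nearest binary64 number. They agree only when nothing was rounded.
    return Fraction(float(literal)) == Fraction(literal)


for text in ["0.5", "0.25", "0.01", "0.99"]:
    print(text, is_exact_in_binary64(text))
# 0.5 True
# 0.25 True
# 0.01 False
# 0.99 False
```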

Perhaps this should be mentioned in the JSON specification, but the issue isn't as important there, as the JSON format itself doesn't do any math. It says nothing about how a number should be stored by an application. In the JSON specification, a number is just a sequence of characters that adheres to a particular grammar. But in JSON Schema, validators have to actually do math with the numbers when multipleOf is used. Because of that, I think it is up to JSON Schema to either define how that math is performed or explicitly leave it undefined.

I may be wrong, but I have not found anything that requires an implementation to use binary floating point internally. If an implementation were to use floating point operations on decimal numbers it wouldn't have this issue. But that is probably not something that should be required.

@Julian
Member

Julian commented Apr 27, 2017

Ah, yeah, a note certainly makes sense to me.

Reminding people next to multipleOf that its use with non-integer numbers may not be portable and will often involve floating point error depending on the host language's parsing behavior sounds like a reasonable idea.

The upcoming (sidebar: @handrews is this upcoming or released, I can't tell, the website claims draft 5 is current) draft 6 doesn't appear to have much difference in explaining multipleOf from how I remember it, but it seems reasonable to me to add something like that note going forward if someone can come up with a decent terse wording.

@cederlys
Author

Maybe something like this? I've borrowed heavily from RFC 7159, chapter 6, but tried to adapt it for the current context:

This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by JSON Schemas that expect no more precision or range than these provide. A schema such as {"type": "number", "multipleOf": 0.01} may be problematic, since 0.01 cannot be represented exactly in many binary floating point implementations; in some implementations 0.49 may not be accepted as a multiple of 0.01.

Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on whether one integer is a multiple of the other.
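
The integer range in the note can be checked with a short Python sketch, assuming Python floats are binary64. The constant name `LIMIT` is introduced only for this example.

```python
LIMIT = 2**53  # binary64 has a 53-bit significand

# Every integer up to LIMIT - 1 in magnitude round-trips through a float.
print(int(float(LIMIT - 1)) == LIMIT - 1)  # True
# Just past the limit, distinct integers collapse to the same float.
print(float(LIMIT + 1) == float(LIMIT))    # True

# Consequence for multipleOf: a remainder check on floats sees a different number.
print((LIMIT + 1) % 2)         # 1: exact integer arithmetic, not a multiple of 2
print(float(LIMIT + 1) % 2.0)  # 0.0: the float has already been rounded to 2**53
```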

Unless it is already present, the IEEE754 reference must also be added as an informative reference:

[IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE
          Standard 754, August 2008,
          <http://grouper.ieee.org/groups/754/>.

(I have not checked if that standard has been updated after its inclusion in RFC 7159.)

@awwright
Member

awwright commented Apr 29, 2017

Since JSON already talks about how to parse its arbitrary-precision numbers as IEEE floats, and since JSON is normatively referenced (making it a part of the spec in a sense), I don't think any additional text is actually warranted.

If implementations want to use IEEE floats, they're very much allowed to, and IEEE already treats how to do number comparisons using an acceptable-margin-of-error technique. Do we need to describe that again?

Also note that the precision of an IEEE float is proportional to its magnitude, so even if multipleOf had to be a float, that would only work up to some (very large, but finite) number.
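
How the gap between neighbouring floats grows with magnitude can be seen in a minimal Python sketch. It assumes binary64 floats and only handles normal, finite, positive values; the helper name `spacing` is hypothetical.

```python
import math


def spacing(x: float) -> float:
    """Gap between x and the next larger binary64 value (normal, finite, positive x)."""
    _, exponent = math.frexp(x)  # x == mantissa * 2**exponent with 0.5 <= mantissa < 1
    return math.ldexp(1.0, exponent - 53)


print(spacing(1.0))   # 2.220446049250313e-16
print(spacing(1e16))  # 2.0
```

Near 1e16 every representable value is an even integer, so a check such as multipleOf 1 can no longer reject anything in that range.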

@cederlys
Author

I don't have access to the IEEE standard. But if IEEE treats how to compare numbers using an acceptable-margin-of-error technique, does that not imply that a JSON Schema implementation that uses IEEE should consider 0.49 to be a multiple of 0.01? Is that what you meant, @awwright? And yet, @Julian seems to be of the opposite view: when floating point is used, you should expect unexpected results, and 0.49 may not be a multiple of 0.01.

I think either view is valid. But they cannot both be valid at once. I think the JSON Schema specification needs to explicitly state what we (as schema writers and users) can expect of a validator.

I found a very good article about comparing floating point numbers: https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/

If the method suggested in that article is used to compare round(x/y) to x/y, it should produce sensible results.

But is that something that JSON Schema should require of validators?

@Julian
Member

Julian commented May 11, 2017 via email

@cederlys
Author

I have two issues with "expect the behavior defined by the float spec":

  • As far as I know, the IEEE spec does not define a multipleOf operation. That operation is only defined in the JSON Schema specification, and it is not done by operations from the IEEE spec.

  • JSON does not require that an implementation use binary floating point. It could use decimal floating point (as COBOL does), it could use rational numbers (as perl 6 does). See http://blogs.perl.org/users/ovid/2015/02/a-little-thing-to-love-about-perl-6-and-cobol.html
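
To illustrate the second point, here is a minimal Python sketch of an implementation that stores every JSON number as an exact rational. `json.loads` with `parse_int` and `parse_float`, and `fractions.Fraction`, are standard library features; the helpers `load_exact` and `is_multiple_of_rational` are hypothetical names for this example.

```python
import json
from fractions import Fraction


def is_multiple_of_rational(instance: Fraction, multiple_of: Fraction) -> bool:
    """Exact rational check: the quotient must have denominator 1."""
    return (instance / multiple_of).denominator == 1


def load_exact(text: str):
    """Parse JSON so that every number becomes an exact Fraction."""
    return json.loads(text, parse_int=Fraction, parse_float=Fraction)


print(is_multiple_of_rational(load_exact("-15.9"), load_exact("5.3")))   # True
print(is_multiple_of_rational(load_exact("0.49"), load_exact("0.01")))   # True
print(is_multiple_of_rational(load_exact("0.495"), load_exact("1e-2")))  # False
```

With this representation the answer follows from the JSON text alone, so two such implementations cannot disagree, while one of them and a binary floating point implementation still can.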

@handrews
Contributor

I think that any attempt to control the interpretation of numbers beyond what is specified in the JSON RFC (and standards that it references such as IEEE floats) should be done by defining values for format.

A format could be applied to numbers if the desire is simply to convey semantics (use decimal floating point vs use IEEE floating point). If the intention is to preserve some aspect of the numeric representation, because of the data model this is better done by defining a string format that indicates how the string should be interpreted as a number. This is because strings map fairly directly into the data model (particularly for things like basic numeric notation that do not require escaped characters), while numbers intentionally lose representation details during parsing.

See PR #455 (numeric representation and the data model), and issues json-schema-org/json-schema-vocabularies#45 (encoding decimals as strings), #152 (specifying precision), and #116 (format maximum/minimum, also discusses multipleOf for format) for related discussions.

Is there anything to be done for this issue that is not addressed by the other issues and PRs? If there are no comments indicating a course of action here after a couple of weeks I will close this in favor of the other issues.

I do not think that the JSON Schema core specification should mandate specific floating point behavior any more than JSON does.

@handrews
Contributor

It's been more than two years since I asked if there was anything not covered by the linked issues/prs, so I'm closing this.
