Casting numbers #55

fcollonval · 2022-09-26T13:18:16Z

Could we add code comments for the reason to cast number float <-> int? Like at

Line 89 in 043163b

cast_all(cell, float, int)

Is it possible to reduce the scope of conversion to reduce the load and avoid troubles? At first glance, that cast sounds like a source of hard-to-find errors in the future especially when applying it to cell serialization.

The text was updated successfully, but these errors were encountered:

welcome · 2022-09-26T13:18:18Z

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.

You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

davidbrochart · 2022-09-26T13:44:48Z

IIRC these are workarounds for the limitation of Yjs to keep track of integers. I remember a discussion with @dmonad and @Waidhoferj, where we concluded that once an integer lands on the JS side, it comes back to Python as a float and there is no way to remember it was an integer.
I think we also concluded that we must only pass floating numbers to Yjs, but I don't remember the details.
Maybe this is not necessary anymore @dmonad @Waidhoferj?

Waidhoferj · 2022-09-26T14:15:10Z

Yjs operates with JavaScript types, which means that it doesn't make a distinction between float and int, casting each to a JS number. Since we want to maintain message compatibility between Yjs and Ypy, we decided to treat all integers as floats, since floats are similar to the representation of a JS number.

We talked about adding a wrapper type or marker to the protocol in order to distinguish between the types, but it is not actively being worked on.

davidbrochart · 2022-09-26T14:20:22Z

Thanks John.
I'll open a PR to add comments, and maybe do the casting more specifically.

dmonad · 2022-09-26T16:31:55Z

Just a bit more background:

It's not very unusual for message formats to generalize the concept of number. JSON encoding, for example, also doesn't distinguish floats from integers. Everything is simply a number. Similarly to JSON, we can't (and shouldn't) attach "type" information to numbers as many languages don't make a distinction between numbers. As David said, once a number passes through Yjs (or any other language that only has one "number" type) we can't make a distinction anymore.

Technically, we can't even represent all int numbers. We can only represent integers that can be encoded as a double (int53 + a few more). We could actually discuss whether we only want to return floats (double) in Ypy. If I remember correctly, there is some reason why we don't do that already.

davidbrochart · 2022-09-28T07:42:17Z

It is true that in the JavaScript world, not making a difference between an integer and a float is usually not a problem, because the language is quite permissive, e.g. for indexing:

a = [1, 2]
a[1.0]
// 2

But it is not the case with Python:

a = [1, 2]
a[1.0]
# TypeError: list indices must be integers or slices, not float

The notebook format says that execution_count must be an integer (or null). There is also nbformat and nbformat_minor, which must be integers. We could only cast those, but the issue is that metadata (in cells, outputs or notebook) is customizable, so we can't know the intended type of each field.
I think Ypy should make a difference between floats and integers. In the end, it is a binding for Python and it should take into account the specificities of the language.
What do you think @dmonad @Waidhoferj ?

dmonad · 2022-09-28T09:52:32Z

I'm not denying that this isn't a problem. However, there is no solution for it that makes everyone happy.

If we only had Ypy to think about. I'd definitely agree with you to distinguish between float and integer (as this is what python does).

However, JupyterLab still uses Yjs on the frontend. Once a client increments execution_count, we don't know anymore whether that number is a float or an integer.

// this might be a value set by python (integer)
const oldValue = ycell.get('execution_count')
// Once we increment the value in Javascript, we will set it as a double.
ycell.set('execution_count', oldValue + 1)

Since this value passes through JavaScript, we can't determine anymore whether we want to interpret this as an integer or a float. All numbers in JS are doubles. The same is true for a lot of other programming languages.

Now, we could argue whether we want to return the "closest interpretation". Obviously, the number is still a natural number, even though it's stored as a double. So we could simply return an integer in Ypy. But would the result of ymap.set('num', .1 + .9) also still be an integer? Dynamic casting seems like a problem to me.

Ultimately, this leads to confusion, and you still have to do type casting to get the expected results.

We can probably solve this problem for Jupyterlab and the execution_count indexing. In the larger scope of things, however, we want to have cross-compatible Y* implementations that enable applications written in different programming languages to communicate. Other languages support many more types of number: float, double, int, int53, int32, int64, complex, ... I'm not sure how we would do the number transformations between all languages.

My point is that there is a difference between the concept of "number" and how we store that number (float, double, int32, ..). The protocol (Y*) specifies that numbers must be doubles as it is the least common denominator that all languages support. It is up to the application-developer to interpret, cast, and use it.

Hence, I think the only course of action is to make double the only default number type. I don't see any other solution that doesn't exclude some programming languages. Unfortunately, you have to do number conversions, but that needs to happen anyway at some point. At least the result is deterministic, and you will always know what to expect when you receive a number.

davidbrochart · 2022-09-28T13:28:37Z

Thanks Kevin for the arguments, that makes sense.
I think that an acceptable solution is:

to cast all floating-point numbers coming from Yjs to integers if they "are integers", i.e. if they don't have any decimal, e.g. 1.0 will be cast to 1 in Python, but obviously 1.2 won't be cast. It is usually fine because in Python integers are "elevated" to floats when it makes sense, e.g. 1 / 2 == 0.5.
to cast all integers from Ypy to floating-point numbers. This is theoretically safe because integers are a subset of floating-point numbers, but we must keep in mind that size matters, e.g. a 64-bit float cannot represent the whole range of a 64-bit integer.

I remember that for the latter, if we don't cast to float, Ypy tries to pass the value to Yjs as a BigInt and it results in some issues, but I don't recall exactly.

davidbrochart · 2022-09-28T13:42:55Z

Closing as fixed by #57.

davidbrochart mentioned this issue Sep 28, 2022

Cast only if number has same value #57

Merged

davidbrochart closed this as completed Sep 28, 2022

davidbrochart mentioned this issue Oct 4, 2022

Test a case with plotly renderer #63

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Casting numbers #55

Casting numbers #55

fcollonval commented Sep 26, 2022

welcome bot commented Sep 26, 2022

davidbrochart commented Sep 26, 2022

Waidhoferj commented Sep 26, 2022

davidbrochart commented Sep 26, 2022

dmonad commented Sep 26, 2022 •

edited

Loading

davidbrochart commented Sep 28, 2022

dmonad commented Sep 28, 2022 •

edited

Loading

davidbrochart commented Sep 28, 2022

davidbrochart commented Sep 28, 2022

Casting numbers #55

Casting numbers #55

Comments

fcollonval commented Sep 26, 2022

welcome bot commented Sep 26, 2022

davidbrochart commented Sep 26, 2022

Waidhoferj commented Sep 26, 2022

davidbrochart commented Sep 26, 2022

dmonad commented Sep 26, 2022 • edited Loading

davidbrochart commented Sep 28, 2022

dmonad commented Sep 28, 2022 • edited Loading

davidbrochart commented Sep 28, 2022

davidbrochart commented Sep 28, 2022

dmonad commented Sep 26, 2022 •

edited

Loading

dmonad commented Sep 28, 2022 •

edited

Loading