Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Casting numbers #55

Closed
fcollonval opened this issue Sep 26, 2022 · 9 comments
Closed

Casting numbers #55

fcollonval opened this issue Sep 26, 2022 · 9 comments

Comments

@fcollonval
Copy link
Member

Could we add code comments for the reason to cast number float <-> int? Like at

cast_all(cell, float, int)

Is it possible to reduce the scope of conversion to reduce the load and avoid troubles? At first glance, that cast sounds like a source of hard-to-find errors in the future especially when applying it to cell serialization.

@welcome
Copy link

welcome bot commented Sep 26, 2022

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@davidbrochart
Copy link
Collaborator

IIRC these are workarounds for the limitation of Yjs to keep track of integers. I remember a discussion with @dmonad and @Waidhoferj, where we concluded that once an integer lands on the JS side, it comes back to Python as a float and there is no way to remember it was an integer.
I think we also concluded that we must only pass floating numbers to Yjs, but I don't remember the details.
Maybe this is not necessary anymore @dmonad @Waidhoferj?

@Waidhoferj
Copy link

Yjs operates with JavaScript types, which means that it doesn't make a distinction between float and int, casting each to a JS number. Since we want to maintain message compatibility between Yjs and Ypy, we decided to treat all integers as floats, since floats are similar to the representation of a JS number.

We talked about adding a wrapper type or marker to the protocol in order to distinguish between the types, but it is not actively being worked on.

@davidbrochart
Copy link
Collaborator

Thanks John.
I'll open a PR to add comments, and maybe do the casting more specifically.

@dmonad
Copy link

dmonad commented Sep 26, 2022

Just a bit more background:

It's not very unusual for message formats to generalize the concept of number. JSON encoding, for example, also doesn't distinguish floats from integers. Everything is simply a number. Similarly to JSON, we can't (and shouldn't) attach "type" information to numbers as many languages don't make a distinction between numbers. As David said, once a number passes through Yjs (or any other language that only has one "number" type) we can't make a distinction anymore.

Technically, we can't even represent all int numbers. We can only represent integers that can be encoded as a double (int53 + a few more). We could actually discuss whether we only want to return floats (double) in Ypy. If I remember correctly, there is some reason why we don't do that already.

@davidbrochart
Copy link
Collaborator

It is true that in the JavaScript world, not making a difference between an integer and a float is usually not a problem, because the language is quite permissive, e.g. for indexing:

a = [1, 2]
a[1.0]
// 2

But it is not the case with Python:

a = [1, 2]
a[1.0]
# TypeError: list indices must be integers or slices, not float

The notebook format says that execution_count must be an integer (or null). There is also nbformat and nbformat_minor, which must be integers. We could only cast those, but the issue is that metadata (in cells, outputs or notebook) is customizable, so we can't know the intended type of each field.
I think Ypy should make a difference between floats and integers. In the end, it is a binding for Python and it should take into account the specificities of the language.
What do you think @dmonad @Waidhoferj ?

@dmonad
Copy link

dmonad commented Sep 28, 2022

I'm not denying that this isn't a problem. However, there is no solution for it that makes everyone happy.

If we only had Ypy to think about. I'd definitely agree with you to distinguish between float and integer (as this is what python does).

However, JupyterLab still uses Yjs on the frontend. Once a client increments execution_count, we don't know anymore whether that number is a float or an integer.

// this might be a value set by python (integer)
const oldValue = ycell.get('execution_count')
// Once we increment the value in Javascript, we will set it as a double.
ycell.set('execution_count', oldValue + 1)

Since this value passes through JavaScript, we can't determine anymore whether we want to interpret this as an integer or a float. All numbers in JS are doubles. The same is true for a lot of other programming languages.

Now, we could argue whether we want to return the "closest interpretation". Obviously, the number is still a natural number, even though it's stored as a double. So we could simply return an integer in Ypy. But would the result of ymap.set('num', .1 + .9) also still be an integer? Dynamic casting seems like a problem to me.

Ultimately, this leads to confusion, and you still have to do type casting to get the expected results.

We can probably solve this problem for Jupyterlab and the execution_count indexing. In the larger scope of things, however, we want to have cross-compatible Y* implementations that enable applications written in different programming languages to communicate. Other languages support many more types of number: float, double, int, int53, int32, int64, complex, ... I'm not sure how we would do the number transformations between all languages.

My point is that there is a difference between the concept of "number" and how we store that number (float, double, int32, ..). The protocol (Y*) specifies that numbers must be doubles as it is the least common denominator that all languages support. It is up to the application-developer to interpret, cast, and use it.

Hence, I think the only course of action is to make double the only default number type. I don't see any other solution that doesn't exclude some programming languages. Unfortunately, you have to do number conversions, but that needs to happen anyway at some point. At least the result is deterministic, and you will always know what to expect when you receive a number.

@davidbrochart
Copy link
Collaborator

Thanks Kevin for the arguments, that makes sense.
I think that an acceptable solution is:

  • to cast all floating-point numbers coming from Yjs to integers if they "are integers", i.e. if they don't have any decimal, e.g. 1.0 will be cast to 1 in Python, but obviously 1.2 won't be cast. It is usually fine because in Python integers are "elevated" to floats when it makes sense, e.g. 1 / 2 == 0.5.
  • to cast all integers from Ypy to floating-point numbers. This is theoretically safe because integers are a subset of floating-point numbers, but we must keep in mind that size matters, e.g. a 64-bit float cannot represent the whole range of a 64-bit integer.

I remember that for the latter, if we don't cast to float, Ypy tries to pass the value to Yjs as a BigInt and it results in some issues, but I don't recall exactly.

@davidbrochart
Copy link
Collaborator

Closing as fixed by #57.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants