Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reverse method on Range of Chars gives unexpected results #12473

Open
luigip opened this issue Oct 13, 2021 · 11 comments
Open

reverse method on Range of Chars gives unexpected results #12473

luigip opened this issue Oct 13, 2021 · 11 comments

Comments

@luigip
Copy link

luigip commented Oct 13, 2021

reproduction steps

using Scala 3,

scala> (1 to 26).reverse
val res0: Range = Range 26 to 1 by -1

scala> ('a' to 'z').reverse
val res1: scala.collection.immutable.NumericRange[Char] = empty NumericRange z to a by �

problem

Seems like all other Ranges give you a collection back of the same size when you reverse them, but a Range of Chars returns an empty Range. I understand you can get around this by turning it into a Seq (with memory usage implications) or using reverseIterator, but the above behaviour was surprising and caused a bug in my code.

Similarly it would be nice if you could make a reversed range with ('z' to 'a' by -1) or something to that effect. (Interestingly, this won't compile with a negative number, saying it needs to be a Char, but does with a positive number... 🤷)

@SethTisue
Copy link
Member

this is the Scala 2 bug tracker, but the same behavior does occur in Scala 2

@scala/collections ?

@luigip
Copy link
Author

luigip commented Oct 13, 2021

Just to add: toSeq doesn't actually fix it; you have to use toList etc

scala> ('a' to 'z').toSeq.reverse
val res2: scala.collection.immutable.NumericRange[Char] = empty NumericRange z to a by �

scala> ('a' to 'z').toList.reverse
val res3: List[Char] = List(z, y, x, w, v, u, t, s, r, q, p, o, n, m, l, k, j, i, h, g, f, e, d, c, b, a)

@SethTisue
Copy link
Member

note that 2.12.15 doesn't have this problem

@NthPortal
Copy link

NthPortal commented Oct 14, 2021

I am torn, because on the one hand, Chars are not numbers (I don't care what they're backed by), and on the other hand, it's quite convenient to be able to have a range of them.

anyway, I believe the problem is that Chars are, to the extent that they're numbers at all, unsigned. thus,

scala> ('a' to 'z').step.toInt
val res4: Int = 1

scala> ('a' to 'z').reverse.step.toInt
val res5: Int = 65535

Interestingly, this won't compile with a negative number, saying it needs to be a Char, but does with a positive number... 🤷

see above about being unsigned

@som-snytt
Copy link

som-snytt commented Oct 14, 2021

@NthPortal char is a numeric value type https://scala-lang.org/files/archive/spec/2.13/12-the-scala-standard-library.html#numeric-value-types

just replying to whether it's a number. Signedness is a nice catch.

@NthPortal
Copy link

NthPortal commented Oct 15, 2021

this is actually a fundamental problem with NumericRange; it doesn't work for unsigned numeric types. one could theoretically write their own u32 type, and it would not be possible to create a reverse range for that type (which is very much numeric) either. I believe it would require a fundamental change such as having a boolean to track if the range is reversed.


we might be able to fix the problem for Char to Char by adding a custom (and private) subtype of NumericRange specifically for Char, and have .reverse do some magic. I don't know if that would be feasible while maintaining bincompat, but my instinct says it is.


char is a numeric value type

Char is numeric in that it represents a UTF-16 code point (which is a number), but numeric operations don't actually make any sense at all for Char. 'a' + 'z' is nonsense. 'a' * 'z' is nonsense. 'a' to 'z' by '<START OF HEADING>' is nonsense, but is what 'a' to 'z' actually means (but you'd never know if I swapped '<START OF HEADING>' for the wrong character). 'a' to 'z' happens to get you the result you want, but adding code points is a nonsensical operation. fundamentally, I think supporting 'a' to 'z' is wrong; it works for that alphabetical range, but anything outside the English alphabet (or even mixing uppercase and lowercase) is meaningless and arbitrary. For example, what does '0' to 'Q' mean?

if you want a range of codepoints, (Int to Int).reverse.view.map(_.toChar) will be clearer, and is also incidentally a workaround to the problem (e.g. ('a'.toInt to 'z'.toInt).reverse.view.map(_.toChar))

@NthPortal
Copy link

NthPortal commented Oct 15, 2021

note that 2.12.15 doesn't have this problem

to clarify, in 2.12 ('a' to 'z').reverse returns a Vector. I've not yet gone digging to find out why.

Edit: in 2.13, NumericRange#reverse returns a NumericRange, so we cannot have it do what 2.12 does by returning a Vector regardless.

@dwijnand dwijnand added this to the Backlog milestone Nov 18, 2021
@NthPortal
Copy link

I kind of want to override .reverse on NumericRange[Char] to throw an exception with the message "Sadness. You have chosen to treat a character as a number, and now look what has happened."

@luigip
Copy link
Author

luigip commented Dec 5, 2021

Just from a user's POV, I'm not aware - and I see no suggestion in the Scaladoc - that a Range of Chars being treated as a NumericRange behind the scenes... conceptually I want it to be treated as a hypothetical AlphabeticRange. So perhaps the exception should be a bit more apologetic - "Dang, we redesigned the collections and thought we could treat Chars like any other number, but sadly, it seems we can't. BOOM. Sorry. Please revert to Scala 2.12."

Range, as I see it, would ideally be a type of lightweight collection that will work with any type that offers an implementation - wouldn't it be nice if we could have a Range of Foos that works as you might expect it to, e.g. Foo(10) to Foo(20) by 2, or even Earth to Neptune assuming these are part of an ordered enum of planets. It would have a type parameter as all other collection types do, and the current concept would just apply to the special case of Range[Int]. But then, this is somewhat at odds with how Range is described, as "The Range class represents integer values in range [start;end) with non-zero step value step." - so perhaps we'd need to invent a new class with a different name.

@som-snytt
Copy link

Range of Double also did not succeed. I don't think this conceptual issue due to collections, but just about the step.

I agree that range of chars is natural, in the sense that toLower is 'a' + c - 'A' . But as mentioned, code points are ints.

I remember Odersky saying NumericRange is an over-abstraction, but as noted, anything countable could have a range. Maybe Scala 3 enables syntax for GroundControl to MajorTom etc. Someone recently asked somewhere for .. syntax. Oh, here is the enum range: lampepfl/dotty-feature-requests#145

@SethTisue
Copy link
Member

SethTisue commented Dec 6, 2021

(I'm repeating after @som-snytt here, but I think it bears repeating:)

Char being a numeric type (a slightly odd one, but nonetheless solidly a numeric type) is a C and Java tradition carried on in Scala. It is arguably an unfortunate tradition, and it's definitely surprising when first encountered, but for a Range of Chars to be a NumericRange is the product of decades of tradition.

references:

this leads to working-as-specified behaviors such as:

scala> 'a' + 1
val res1: Int = 98

scala> ('a' + 1).toChar
val res2: Char = b

scala> 'a' + 'b'
val res3: Int = 195

scala> 'a' * 2
val res4: Int = 194

scala> 'a' * 'b'
val res5: Int = 9506

and any argument for proposed changes to the behavior Ranges of Charss needs to be considered in this context

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants