Inline rendering of 3d array confusing; proposed syntax for displaying/initializing 3d array #30467

Closed
BioTurboNick opened this issue Dec 20, 2018 · 17 comments
Labels: arrays, display and printing

Comments

@BioTurboNick
Contributor

julia> a = [[130.5, 154.25, 173.0], [141.333 159.667 164.333], cat(152.583, 152.583, dims = 3)]

3-element Array{Array{Float64,N} where N,1}:
 [130.5, 154.25, 173.0]
 [141.333 159.667 164.333]
 [152.583]

[152.583]

The third entry of the array, a 3-d array with 2 elements along the 3rd dimension, is displayed with an extra line break and no indent.

I propose that arrays with more than 2 dimensions be displayed as, and specified by, multiple semicolons.

E.g. the above output would instead look like:

 [130.5, 154.25, 173.0]
 [141.333 159.667 164.333]
 [152.583;; 152.583]

And a 3-d array could be compactly specified by, e.g., [1 2 3; 4 5 6; 7 8 9;; 0 9 8; 7 6 5; 4 3 2].

Each additional dimension would use another semicolon. Currently, extra semicolons are just ignored.
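Under this proposal, the literal above would mean the two 3×3 matrices stacked along the third dimension. A minimal sketch of that intended meaning using the existing cat function (the variable names are illustrative, and this is the proposal's semantics, not anything the parser currently produces):

```julia
# The two blocks separated by ";;" in the proposed literal
block1 = [1 2 3; 4 5 6; 7 8 9]
block2 = [0 9 8; 7 6 5; 4 3 2]

# Proposed: [1 2 3; 4 5 6; 7 8 9;; 0 9 8; 7 6 5; 4 3 2]
# Equivalent with existing functions: concatenate along dimension 3
A = cat(block1, block2; dims = 3)

size(A)      # (3, 3, 2)
A[:, :, 2]   # the second block
```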

@JeffBezanson
Member

Not a bad idea. See #7128 for some past discussion on this, where I think multi-semicolon was also proposed at one point.

@JeffBezanson added the arrays and display and printing labels Jan 2, 2019
@BioTurboNick
Contributor Author

Where in the codebase would something like this go, if I wanted to play around with implementing it?

@JeffBezanson
Member

julia-parser.scm (the parser) would be the place to start. Unfortunately, this is a breaking change, since multiple semicolons are currently accepted. However, I imagine they occur in code very rarely.

@BioTurboNick
Contributor Author

BioTurboNick commented Oct 17, 2019

I tried to take a stab at it, but there are a couple of critical parts I don't quite understand yet.

As far as I can tell, parse-matrix in julia-parser.scm builds up a list of values from the text input row by row, and then adds an instruction to the output list pointing to a function in julia-syntax.scm that accepts the parsed values.

It would help to know what's going on in the line (expand-forms `(call (top hcat) ,@(cdr e))) in julia-syntax.scm for 'hcat. Is this generating a call to Julia's hcat function?

If so, I suppose I would need to know how to write a call to the cat function with its dims keyword argument?
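For reference, at the Julia level a keyword argument in a call is an Expr with a :kw head, wrapped in a :parameters node when it follows a semicolon. I believe the lowering code in julia-syntax.scm sees the same nested shape as an s-expression, something like (call (top cat) (parameters (kw dims 3)) a b), but that is an assumption. A small sketch that builds such a call by hand and checks it against the parser:

```julia
# cat(a, b; dims = 3) as an explicit expression tree:
# the keyword sits in a :parameters node right after the function name.
ex = Expr(:call, :cat, Expr(:parameters, Expr(:kw, :dims, 3)), :a, :b)
@assert ex == Meta.parse("cat(a, b; dims = 3)")

# With a comma instead of a semicolon, the keyword is a trailing :kw argument.
ex2 = Expr(:call, :cat, :a, :b, Expr(:kw, :dims, 3))
@assert ex2 == Meta.parse("cat(a, b, dims = 3)")

# Splice literal arrays into the call and evaluate it.
lit = Expr(:call, :cat, Expr(:parameters, Expr(:kw, :dims, 3)), [1 2; 3 4], [5 6; 7 8])
result = eval(lit)   # a 2×2×2 array
```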

Since "rewinding" the input stream doesn't appear to be done anywhere, I suppose I would just build an array containing the information I need in an intermediate form like Any[:(1 2), :(4 5), 2, :(5 3), :(4 5)] (where the lone value 2 is the semicolon count), and a julia-syntax.scm function that I'd also write would recursively convert that into e.g. cat(hvcat(), hvcat(), dims = 3).

Aside from the pieces I'm missing, does this sound like the right approach?

@BioTurboNick
Contributor Author

Oh, obviously the latter wouldn't work, because [3;4;5] would produce Any[3,1,4,1,5], where the semicolon counts can't be told apart from the values. I suppose I'd need a new kind of object to stick in, like ;(2). I see that :(1 2) is constructed by fix 'row v, but I'm not sure where that conversion happens...

Maybe I'll tinker more later, probably shouldn't sink much time into this now :-)
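The recursive conversion described above can be sketched at runtime in Julia (the real change would live in julia-syntax.scm at lowering time). Sep and build are hypothetical names; a dedicated marker type avoids the ambiguity where a bare integer count looks like an element. The sketch assumes rows are already assembled into matrices (e.g. by hvcat) and follows the proposal's convention that n ≥ 2 semicolons concatenate along dimension n + 1:

```julia
# Hypothetical marker for a run of n semicolons (n >= 2) between blocks.
struct Sep
    n::Int
end

# Hypothetical: turn a flat list of blocks and separators into an N-d array.
function build(items::Vector{Any})
    levels = [x.n for x in items if x isa Sep]
    if isempty(levels)
        length(items) == 1 || error("expected exactly one block between separators")
        return items[1]
    end
    top = maximum(levels)        # split at the highest-level separators first
    groups = Vector{Any}[]
    current = Any[]
    for x in items
        if x isa Sep && x.n == top
            push!(groups, current)
            current = Any[]
        else
            push!(current, x)
        end
    end
    push!(groups, current)
    # Build each group recursively, then stack the groups along dimension top + 1.
    return cat(map(build, groups)...; dims = top + 1)
end
```

For example, build(Any[[1 2; 3 4], Sep(2), [5 6; 7 8]]) gives a 2×2×2 array, and adding Sep(3) between pairs of such groups gives a 4-d array.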

@BioTurboNick
Contributor Author

Ah!

`(call (top _cat) ,1 ,4 ,4) ===> _cat(1, 4, 4) ===> [4; 4]

where the first argument is the dims argument. Idk if it's okay to call a "private" function like that, but I figure I'll just make it work first.
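For comparison, the public cat function with its dims keyword produces the same result without touching the internal _cat; the dims value picks the dimension to concatenate along, and scalar inputs are padded with singleton dimensions:

```julia
# Concatenating two scalars along dimension 1 gives a vector: [4, 4]
v = cat(4, 4; dims = 1)

# Along dimension 3 it gives a 1×1×2 array, like cat(152.583, 152.583, dims = 3) in the first comment.
c = cat(4, 4; dims = 3)
size(c)   # (1, 1, 2)
```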

@mcabbott
Contributor

mcabbott commented Aug 23, 2020

Has printing 3D arrays using cat been considered somewhere?

Instead of [1 2 3; 4 5 6; 7 8 9;; 0 9 8; 7 6 5; 4 3 2], just print cat([1 2 3; 4 5 6; 7 8 9], [0 9 8; 7 6 5; 4 3 2], dims=3). This has the advantage that it's harder to miss, and easier to look up in the help than syntax is. I guess the syntax could avoid some allocations, but I imagine this is more for constructing small arrays for tests than for anything that has to be fast.
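A rough sketch of what that printing could produce for a 3-d array, using only repr on the 2-d slices. The helper name cat_string is hypothetical, and this is not how Base's show is structured; it just shows that the output is readable and pastes back as code:

```julia
# Hypothetical helper: render a 3-d array as a cat(...) call over its slices.
function cat_string(A::AbstractArray{<:Any,3})
    slices = [repr(A[:, :, k]) for k in axes(A, 3)]
    return "cat(" * join(slices, ", ") * ", dims=3)"
end

A = cat([1 2 3; 4 5 6; 7 8 9], [0 9 8; 7 6 5; 4 3 2]; dims = 3)
s = cat_string(A)   # "cat([1 2 3; 4 5 6; 7 8 9], [0 9 8; 7 6 5; 4 3 2], dims=3)"
```

Evaluating the string, e.g. eval(Meta.parse(s)), gives back an array equal to A.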

Perhaps cat should also default to the (N+1)th dimension if no dims keyword is given, which could make this slightly less verbose.
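A one-line sketch of that default, under a made-up name (catnext) so it does not clash with cat; it stacks along the dimension just after the highest one any input already has:

```julia
# Hypothetical: concatenate along dimension (max ndims of the inputs) + 1.
catnext(xs...) = cat(xs...; dims = maximum(ndims, xs) + 1)

catnext(1, 2)                     # scalars are 0-d, so dims = 1: [1, 2]
catnext([1, 2], [3, 4])           # vectors, so dims = 2: [1 3; 2 4]
catnext([1 2; 3 4], [5 6; 7 8])   # matrices, so dims = 3: a 2×2×2 array
```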

@BioTurboNick
Contributor Author

BioTurboNick commented Aug 23, 2020

@mcabbott - I can't speak to whether it has been considered, but IMO that would be a reasonable advance over the status quo. That said, I think nested calls for higher dimensions become cumbersome quickly if you want to type them at the REPL.

Something like the following, from one of my tests for SimpleANOVA.jl, is what I'd like to be easier to read and enter:

observations2 = cat(cat([[1.9, 1.8, 1.6, 1.4] [1.8, 1.7, 1.4, 1.5]],
                        [[2.3, 2.1, 2.0, 2.6] [2.4, 2.7, 2.4, 2.6]],
                        [[2.9, 2.8, 3.4, 3.2] [3.0, 3.1, 3.0, 2.7]], dims = 3),
                    cat([[2.1, 2.0, 1.8, 2.2] [2.3, 2.0, 1.9, 1.7]],
                        [[2.4, 2.6, 2.7, 2.3] [2.0, 2.3, 2.1, 2.4]],
                        [[3.6, 3.1, 3.4, 3.2] [3.1, 3.0, 2.8, 3.2]], dims = 3),
                    cat([[1.1, 1.2, 1.0, 1.4] [1.4, 1.0, 1.3, 1.2]],
                        [[2.0, 2.1, 1.9, 2.2] [2.4, 2.6, 2.3, 2.2]],
                        [[2.9, 2.8, 3.0, 3.1] [3.2, 2.9, 2.8, 2.9]], dims = 3), dims = 4)

An advantage of a syntax approach, I feel, is that there's less stuff to remove if you want to make it a nicely formatted array in text. Just hit enter in the right spots.

Your suggestion of automatically inferring dims would make it easier to type by omitting the argument, but perhaps also harder to read?

For the moment, what I've coded for the ;; syntax just maps directly to cat(), so there's no performance improvement. I see now that it's possible to do it with a single allocation. But you're probably right that the main usage would be testing and development, so that isn't the most important thing.
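A minimal sketch of the single-allocation idea for the 3-d case, assuming all blocks are matrices of the same size and element type: allocate the output once and fill each slice in place, rather than going through nested cat calls. The name stack3 is hypothetical, not part of the PR:

```julia
# Hypothetical: build an m×n×k array from k equally sized matrices
# with a single output allocation.
function stack3(blocks::AbstractMatrix{T}...) where {T}
    m, n = size(first(blocks))
    out = Array{T,3}(undef, m, n, length(blocks))
    for (k, B) in enumerate(blocks)
        size(B) == (m, n) || throw(DimensionMismatch("every block must be $(m)×$(n)"))
        out[:, :, k] = B   # copy the block into slice k
    end
    return out
end
```

It returns the same array as cat([1 2; 3 4], [5 6; 7 8]; dims = 3), but without intermediate results for higher dimensions.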

@mcabbott
Contributor

mcabbott commented Aug 23, 2020

Re nice formatting, one use of this is to be able to copy what rand(Int8, 3,4,2) prints at the REPL back into code easily, as you can for matrices. I take it that this would work under your proposal?

A = Int8[ -128   30  -76   58
   98  -71   20  -36
  122  119   56  -38
;;
  78  -49  -63  -111
  34  -44  118   -73
 -10  -41   81    26 ]

With cat you have to type a little more:

A = cat(Int8[ 
 -128   30  -76   58
   98  -71   20  -36
  122  119   56  -38
], Int8[
  78  -49  -63  -111
  34  -44  118   -73
 -10  -41   81    26 ], dims=3)

Conversely, when something prints this out (your example), it's not so easy to spot that it's a 4-d array:

"[1.9 1.8; 1.8 1.7; 1.6 1.4; 1.4 1.5;; 2.3 2.4; 2.1 2.7; 2.0 2.4; 2.6 2.6;; 2.9 3.0; 2.8 3.1; 3.4 3.0; 3.2 2.7;;; 2.1 2.3; 2.0 2.0; 1.8 1.9; 2.2 1.7;; 2.4 2.0; 2.6 2.3; 2.7 2.1; 2.3 2.4;; 3.6 3.1; 3.1 3.0; 3.4 2.8; 3.2 3.2;;; 1.1 1.4; 1.2 1.0; 1.0 1.3; 1.4 1.2;; 2.0 2.4; 2.1 2.6; 1.9 2.3; 2.2 2.2;; 2.9 3.2; 2.8 2.9; 3.0 2.8; 3.1 2.9]"

whereas a long string starting with "cat(cat([1.9 1.8; 1.8" and ending with "2.8; 3.1 2.9], dims=3), dims=4)" might be more obvious? Perhaps, even if cat inferred the dims, they should still be printed for clarity.

@mcabbott
Contributor

Another possible printing to consider is "reshape([1.9, 1.8, 1.6, 1.4, ... 2.9, 2.8, 2.9], (4, 2, 3, 3))".
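A sketch of that form as a plain function (the name reshape_string is hypothetical). It relies on vec listing elements in column-major order, which is exactly the order reshape fills them back in, so the printed text round-trips:

```julia
# Hypothetical: print any array as reshape(<vector literal>, <size tuple>).
reshape_string(A::AbstractArray) = "reshape(" * repr(vec(A)) * ", " * repr(size(A)) * ")"

A = reshape(collect(1:12), 2, 3, 2)
s = reshape_string(A)   # "reshape([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], (2, 3, 2))"
```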

@BioTurboNick
Contributor Author

BioTurboNick commented Aug 24, 2020

Yes, the parser in my PR also handles typed array blocks.

Re: "reshape", that's not bad, though I assume they'd want the reshape form reserved for displaying the ReshapedArray type.

In my PR, an array of 3d arrays would print like this:

julia> [[1;;4],[1;;3],[1;;2]]
3-element Vector{Array{Int64,3}}:
 [1;; 4]
 [1;; 3]
 [1;; 2]

Here, the type is clear even if the array were so long that you couldn't see the semicolons. But if the array had mixed array types, there could be an issue:

julia> [[1;;4],[1;3],[1 2]]
3-element Vector{Array{Int64,N} where N}:
 [1;; 4]
 [1, 3]
 [1 2]

If the array is long enough to hide the semicolons, I take your point. Maybe it should instead show each element's specific type when the container's element type is more general? Something like:

3-element Vector{Array{Int64,N} where N}:
 Array{Int64, 3}: [1;; 4]
 Vector{Int64}: [1, 3]
 Array{Int64, 2}: [1 2]

It's not as clean in most cases, but it is clear.
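A rough sketch of that per-element display as a standalone function (show_with_types is a hypothetical name, not a change to Base's printing). How each element itself prints depends on the Julia version, so only the type prefix is the point here:

```julia
# Hypothetical: print each element of a vector with its concrete type,
# so the dimensionality stays visible even when the element is long.
function show_with_types(io::IO, v::AbstractVector)
    for x in v
        print(io, " ", typeof(x), ": ")
        show(io, x)
        println(io)
    end
end

# Container element type is Array{Int64,N} where N, as in the example above
v = Array{Int}[cat(1, 4; dims = 3), [1, 3], [1 2]]
show_with_types(stdout, v)
```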

We could also modify the display code to ensure the major separators are displayed. In testing this, I just noticed that my PR doesn't handle long arrays well right now (it splits them across multiple lines), but the output could look something like this, emphasizing the largest dimension, similar to how a 2-d array is handled:

[1 2 … 43 25; 3 33 … 24 4] # current 2-d array inline
[1; 3; … 4; 5;;; 5; 3; … 4; 9] # possible 4-d array inline

@BioTurboNick
Contributor Author

It's worth saying that I'm not suggesting changing the full N-d array printing mode, which looks like this:

1×1×11×3 Array{Int64,4}:
[:, :, 1, 1] =
 1

[:, :, 2, 1] =
 4

[:, :, 3, 1] =
 4

@mcabbott
Contributor

Re Array{Int64, 3}: [1;; 4], perhaps the type should come afterwards, if this is meant to parse. But then would examples like this print the whole type, or the Array{T,N} that pasting this would give you?

julia> view(PermutedDimsArray(rand(Int8, 3,4,5), (3,2,1)),1,:,1:3) |> string
"Int8[82 -102 -12; -59 -77 66; -35 127 81; 88 -124 73]"

reshape of an Array does not return a ReshapedArray, although if it did start doing so, that would be a step stranger. Right now, pasting the compact printing back gives something that is == and has the same eltype, no more, but still.
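A quick check of both points. Base.ReshapedArray is an internal, unexported type, so the second assertion is only for illustration:

```julia
# reshape of an Array gives back a plain Array sharing the same data ...
@assert reshape(collect(1:6), 2, 3) isa Matrix{Int}
# ... while reshaping a non-Array, such as a range, gives a Base.ReshapedArray.
@assert reshape(1:6, 2, 3) isa Base.ReshapedArray

# Pasting the compact printed form of a view back in gives an array that is ==
# with the same eltype, but it is a plain Matrix rather than a SubArray.
X = view(PermutedDimsArray(rand(Int8, 3, 4, 5), (3, 2, 1)), 1, :, 1:3)
Y = eval(Meta.parse(repr(X)))
@assert Y == X && eltype(Y) == Int8
@assert Y isa Matrix{Int8} && !(X isa Matrix)
```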

I guess the precedent for printing constructor functions is this:

julia> zip(1:26, 'a':'z') |> string
"zip(1:26, 'a':1:'z')"

I'm not committed to any of these; I just find that the existing hcat / vcat / hvcat syntax with spaces and ;s already gets confusing fast (although the ability to copy-paste matrices is useful). For example, notice that you switched forms above in observations2, not sure if deliberately:

julia> [[1.9, 1.8, 1.6, 1.4] [1.8, 1.7, 1.4, 1.5]] |> string  # hcat of vectors
"[1.9 1.8; 1.8 1.7; 1.6 1.4; 1.4 1.5]"                        # hvcat of numbers
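The two spellings build the same 4×2 matrix, which is why the printed (row-wise) form looks different from how it was typed:

```julia
# hcat of two column vectors ...
a = [[1.9, 1.8, 1.6, 1.4] [1.8, 1.7, 1.4, 1.5]]
# ... and hvcat of the same numbers written row by row
b = [1.9 1.8; 1.8 1.7; 1.6 1.4; 1.4 1.5]
@assert a == b && size(a) == (4, 2)
```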

@BioTurboNick
Contributor Author

Fair point on long type names. Idk what's best there.

"reshape of an Array does not return a ReshapedArray" -- Ah, you are correct; I was confusing that with ReinterpretArray. So perhaps that would be a useful way to display it:

reshape([4; 5; ... 6; 8], (2, 3, 4))

Not bad.

Re: my example, yes I was testing the data in two forms (array of vectors and a full multidimensional array) so it was just easier to copy the vectors there.

@mcabbott
Contributor

Code to try this out:

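julia> function Base._show_nonempty(io::IO, X::AbstractArray, prefix::String)
         # Hooks into an internal Base function (for experimenting only): arrays that reach
         # this generic method, like the 3-d and 4-d ones below, print as reshape(vector, size).
         ioc = IOContext(io, :displaysize => displaysize(io) .- (11+3ndims(X), 0)) # does nothing?
         print(ioc, "reshape(")
         Base.show_vector(ioc, vec(X), prefix * "[")
         print(ioc, ", ", size(X), ")")
       end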

julia> a = [[130.5, 154.25, 173.0], [141.333 159.667 164.333], cat(152.583, 152.583, dims = 3)]
3-element Vector{Array{Float64,N} where N}:
 [130.5, 154.25, 173.0]
 [141.333 159.667 164.333]
 reshape([152.583, 152.583], (1, 1, 2))

julia> [rand(Int8, 6,7,8,9) for _ in 1:3]
3-element Vector{Array{Int8,4}}:
 reshape(Int8[-39, -27, 118, -108, -61, 69, 113, 0, -128, 16  …  106, 88, 123, 86, 66, -3, -43, 35, -21, -122], (6, 7, 8, 9))
 reshape(Int8[-9, 118, -72, 102, -2, 123, -18, 109, 109, -102  …  104, 21, -109, 119, -81, -79, -23, 71, -47, -101], (6, 7, 8, 9))
 reshape(Int8[-90, -47, 41, -76, -88, -123, 57, 23, -59, -20  …  -122, 80, 13, -44, -82, 40, -38, -51, -1, 45], (6, 7, 8, 9))

@mcabbott
Contributor

mcabbott commented Oct 4, 2020

BTW, #37196 makes cat infer dimensions.

@BioTurboNick
Contributor Author

#33697 implements most of this.

The printing aspect may still need refining, but a new issue can be opened to address that.

3 participants