Syntax for multidimensional arrays #33697

BioTurboNick · 2019-10-28T05:59:44Z

Use multiple semicolons to denote >2 dimensions in an inline N-dimensional array specification.

Summary of all changes in PR and how it works: #33697 (comment)

Breaking changes: Extra semicolons in a matrix expression are currently ignored, so existing code that contains them would change meaning. However, such use is likely to be rare and unintentional. ~~#37168 adds a deprecation warning for this.~~ That PR was merged.

Related work:

I did this largely as an exercise for myself, to partly address #30467. ~~Unsure how much I can do on the rest but if I get pointers I may be able to. If this approach is deemed worth continuing.~~

julia> [1 2;3 4;;5 6;7 8;;9 0;2 3]
2×2×3 Array{Int64,3}:
[:, :, 1] =
 1  2
 3  4

[:, :, 2] =
 5  6
 7  8

[:, :, 3] =
 9  0
 2  3

julia> [1;3;;5;7;;9;2]
2×1×3 Array{Int64,3}:
[:, :, 1] =
 1
 3

[:, :, 2] =
 5
 7

[:, :, 3] =
 9
 2

julia> [1;;5;;9]
1×1×3 Array{Int64,3}:
[:, :, 1] =
 1

[:, :, 2] =
 5

[:, :, 3] =
 9

StefanKarpinski · 2019-10-28T13:28:22Z

You appear to have introduced whitespace changes on every line. Did you perhaps edit the files on Windows in an editor that uses \r\n line endings?

vtjnash · 2019-10-28T15:18:29Z

(downside of #32781 is that problem isn't caught locally anymore)

BioTurboNick · 2019-10-28T16:17:30Z

Oops, somehow triggered Atom to switch from LF to CRLF in those files. Fixed.

BioTurboNick · 2019-10-28T17:45:41Z

Saw there was a failing ParseError test because I forgot to restore a check and error message. Should be fixed.

BioTurboNick · 2019-10-29T16:18:29Z

Just did some quick performance checks

Allocated 10,000 times in a for loop inside a function, minimum observed time over several trials:

[1;3;5;7]
# 1.2: 0.000332 seconds (10.00 k allocations: 1.068 MiB)
# new: 0.000311 seconds (10.00 k allocations: 1.068 MiB)

[1 1; 2 2; 3 3; 4 4]
# 1.2: 0.000442 seconds (10.00 k allocations: 1.373 MiB)
# new: 0.000480 seconds (10.00 k allocations: 1.373 MiB)

cat([1 1; 2 2], [3 3; 4 4], dims = 3) / [1 1; 2 2;; 3 3; 4 4]
# 1.2:                     0.047043 seconds (340.00 k allocations: 16.632 MiB)
# new (cat function call): 0.049554 seconds (390.00 k allocations: 18.158 MiB)
# new (syntax):            0.048661 seconds (390.00 k allocations: 18.158 MiB)

All times comparable. 1d and 2d arrays identical allocations.

Not sure why cat itself has more overhead in my build. It doesn't seem to replicate in 1.2.0, 1.3.0-rc4, or 1.4.0-DEV. Some difference between the Windows build of Julia (what I use normally) and the Linux build (what I'm building in)?

BioTurboNick · 2019-10-29T21:19:19Z

Just added inline printing of arrays, to complete the fix for #30467. Wasn't as hard as I thought.

Thinking more about the 34-39x allocation overhead of the cat() approach, thought about reshape being more performant, and it appears so, only a 2x allocation overhead.

# 10,000 iterations using reshape: 0.000731 seconds (20.00 k allocations: 2.136 MiB)

Which leads me to wonder why cat() has such a high overhead? Using reshape() from the parser would potentially make parsing easier, but perhaps there are things that can be done to improve cat()?
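As a rough illustration of the comparison above (a sketch only, not the PR's parser code): the 2×2×2 array from the benchmark can be built either with cat along dims = 3 or by reshaping a flat, column-major vector. The variable names are made up for this example.

```julia
# Two ways to build the 2×2×2 array from the benchmark above.
# The names via_cat and via_reshape are illustrative only.

# Concatenate two 2×2 matrices along the third dimension.
via_cat = cat([1 1; 2 2], [3 3; 4 4], dims = 3)

# Julia arrays are column-major, so list each slice column by column
# and reshape the flat vector into 2×2×2.
via_reshape = reshape([1, 2, 1, 2, 3, 4, 3, 4], 2, 2, 2)

@assert size(via_cat) == (2, 2, 2)
@assert via_cat == via_reshape
```

Both forms give the same array; the reshape form allocates the flat vector once and then only wraps it, which is one plausible reason it came out cheaper in the timing above.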

BioTurboNick · 2019-10-30T15:25:19Z

I fixed two bugs caught thanks to failing tests:

- OffsetArray indices weren't handled well.
- Two functions would unintentionally return a boolean instead of () due to a conditional statement.

JeffBezanson · 2019-10-31T18:51:08Z

Triage likes this, but we think there should be a release with a deprecation warning for using multiple semicolons. @Keno also has an alternate proposal to use ;; to mean concatenation in the (N+1)st dimension.

Keno · 2019-10-31T19:02:32Z

An alternative I proposed on the triage call: have [a ;; b] be a generic "concat in the (n+1)st dimension" operator. For 3d arrays, it'd largely be equivalent, but it's different in higher dimensions and there are some lower dimensional corner cases.
For 4d, my proposal would have

[[ [1 2;3 4] ;; [5 6;7 8] ;; [9 0;2 3] ] ;;
 [ [1 2;3 4] ;; [5 6;7 8] ;; [9 0;2 3] ]]

vs

[ 1 2;3 4 ;; 5 6;7 8 ;; 9 0;2 3 ;;;
  1 2;3 4 ;; 5 6;7 8 ;; 9 0;2 3 ]

in this PR. For lower dimensions, in my proposal [1; 2;;] would be 2x1, while in this PR it's 2x1x1 (we don't currently have a literal syntax for 2x1, I don't think). The primary thing I like about my proposal, though, is that it forces writing the [], so for large literals it should be easy to navigate to the dimension you want by using the editor's existing bracket-matching feature. I think the primary drawback is that higher dimensions quickly become more verbose in my proposal when the elements being concatenated are of low dimension: [1;;;5;;;9] in this PR vs. (I think) [ [[[1;;];;];;] ;; [[[5;;];;];;] ;; [[[9;;];;];;] ]. Of course, there is a correspondingly bad example for concatenating high dimensional inputs in a high dimension with this PR:

julia> a = fill(1, (1 for i = 1:10)...)
1×1×1×1×1×1×1×1×1×1 Array{Int64,10}:
[:, :, 1, 1, 1, 1, 1, 1, 1, 1] =
 1

julia> [a ;;;;;;;;;;; a]

Anyway, food for thought.

Keno · 2019-10-31T19:37:02Z

Of course a variant on my proposal that is even more similar to this proposal is to still use the multiple semicolons to concat in the n+i-1'st dimension, where i is the number of semicolons.

StefanKarpinski · 2019-10-31T19:48:08Z

That seems at odds with what the current single semicolon syntax does, i.e. always concatenate in the second dimension.

Keno · 2019-10-31T19:51:22Z

Yes, that's why I said multiple semicolons (as in > 1). I don't think it's worth too much effort trying to make that consistent. As mentioned on the triage call, even in this proposal n semicolons increment the (n == 1 ? n : n+1)st dimension (with whitespace being the dimension 2 concat operator).

Keno · 2019-10-31T19:55:33Z

In either case, we should do the change to reserve this syntax with a warning right away.

BioTurboNick · 2019-10-31T20:01:36Z

Interesting idea @Keno. I appreciate the benefits of your proposal and for my own uses it'd be just as suitable. And perhaps they could be combined as you suggest.

FTR, I'd consider [1;2;;] mapping to 2x1x1 but [1;2;] not mapping to 2x1 to be a bug in my PR, so thanks for catching that. I hadn't actually intended trailing semicolons to have an effect, so resolving in either direction would work. I suppose it couldn't hurt to allow that, although I suppose you might also want the mirror syntax where e.g. [;;;1;2] would add empty dimensions to the front?

With respect to a ten-dimensional cat, I figure at that point it'd be more readable to fall back to cat(a, a, dims = 10), anyway, and that's probably okay?
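A minimal sketch of that fallback, using only fill and cat from Base; the names a and b are just for illustration:

```julia
# A 1×1×…×1 array with ten dimensions, like the one in the example above.
a = fill(1, ntuple(_ -> 1, 10))

# Concatenate along the tenth dimension by naming it explicitly,
# instead of counting semicolons in a literal.
b = cat(a, a, dims = 10)

@assert ndims(b) == 10
@assert size(b) == (1, 1, 1, 1, 1, 1, 1, 1, 1, 2)
```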

JeffBezanson · 2019-10-31T20:04:21Z

> but [1;2;] not mapping to 2x1 to be a bug in my PR

I'm not sure we can change [1;2;] --- that seems much more breaking than changing multiple semicolons.

Keno · 2019-10-31T20:06:31Z

I also don't think we want to change it. It's consistent with this PR. The real problem is that whitespace is the concat operator in dimension 2, and we don't want to make trailing whitespace significant.

BioTurboNick · 2020-04-13T21:30:59Z

I tried to rebase onto the current master branch but something screwed up. Don't know if I did something or if the current master is iffy. Is there a way to tell where the error is from or advice on the best way to reset the branch?

Liozou · 2020-04-13T22:27:16Z

I believe your rebase went wrong, since it includes many commits that are irrelevant to this PR.
I don't know if this will solve all your problems, but what you should probably do is:

1. First, in your fork of julia, update your master branch with git pull upstream master. If this fails because your upstream is not set, do it first (git remote add upstream https://github.com/JuliaLang/julia.git).
2. Then, in your multidimensional-arrays2 branch, do an interactive rebase up to the first included commit in the list of current commits for this PR. I see there are 49 commits, so you should do git rebase -i multidimensional-arrays2~50 multidimensional-arrays2.
3. In the prompt, keep all the commits you want to have as part of this PR by leaving them marked with the initial pick, and remove all the irrelevant ones by changing the pick to drop.
4. If the rebase does not immediately succeed, it means there are some conflicts, so fix them.
5. Once the rebase is complete, the branch should have the commits from julia followed by the list of commits you want to have in this PR. You can check whether that is the case with git log.
6. The next step is to do a proper rebase onto the updated master branch: do so with git rebase master multidimensional-arrays2. Again, there might be some conflicts, so you have to resolve them.
7. Once this rebase is complete, check again with git log that you have all the julia commits up to now followed by your commits for this PR. If you are happy, force-push the change.

Hope it helps!

BioTurboNick · 2020-04-14T15:00:57Z

Ah, perfect @Liozou, thank you!

Liozou · 2020-04-14T15:30:24Z

Glad I could help!
There are still two oddities in your commits: the one named "More work" looks like it rewrote the entire julia-parser.scm file, and the commit "Inline printing of multidimensional arrays" has the same issue for the file arrayshow.jl. I think the issue comes from your end-of-lines, since when I do git diff on your branch for the relevant commits I see that each line has had a non-printable character appended... So you should probably remove those. Sorry I can't give you pointers on how to do that, however; I can only guess it happened because of your text editor.

StefanKarpinski · 2020-08-22T12:46:22Z

Windows editor turning new lines into \r\n perhaps? If so, there are git settings to prevent that from getting committed in the repo.

BioTurboNick · 2020-08-22T15:21:31Z

@StefanKarpinski - Yeah, it's bitten me a few times. I think I've sorted it out.

Just added typed_ncat, so now this is also valid:

UInt16[1 2 3; 4 5 6;; 7 8 9; 10 11 12]
#=
2×3×2 Array{UInt16,3}:
[:, :, 1] =
 0x0001  0x0002  0x0003
 0x0004  0x0005  0x0006

[:, :, 2] =
 0x0007  0x0008  0x0009
 0x000a  0x000b  0x000c
=#

UInt16[1 2 3; 4 5 6;;; 7 8 9; 10 11 12]
#=
2×3×1×2 Array{UInt16,4}:
[:, :, 1, 1] =
 0x0001  0x0002  0x0003
 0x0004  0x0005  0x0006

[:, :, 1, 2] =
 0x0007  0x0008  0x0009
 0x000a  0x000b  0x000c
=#

I'm looking at tests for the parser, and I'm not certain from looking what I should be testing for or where they should go in the syntax.jl file. They seem to be organized by issue rather than type? EDIT: Did just add documentation and doctests, if that covers it.

BioTurboNick · 2021-05-19T18:25:14Z

I think everything has been addressed, now just to get to green.

simeonschaub

Note the doctest failure, but otherwise LGTM!

BioTurboNick · 2021-05-20T05:25:46Z

Okay, looks like we got it! Getting the operations on the intermediate data structure just right is trickier than it seems like it should be.

~~One small addition: I realized that a line break followed by a semicolon is potentially a problem. e.g.~~

[1
;2]

~~In the master branch, this syntax is accepted but is ignored. I turned it into an error, which is also consistent with the fact that this is considered invalid in the master branch:~~

[;1

BioTurboNick · 2021-05-20T12:53:40Z

Actually, I can easily make it so that it's one side of the linebreak or the other but not both. Then someone could, if they wanted, do something like:

In fact, already done.

simeonschaub · 2021-05-20T16:42:34Z

OK, let's try this!

simeonschaub · 2021-05-21T13:37:46Z

Should we disallow spaces between semicolons as in [1; ; 2]? javierbarbero/DataEnvelopmentAnalysis.jl#6 and DrChainsaw/NaiveNASlib.jl#85 are two examples which were using that by accident, so it seems potentially worthwhile to me to throw an error here.

mbauman · 2021-05-21T13:41:35Z

Oh interesting, yes, I agree we should do that. Because I had to check: the currently implemented mechanism treats [1 ; ; 2] like [1 ;; 2], but I've been thinking of ;; as a single indivisible token.

BioTurboNick · 2021-05-21T17:15:31Z

That is a good idea. Implemented in #40903

Co-authored-by: Matt Bauman
Co-authored-by: Jeff Bezanson
Co-authored-by: Simeon Schaub

StefanKarpinski added the minor change and triage labels Oct 28, 2019

StefanKarpinski requested a review from JeffBezanson October 28, 2019 13:29

mbauman changed the title ~~Specification of multidimensional arrays~~ Syntax for multidimensional arrays Oct 28, 2019

mbauman added the parser label Oct 28, 2019

BioTurboNick force-pushed the multidimensional-arrays2 branch from 27fdfd8 to 80ce4bf April 13, 2020 20:33

BioTurboNick force-pushed the multidimensional-arrays2 branch from a529408 to 26e2f26 April 14, 2020 05:54

BioTurboNick added 2 commits May 19, 2021 14:22

added tests for newlines

6995dc7

tests

e38ba55

BioTurboNick force-pushed the multidimensional-arrays2 branch from 4d7f3a7 to e38ba55 May 19, 2021 18:22

simeonschaub approved these changes May 19, 2021

BioTurboNick added 3 commits May 19, 2021 17:53

Some refactoring, not quite there

b47d342

🤞🏻

526afc5

🤞🏻🤞🏻

6799644

BioTurboNick added 3 commits May 20, 2021 01:31

Update NEWS

d5ba602

Test to enforce prohibition of semicolon after linebreak

28dbfa4

Allow semicolons after linebreak

a53cacb

simeonschaub merged commit 9117b4d into JuliaLang:master May 20, 2021

BioTurboNick mentioned this pull request May 20, 2021

Inline rendering of 3d array confusing; proposed syntax for displaying/initializing 3d array #30467

Closed

simeonschaub added a commit to simeonschaub/DataEnvelopmentAnalysis.jl that referenced this pull request May 21, 2021

fix typo in show method

702735a

This will get a different meaning in 1.7 (ref JuliaLang/julia#33697)

simeonschaub mentioned this pull request May 21, 2021

fix typo in show method javierbarbero/DataEnvelopmentAnalysis.jl#6

Merged

simeonschaub added a commit to simeonschaub/NaiveNASlib.jl that referenced this pull request May 21, 2021

fix typo in tests

49763ce

This will get a different meaning in 1.7. (ref JuliaLang/julia#33697)

simeonschaub mentioned this pull request May 21, 2021

fix typo in tests DrChainsaw/NaiveNASlib.jl#85

Merged

mateuszbaran mentioned this pull request May 23, 2021

Refine documentation JuliaManifolds/ManifoldsBase.jl#69

Merged

simeonschaub removed the forget me not label May 29, 2021

extradosages mentioned this pull request Oct 12, 2021

Unsupported multi-dimensional array indexing domluna/JuliaFormatter.jl#490

Closed

simeonschaub mentioned this pull request Jan 15, 2022

wrong printing for single-element matrix #36732

Closed

simonbyrne mentioned this pull request Feb 13, 2023

RFC: Show single column array as permutedims of single row #31019

Closed
