Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

static array length should not change data interpretation #1886

Closed
JeffBezanson opened this issue Jan 10, 2019 · 22 comments · Fixed by #1986
Closed

static array length should not change data interpretation #1886

JeffBezanson opened this issue Jan 10, 2019 · 22 comments · Fixed by #1986

Comments

@JeffBezanson
Copy link
Contributor

JeffBezanson commented Jan 10, 2019

Currently, plotting an array of 2-element StaticVectors treats the elements as 2d points. 3-element StaticVectors are treated as 3d points. Arrays of 4+-element static arrays, or other types of arrays, are instead treated as multiple line plots.

I believe the purpose of StaticArrays is to specialize code and data representation for a certain size, not to change semantics. For example SVector{2}([1,2]) == [1,2] is true; the objects are supposed to "mean" the same thing.

I know this would be a (possibly major) breaking change, but I think it would make the API and mental model much cleaner. An extra small motivation is that ideally Plots should not depend on StaticArrays, since I don't believe it truly needs them (e.g. for performance). I don't know what the impact would be on load time and compile time, but removing dependencies can only help.

@mkborregaard
Copy link
Member

mkborregaard commented Jan 13, 2019

I can definitely see where you're coming from here. It's a little at odds with Plots' core philosophy, though, to "attempt to figure out what you want it to do... not just what you tell it". A vector of 2- or 3- length StaticVectors is practically without exception a description of 2- or 3-dimensional point coordinates. This follows the behaviour for 2- and 3-length tuples (which is very widely used for generating plots). Arrays of 4+-element static arrays could be anything though - I doubt they are ever used in Plots' calls.

Seen from Plots' perspective it would thus be least problematic to simply disallow plotting of 4+-element staticarrays, which would avoid the dispatch-on-length problem (though I thought the ability to dispatch on length was exactly a desirable property of StaticArrays?). Of course that would still collide with StaticArrays' desire to act like normal vectors.

On the other hand it is possible that noone uses the ability to plot 2- or 3- length StaticArrays - it was originally implemented for FixedSizeArrays and just changed to StaticArrays in the julia-0.6 update - so we could try to scope the potential impact of doing what you suggest here and simply behave like they are normal vectors. Has this come up in actual use or is it just based on reviewing the code?

@JeffBezanson
Copy link
Contributor Author

though I thought the ability to dispatch on length was exactly a desirable property of StaticArrays

IMO the main point is to specialize code on the length of StaticArrays. There might be some uses for dispatching on length, but it seems clear to me that you don't generally want functions doing arbitrarily different things for different lengths of array.

Has this come up in actual use or is it just based on reviewing the code?

Just from reviewing the code. I think it's generally not good design to depend on other packages in order to treat their types differently than other similar types. There are who-knows-how-many julia array types out there, and plots can handle them all via general AbstractArray interfaces instead of by depending on them.

But, as somebody who uses Plots from time to time, let's say I happen to have data in the form [[x1,y1], [x2,y2], ...] (a vector of 2-Vectors specifying points). What is the standard approach to plotting it as intended? I can use plot(Tuple.(pts)), or call Plots.unzip. Ok, so why not use that API consistently for all array types? It could even be a bit nicer, e.g. plot(points(pts)) which makes the intent clear.

@mkborregaard
Copy link
Member

mkborregaard commented Jan 14, 2019

Yes, plot(Tuple.(pts)) is the normal use. I don't know about points, that sounds like a dangerously wide word to export for this specific use?

But, it's perfectly fine to use this same approach for all array types of course. I just have a hard time imagining when someone would have a Vector of length-2 SVectors and that would not be points, so it would just add an extra step.

I see your point about explicitness, though I tend to think plotting code is a bit unusual in that it's almost invariably used interactively at the REPL, where flexibility and succinctness really help - especially as the ideomatic way to specify Plots' plots in library or reusable code is via recipes, which are 100% explicit.

Anyway, I didn't get any big complains when I asked about changing the behaviour on Slack, so I'm open to changing it as you suggest. @daschw , do you agree?

@JeffBezanson
Copy link
Contributor Author

I don't know about points, that sounds like a dangerously wide word to export for this specific use?

Sure, we don't need to use that exact name/syntax, just something to express that I have an iterable-of-iterables to treat as an array of points. It's a secondary request though of course.

Thanks for considering this.

@mkborregaard
Copy link
Member

Here's a PR implementing this: #1892

@daschw
Copy link
Member

daschw commented Jan 16, 2019

Sorry, I missed your question @mkborregaard. Yes I agree.

@mkborregaard
Copy link
Member

Note: the PR that removes the StaticArrays dep does not change the time of using Plots in any way.

@SimonDanisch
Copy link
Member

I believe the purpose of StaticArrays is to specialize code and data representation for a certain size, not to change semantics.

That's your interpretation :P StaticArrays has come a long way since it replaced FixedSizeArrays - but this exact use case was a big motivation for writing FixedSizeArrays for me.

I'm not sure if you can say "it was written as an optimization" generally implies that we can't use it to notify an API about user intentions.

Of course the presence of [[2,3], [3,4]] puts us in an awkward spot, since we don't have the length in the type and we'd need to dynamically check each element - which we don't really want to do.

As I see it, we would love to also check if an Vector{Vector{<: Number}} contains only 2 element arrays and display it as points. So in a way, StaticArrays is indeed used as an optimization here... With the awkwardness left, that we don't do it for the dynamic case since it would imply too much checking in places were it only adds overhead.

@mkborregaard
Copy link
Member

I essentially agree with @SimonDanisch . I really don't want Plots to breach the package ecosystem etiquette or some such thing, but as long as it's open to interpretation I think the syntax is useful here.

@daschw
Copy link
Member

daschw commented Jan 16, 2019

To be honest, I agree with @JeffBezanson here:

I think it's generally not good design to depend on other packages in order to treat their types differently than other similar types.

I'd find it cleaner not to depend on StaticArrays. I think, as a general rule, Plots should implement type recipes for Base and stdlib types and type recipes for other packages should live in the respective package code or in an extra dedicated package like we did for Distributions etc. in StatPlots. However, I doubt that StaticArrays.jl would be willing to include a type recipe and a RecipesBase dependency and having a small extra Package like StaticArrayPlots.jl or similar would firstly probably not be used that much and secondly be weird in this situation because StaticArray <: AbstractArray and Plots would work for StaticArrays anyway but behave differently when StaticArrayPlots was loaded.

I guess it really depends on how much this interpretation that length 2 or 3 StaticArrays correspond to points is used in the wild. I never use it, but if there is some critical mass that relies on it and the StaticArrays dependency does not influence load or compilation time then the practical solution would probably be to keep the StaticArrays dependency. Otherwise, I'd prefer dropping it.

@mkborregaard
Copy link
Member

How can we find the subset of published packages that relies directly (not transitively) on both RecipesBase and StaticArrays? That would be a good place to scope the use of this.

@daschw
Copy link
Member

daschw commented Jan 16, 2019

How can we find the subset of published packages that relies directly (not transitively) on both RecipesBase and StaticArrays?

I have no idea.

@mkborregaard
Copy link
Member

ApproxFun
Bridge
CausalityTools
DiffEqBase
DiffEqCallbacks
DiffEqNoiseProcess
DiffEqPhysics
GaussianProcesses
GeoStats
GeoStatsDevTools
IntervalArithmetic
MIToS
NBodySimulator
PerronFrobenius
Plots
Polyhedra
QHull
RecursiveArrayTools
Robotlib
SolidStateDetectors
UncertainData
Variography
WeightedArrays

@JeffBezanson
Copy link
Contributor Author

JeffBezanson commented Jan 16, 2019

since we don't have the length in the type and we'd need to dynamically check each element - which we don't really want to do.

As I see it, we would love to also check if an Vector{Vector{<: Number}} contains only 2 element arrays and display it as points. So in a way, StaticArrays is indeed used as an optimization here...

Yes, I agree it would be better to treat all arrays of 2-vectors the same. Is the overhead really that bad? Does it matter, when you're plotting all the points anyway? I also just checked that all(a->length(a)==2, A) gets optimized out completely for an array of SVector{2}s (without depending on the package! magic! 😄 ).

I'm not sure if you can say "it was written as an optimization" generally implies that we can't use it to notify an API about user intentions.

Fair point, but I still think this is a hack. Is there a standard 2d point type used by julia graphics packages? Or is SVector{2} it? Why not use the Point type from GeometryTypes.jl or Graphics.jl? (the fact that both exist is itself a bit of a problem, too)

@daschw
Copy link
Member

daschw commented Jan 16, 2019

I think we have to clarify are discussing two different questions here:

  1. Should an SVector{2}/SVector{3} be interpreted visually as a Point in 2d/3d space or as an optimization of an AbstractVector of length 2/3?
  2. Should Plots.jl or a package defining a type decide, how this type should be interpreted visually?

I have no strong opinion on 1., but regarding 2. I think that it's the package's responsibility to define the visual interpretation of a type. Otherwise, where do we draw the line? Which packages should we depend on to make this decsion?

Why not use the Point type from GeometryTypes.jl or Graphics.jl?

This would avoid the need to answer 1. here, but in the context of 2., this would just be a shift from a StaticArrays.jl dependency to a GeometryTypes.jl/Graphics.jl dependency.

@JeffBezanson
Copy link
Contributor Author

I agree --- generally a package defining T should also specify how to show/visualize T, as is basically always the case for e.g. show and print.

However, that depends on how closely related the package is to graphics/visualization. For example, you might want to visualize a sorting algorithm, but doing that is pretty far from the typical applications of sorting itself, and so would probably be done by a separate package.

It also depends on whether the type has a natural visual or geometric interpretation. It's so obvious how to geometrically interpret a Point2D type that it doesn't really matter which package implements drawing code for it.

this would just be a shift from a StaticArrays.jl dependency to a GeometryTypes.jl/Graphics.jl dependency.

It makes sense to me that a graphics package would depend on one of those though, particularly if the Point types defined there are considered canonical. But if GeometryTypes.Point is considered somehow marginal, optional, or "quirky" in some way then yes it would be best for Plots not to depend on it, and instead have GeometryTypes somehow tell Plots "treat this type like a point".

@daschw
Copy link
Member

daschw commented Jan 16, 2019

It makes sense to me that a graphics package would depend on one of those though

Fair point. In my opinion it would still be preferable if GeometryTpes would tell Plots how to interpret Point2D via RecipesBase. @SimonDanisch what's your opinion on this?

@JeffBezanson
Copy link
Contributor Author

it would still be preferable if GeometryTpes would tell Plots how to interpret Point2D via RecipesBase.

I'm fine with that. It could be even simpler for something as simple as a point though. My hope was that some package like Graphics.jl or GeometryTypes.jl would define just a few basic things like Abstract2DPoint that every graphics-related package would depend on.

@SimonDanisch
Copy link
Member

Yeah I guess that could work... We also discussed to have something like that in StaticArrays...
What would still bug me is, that plotting is usually about maximum ease - and sometimes also about maximum performance, so it's a bit annoying to reinterpret a vec_of_vec ;)
Anyways, I basically exclusively use GeometryTypes.Point or Tuple in Makie ;) So if those cases are supported really well, I probably wouldn't cry too much if we drop SVector from that list.

@mkborregaard
Copy link
Member

mkborregaard commented Jan 17, 2019

Yes, I think depending on GeometryTypes in Plots would be the wrong way around - it's much better to have the package defining the type also define the plotting functionality. In this case that would involve GeometryTypes taking on a dep on the very small RecipesBase (300 lines of code with no exports and no dependencies that only changes along with julia minor versions), and then the line

using RecipesBase
@recipe f(pts::AbstractVector{T}) where {T <: Union{Point{2}, Point{3}}} = [p.data for p in pts]

in GeometryTypes would be sufficient to do

using Plots, GeometryTypes
pts = rand(Point{2}, 100)
scatter(pts)

with no dependencies between them - magic :-)
@SimonDanisch I'll add the recipes to GeometryTypes if you'll tell me what types I should add them to (that's not transparent - e.g. Point is not even mention in the documentation - also remember Plots doesn't have volumes).

In the context of using GeometryTypes for this, I guess an unclarity of semantics is whether Points refer to scatter points; in Plots we use the StaticArrays/Tuples as coordinates independently of seriestypes (it could be corners of shapes or inflections on lines). Eg you could pass plot([(0,0), (1,1), (-1,1)], seriestype = :shape) and get a triangle - this would be replaced by plot([Point(0,0), Point(1,1), Point(-1,1)], seriestype = :shape) but GeometryTypes has it's own bespoke types for e.g. triangles and lines. At it's core, the points of Plots (and other plotting packages) are not geometric types - they are coordinates.

While I sympathise with the principle, Abstract2DPoint might be a little superfluous, since the three obvious implementations (struct Point; x::Float64; y::Float64; end, struct Point; data::Tuple{Float64, Float64}; end and struct Point; data::SVector{2, Float64}; end are basically identical. Better to have a concrete implementation like GeometryTypes be a standard library of sorts.

In the end, while I think it's a nice functionality to let go I'll defer to the decision of Daniel and Simon and have more or less already made the PRs to make the change happen.

@andyferris
Copy link

@JeffBezanson wrote

I believe the purpose of StaticArrays is to specialize code and data representation for a certain size, not to change semantics.

@SimonDanisch wrote

That's your interpretation :P StaticArrays has come a long way since it replaced FixedSizeArrays - but this exact use case was a big motivation for writing FixedSizeArrays for me.

For the record, my interpretation is the same as Jeff's 😛

While StaticArrays began as simple rewrite of FixedSizeArrays to target a newer Julia compiler, I was philosophically committed to these being just AbstractArrays. I realize this differed a bit from Simon's goal of creating a (very nice and generic!) geometric powerhouse.

My interpretation of dispatch (and the fact that invoke exists at all in a language designed for generic programming...) is that the behaviour of f(::StaticArray) should typically not differ from f(::AbstractArray) given that StaticArray <: AbstractArray. The static size just provides additional guarantees about the AbstractArray - kind of like how AbstractArray{Float64, 2} provides more guarantees than AbstractArray. I agree it seems awkward at first with points and so-on but I'm a fan of nested containers when they provide simplicity and I'd not be too surprised to have to use [SVector(1,2)] to plot a single point.

In cases like this I find the rule I have been applying is: a generic function needs to be clear if a slot represents a single thing or a list of things. E.g. we have separate max and maximum functions (because iterables can also have order). I'm assuming here we definitely want to allow plotting of either a list of y-values or a list of (x,y) pairs. This immediately becomes incompatible with allowing a single (x,y) pair (unless it is nested inside a container).

That all said, @SimonDanisch I do regret the gap between what StaticArrays is now and the convenience FixedSizeArrays provided for basic geometry in the past

@mkborregaard
Copy link
Member

I've come around to thinking that the resolution here is to have Plots depend on GeometryTypes and have first-class support for Point, LineSegment and Polygon. Dependency-wise it adds nothing more than StaticArrays anyway and it makes sense to use these abstractions throughout Julia. @daschw how do you feel about this solution? It changes our normal philosophy for packages in the case of GeometryTypes but comes with advantages too for packages that use them (and for this case).

daschw added a commit that referenced this issue Apr 8, 2019
replace StaticArrays with GeometryTypes (fix #1886)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants