ensure promotion rules don't alter eltype values #26109
Conversation
This also helps to re-synchronize `cat` and `map`. Closes #25924
```diff
@@ -189,11 +189,11 @@ end
 for T in (Nothing, Missing)
     x = [(1, T()), (1, 2)]
     y = map(v -> (v[1], v[2]), [(1, T()), (1, 2)])
-    @test y isa Vector{Tuple{Int,Union{T,Int}}}
+    @test y isa Vector{Tuple{Int, Any}}
```
Looks like this is simply a very minor known optimization miss (specifically, that we don't yet implement the obvious optimization to split `x isa Union{Int, Missing}` to `x isa Int || x isa Missing`), so we spend almost all of the time running subtyping. This is quite trivial – and non-breaking – to fix, so we haven't worked on it yet. We can verify that this PR doesn't actually affect performance by forcing away the handling of `missing` at the end (and also reducing the expense of the kernel operation from `sin` to `*`):
```julia
getx(x::Missing) = 0
getx(x) = x
f(x) = begin
    @time [ getx(2*(x[1])) for x in x ]
    @time Any[ getx(2*(x[1])) for x in x ]
    @time map(x -> getx(2*(x[1])), x)
    nothing
end; f(x); f(x);
```
Note that the data vector used to benchmark in #25924 isn't quite correct for general comparisons. I'm using `x = map(i -> (isodd(i) ? missing : 12345,), 1:10_000);`, which ensures I have the same percent missingness and avoids the small-integer cache.
By contrast, I don't really know that the issue at #25925 has a good solution. It's possible just to widen incrementally – which will give fairly decent performance on micro-benchmarks – but that will be impossible to precompile effectively, so load times might always be very long.
Addendum: if you actually do care about performance, doing this column-wise (e.g. as a `Tuple{Vector...}`) would give me a significant (3-10x) speedup*.
```julia
xx = map(x -> x[1], x);
f(x) = begin
    @time [ getx(2*(x)) for x in x ]
    @time Any[ getx(2*(x)) for x in x ]
    @time map(x -> getx(2*(x)), x)
    nothing
end; f(xx);
```
* this is failing to inline, which will also prevent vectorization, and is a known performance issue (#23338) – it should be even faster yet.
> Looks like this is simply a very minor known optimization miss (specifically, that we don't yet implement the obvious optimization to split `x isa Union{Int, Missing}` to `x isa Int || x isa Missing`), so we spend almost all of the time running subtyping. This is quite trivial – and non-breaking – to fix, so we haven't worked on it yet. We can verify that this PR doesn't actually affect performance by forcing away the handling of `missing` at the end (and also reducing the expense of the kernel operation from `sin` to `*`):
Interesting. But how is the compiler supposed to detect that it can do `x isa Int || x isa Missing` given that the only thing it knows about the input is that it's a `Tuple{Any}`? I guess in your example it's possible since `getx` has only two methods, but what happens for e.g. `sin` or `+`?
Also the return type of `map` is only inferred as `::AbstractArray` when the input is a `Vector{Tuple{Any}}`, which can be very problematic down the line.
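A hypothetical sketch of the union split under discussion – hand-writing the branch structure that the compiler could in principle emit for a `Union{Int, Missing}` check (the name `getx_split` is illustrative only, not anything from this PR):

```julia
getx(x::Missing) = 0
getx(x) = x

# Hand-written equivalent of splitting `x isa Union{Int, Missing}` into
# `x isa Int || x isa Missing`: each arm is a cheap concrete-type test,
# so no general subtyping query is needed at runtime.
function getx_split(x)
    if x isa Int
        return x        # x is statically known to be Int in this arm
    elseif x isa Missing
        return 0        # x is statically known to be Missing in this arm
    else
        return getx(x)  # generic fallback for any other type
    end
end
```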
> Note that the data vector used to benchmark in #25924 isn't quite correct for general comparisons. I'm using `x = map(i -> (isodd(i) ? missing : 12345,), 1:10_000);`, which ensures I have the same percent missingness and avoids the small-integer cache.
OK. I'd rather use something like `map(i -> (rand(Bool) ? missing : i,), 1:10_000)` though, to make sure the repetition of `(missing, 12345)` doesn't allow the CPU to predict the result.
> By contrast, I don't really know that the issue at #25925 has a good solution. It's possible just to widen incrementally – which will give fairly decent performance on micro-benchmarks – but will be impossible to precompile effectively, so load times might always be very long.
That's precisely why I propose using inference rather than widening incrementally. But can we have that discussion there instead?
> given that the only thing it knows about the input
The input type doesn't actually play a role here, it's the output type that is failing to codegen. There's an open PR about this, although I can't find it right now.
> which can be very problematic down the line
As long as there's a function call boundary (or we finally implement loop out-lining), this has no performance significance.
> using inference
Many inference optimizations currently rely on being able to compute the expanded form of the input unions, so I don't know that this will make much of a difference.
> The input type doesn't actually play a role here, it's the output type that is failing to codegen. There's an open PR about this, although I can't find it right now.
But how can the compiler infer the output type given that the input is `Tuple{Any}`?
> As long as there's a function call boundary (or we finally implement loop out-lining), this has no performance significance
Well, yeah, but by this line of reasoning we wouldn't care about type stability of any functions, would we?
> Many inference optimizations currently rely on being able to compute the expanded form of the input unions, so I don't know that will make much of a difference.
Sorry, but I'm not sure what this means. Could you try to explain this for mere mortals like me? Also this PR affects the public API, while inference could be improved without breaking the API in 1.x releases. So I think the question is more: what public API will allow for the best optimizations in the future?
> But how can the compiler infer the output type given that the input is `Tuple{Any}`?
It doesn't need to – we don't lose any performance from this. All that matters is that the output type is something easy to optimize (like a concrete type or a trivial union).
> we wouldn't care about type stability of any functions
Well, actually, no. It's critical to be able to write kernels with "type stability" so that for-loops over Array are optimized. That translates to requiring that most Base code be coded very carefully. But it doesn't actually carry over to mattering much for a significant portion of stdlib and beyond.
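A minimal sketch of the "type-stable kernel behind a function barrier" pattern being described (the names `kernel` and `data` are illustrative, not from Base):

```julia
# The caller may be type-unstable, but calling into a separate function
# re-establishes concrete type information at the boundary, so the
# for-loop over the Array compiles to a tight, specialized loop.
function kernel(v::Vector{Int})
    s = 0
    for x in v
        s += 2 * x   # fully specialized on Int
    end
    return s
end

data = Any[1, 2, 3]      # poorly-typed container
v = Vector{Int}(data)    # convert once at the boundary
kernel(v)                # fast, despite the untyped caller
```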
> while inference could be improved
There are limits to what will be possible. We optimize unions by computing the fully expanded version. In general, this has `O(2^n)` components, so that will never help here – it's really just present to handle simple cases like iteration (and even there, it's not simple; ref. the redesign in #26079 needed to handle just this low-order case). The best optimizations in the future will be for code kernels that operate directly on columns of `Array{T}` or `Array{Union{T, Nothing}}` – which also happens to be the fastest way to do it now (cf. the addendum above).
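As a rough, hypothetical sketch of that column-wise layout (reusing the half-missing data pattern from the benchmark above; `total` is an illustrative name):

```julia
# One Vector{Union{Int, Missing}} column instead of a vector of
# one-element tuples; Base stores such small unions compactly
# (a values array plus a type-tag byte array).
col = Union{Int, Missing}[isodd(i) ? missing : 12345 for i in 1:10_000];

# Kernels can then operate directly on the column:
total(v) = sum(x -> coalesce(x, 0), v)   # treat missing as 0
total(col)
```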
If I understand the PR title correctly, I agree that promotion rules should avoid changing eltypes of inner containers. But this doesn't seem to do that --- it just gives
I think it should be giving
My understanding of what we concluded from previous discussions (for example in #25553) is that
IMO, this should only be true in specific limited cases. For more general cases, I think we need more expressive syntax (such as some form of
We've been having the discussion about missing data for so long now. Literally the entire point of
I'd really like to avoid rehashing this argument but I think it's important in this case to reiterate what's already been decided, implemented, and is now in use. Please don't regress this case just because you don't agree with it.
I believe these promotion rules are totally orthogonal to the issue of whether
Triage decided we should go with this PR, and probably also remove the promote rules for Array eltype (but that's a separate PR).
Did you intend to keep this method as-is? (Lines 80 to 89 in b1b5066)
I hadn't looked at that method. It's not a bad definition, but it's completely invalid to mark it
This also helps to re-synchronize `cat` and `map` and restore v0.6 behavior. Since I don't think it's obvious if this type can be made efficient (aka fastest), I think it seems best to avoid changing this to a more complicated type. Closes #25924