Make promote_typejoin recursive #26501

bramtayl · 2018-03-17T18:48:19Z

No description provided.

bramtayl · 2018-03-17T19:57:01Z

So this hit a stack overflow. But I think I just uncovered an existing bug in typejoin. There's a line that looks like this:

function typejoin(a, b)
    # some stuff
    # when a′ === a, this calls typejoin(a, b) again with identical arguments
    a′ === a ? typejoin(a, b) : recur(a′, b)
end

If the a′ === a branch is ever taken, typejoin calls itself with exactly the same arguments, so promotion will always overflow the stack. Apparently this condition had just never been met before...
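The failure mode can be reproduced in miniature. This is a minimal sketch, assuming a hypothetical normalization step `unwrap_step` and a hypothetical `toy_join` (neither is the actual Base code). When the step makes no progress, the join falls back to `Base.typejoin` as a real base case instead of calling itself with identical arguments.

```julia
# Hypothetical normalization step (not Base code): unwrap a concrete
# one-element tuple type such as Tuple{Int}; otherwise return T unchanged.
unwrap_step(T) = (isconcretetype(T) && T <: Tuple && fieldcount(T) == 1) ? fieldtype(T, 1) : T

# Hypothetical join that normalizes `a` before joining it with `b`.
function toy_join(a, b)
    a′ = unwrap_step(a)
    # The buggy shape would be `a′ === a ? toy_join(a, b) : ...`, which recurses
    # with identical arguments (and overflows the stack) whenever the step
    # makes no progress. Here the no-progress branch is a real base case.
    a′ === a ? typejoin(a, b) : toy_join(a′, b)
end

toy_join(Tuple{Tuple{Int}}, Float64)  # unwraps twice, then typejoin(Int, Float64)
```

The guard only terminates because every recursive call strictly shrinks `a`; any recursion that can be handed back its own arguments needs the same kind of exit.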

nalimilan · 2018-03-17T20:16:48Z

AFAICT this is what #25553 implemented before it was reverted by #26109.
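For readers unfamiliar with the idea, here is a minimal sketch of what "recursive" means here. It assumes a hypothetical helper `promote_typejoin_recursive`, which is not the code from #25553 or from this PR. `Base.promote_typejoin` is an internal, unexported Base function. A plain `typejoin` of `Tuple{Int}` and `Tuple{Nothing}` gives `Tuple{Any}`, whereas the sketch applies `promote_typejoin` field by field.

```julia
# Hypothetical helper (not Base API, not the code from #25553):
# join two tuple types field by field using Base.promote_typejoin,
# so that Int and Nothing fields become Union{Nothing, Int} rather than Any.
function promote_typejoin_recursive(@nospecialize(a), @nospecialize(b))
    # Only handle concrete tuple types of equal length; concreteness rules out
    # Vararg tuples, so fieldtypes(...) is well defined.
    if isconcretetype(a) && isconcretetype(b) && a <: Tuple && b <: Tuple &&
       fieldcount(a) == fieldcount(b)
        return Tuple{map(promote_typejoin_recursive, fieldtypes(a), fieldtypes(b))...}
    end
    # Everything else falls back to Base's (unexported) promote_typejoin.
    return Base.promote_typejoin(a, b)
end

promote_typejoin_recursive(Tuple{Int, Int}, Tuple{Nothing, Float64})
```

One design limitation is that results like `Tuple{Union{Nothing, Int}, Real}` are not concrete. Joining such a result again therefore falls back to `Base.promote_typejoin`. A fuller version would also have to handle abstract and Vararg tuple types.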

bramtayl · 2018-03-17T20:27:10Z

Oh, OK. I should have read more closely before reinventing the wheel.

bramtayl · 2018-03-17T21:08:29Z

Is there an ELI5 of why #25553 was reverted? Maybe I'm being dense. Is it worth implementing for a custom named-tuple-like type?

vtjnash · 2018-03-17T22:35:38Z

It's very slow (causes even worse performance than not having it), so it's misleading to suggest that you should structure code this way. It appears to me that, in general, for good performance you need to operate on data column-wise. Otherwise, trying to copy data by rows makes it too easy to run off the performance cliff and just have no way to recover.

bramtayl · 2018-03-17T22:48:38Z

Well, shoot: if I'd known that I should always work with data column-wise, I would have stuck with R, where everything is vectorized in the first place.

nalimilan · 2018-03-17T23:12:36Z

> It's very slow (causes even worse performance than not having it), so it's misleading to suggest that you should structure code this way. It appears to me that, in general, for good performance you need to operate on data column-wise. Otherwise, trying to copy data by rows makes it too easy to run off the performance cliff and just have no way to recover.

This statement mixes two different issues:

1) How can we make tuples with small Union fields as fast as possible?
2) Is it more efficient to operate on vectors of tuples or on tuples of vectors?

Regarding 1), I (and apparently others) haven't yet understood why this is the case and, more importantly, why it's going to remain the case for the whole life of the 1.x series. Could you explain this in detail for non-compiler developers like us?

Regarding 2), I agree that storing data as vectors of tuples is generally a bad idea compared with storing them as tuples of vectors (EDIT: which doesn't mean that tuples shouldn't be used for a streaming API, as a temporary representation of rows e.g. from one input data frame to an output data frame). But it's not completely unreasonable either for small data sets, e.g. if you want to keep a handful of tuples of summary statistics computed using Query.jl based on a large data frame. Tuples are a convenient lightweight structure for many applications. Should we refuse to support this because we're afraid that people would misuse it?
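To make the two layouts concrete, here is a small sketch with made-up data: the same three rows stored as a vector of tuples and as a tuple of vectors. The element types are spelled out by hand for illustration. This only shows the layouts and does not measure performance.

```julia
# Row-wise: a vector of tuples. The element type is written explicitly
# so that the second field can hold `missing`.
rows = Tuple{Int, Union{Missing, Float64}}[(1, 2.5), (2, missing), (3, 4.0)]

# Column-wise: a tuple of vectors, one vector per field, each with its own eltype.
cols = ([1, 2, 3], [2.5, missing, 4.0])

# Row-wise access has to pull the field out of every tuple...
total_rows = sum(skipmissing(r[2] for r in rows))
# ...while column-wise access works directly on a single vector.
total_cols = sum(skipmissing(cols[2]))
```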

bramtayl · 2018-03-17T23:32:19Z

It seems what's really missing here is collect-like machinery that can take a generator returning tuples of values and populate a tuple of vectors?
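Here is a minimal sketch of such machinery, assuming a hypothetical function `unzip_collect` (not part of Base; compare the `Base.unzip()` discussion in #13942). Each column is stored in its own vector and widened independently with `Base.promote_typejoin` the first time a value doesn't fit. It assumes a non-empty iterator whose tuples all have the same length.

```julia
# Hypothetical sketch (not a Base function): collect an iterator of
# equal-length tuples into a tuple of vectors, one vector per field.
function unzip_collect(itr)
    y = iterate(itr)
    y === nothing && error("unzip_collect: empty iterators are not supported in this sketch")
    row, state = y
    cols = map(x -> [x], row)              # one single-element Vector per field
    y = iterate(itr, state)
    while y !== nothing
        row, state = y
        cols = map(cols, row) do col, x
            if x isa eltype(col)
                push!(col, x)              # value fits: append in place
            else
                # Widen only this column, e.g. Vector{Nothing} -> Vector{Union{Nothing, Float64}}
                T = Base.promote_typejoin(eltype(col), typeof(x))
                push!(convert(Vector{T}, col), x)
            end
        end
        y = iterate(itr, state)
    end
    return cols
end

cols = unzip_collect((i, isodd(i) ? nothing : i / 2) for i in 1:3)
```

Because each column is widened on its own, a `nothing` in one field never forces the other columns to `Any`. The price is that the loop itself is not type-stable, which is fine for a sketch.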

vtjnash · 2018-03-17T23:41:52Z

> small data sets

Simple types like Any will compile faster if you avoid specializing the code, so you'll get the answer sooner. Of course, if you don't have much data, basically anything will perform OK: a Dict or even just an Array might also perform just fine.
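As a rough illustration of that point, here is a minimal sketch with made-up data, assuming a hypothetical `format_row` helper. A handful of summary rows are kept in a plain `Vector{Any}` and processed by a function whose argument is marked with Base's `@nospecialize`. That macro asks the compiler not to compile a separate method instance for each concrete argument type; it is a hint, not a guarantee.

```julia
# Hypothetical example data: a handful of summary-statistic rows with mixed
# field types, stored without any attempt at precise typing.
summaries = Any[("mean", 2.5), ("n", 10), ("max", missing)]

# Hypothetical helper: format one row. `@nospecialize` hints that the compiler
# should not specialize this method on every tuple type it is called with.
function format_row(@nospecialize(row))
    return string(row[1], " = ", row[2])
end

lines = [format_row(r) for r in summaries]
```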

> Could you explain this in detail for non-compiler developers like us?

Someone else is welcome to explore this and optimize some of these cases during v1.x. But as the guy who said "we can have fast small unions" (and did so, mostly), I'm also acutely aware that it works best when the union is of completely unrelated items (like Tuple and Nothing), and that the work does not generalize to container constructors (or to any other large union, such as the one represented by Tuple{Union...}). That would be a much more difficult problem. AFAIK, supporting it would require a static type system. (Optimal performance currently depends heavily on creating all 2^n specializations, although as Jeff notes in https://discourse.julialang.org/t/missing-data-and-namedtuple-compatibility/8136/34 we won't actually need them. So instead we just make sure that inference won't do that work or compute a return type, and we defer that analysis to runtime.)
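The 2^n count is easy to see by brute force. This is a minimal sketch, assuming hypothetical helpers `union_members` and `tuple_specializations` (not Base API). A tuple type with n fields, each a two-member Union, covers 2^n distinct concrete tuple types, so n = 3 already gives 8 candidate specializations.

```julia
# Hypothetical helper: list the members of a (possibly nested) Union type.
union_members(T) = T isa Union ? vcat(union_members(T.a), union_members(T.b)) : Any[T]

# Hypothetical helper: every concrete tuple type covered by a tuple whose
# fields are small unions, i.e. one candidate specialization per combination.
function tuple_specializations(fields...)
    combos = Iterators.product(map(union_members, fields)...)
    return vec([Tuple{c...} for c in combos])
end

fields = (Union{Missing, Int}, Union{Missing, Float64}, Union{Missing, String})
specs = tuple_specializations(fields...)
```

This only counts the combinations. It does not reflect what inference actually compiles.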

bramtayl · 2018-03-18T13:01:37Z

OK, closing this for now: I've found a workaround (#25924 (comment)), and this is a duplicate of #25553.

Commit 1f9c03d: Update promotion.jl

kshyatt added the "types and dispatch" label (Types, subtyping and method dispatch) May 27, 2018

bramtayl closed this Jul 6, 2018

bramtayl mentioned this pull request Jul 11, 2019 in Base.unzip() #13942 (Open)
