Make promote_typejoin recursive #26501
So this hit a stack overflow. But I think that I just revealed an existing bug in typejoin. There's a line that looks like this:

    function typejoin(a, b)
        # some stuff
        a′ === a ? typejoin(a, b) : recur(a′, b)

If the |
Oh ok should have read closer before reinventing the wheel. |
Is there an ELI5 of why #25553 was reverted? Maybe I'm being dense. Is it worth implementing for a custom named-tuple like type? |
It's very slow (causes even worse performance than not having it), so it's misleading to suggest that you should structure code this way. It appears to me that, in general, for good performance you need to operate on data column-wise. Otherwise, trying to copy data by rows makes it too easy to run off the performance cliff and just have no way to recover. |
Well shoot if I knew that I should always work with data column-wise I would have stuck with R where everything is always vectorized in the first place. |
This statement mixes two different issues:
Regarding 1), I (and others apparently) haven't yet understood why it's the case, and (more importantly) why it's going to remain the case for the whole life of the 1.x series. Could you explain this in detail for non-compiler developers like us? Regarding 2), I agree that storing data as vectors of tuples is generally a bad idea compared with storing them as tuples of vectors (EDIT: which doesn't mean that tuples shouldn't be used for a streaming API, as a temporary representation of rows e.g. from one input data frame to an output data frame). But it's not completely unreasonable either for small data sets, e.g. if you want to keep a handful of tuples of summary statistics computed using Query.jl based on a large data frame. Tuples are a convenient lightweight structure for many applications. Should we refuse to support this because we're afraid that people would misuse it? |
It seems what’s really missing here is collect-like machinery that is able to take a generator which returns a tuple of values and populate a tuple of vectors? |
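To make that suggestion concrete, here is a minimal sketch of what such collect-like machinery could look like. The names `collect_columns` and `push_widen` are hypothetical (nothing like them is claimed to exist in Base), and the sketch assumes every row is a plain tuple of the same length. It uses the unexported `Base.promote_typejoin` only to widen a single column's element type when a value does not fit.

```julia
# Hypothetical helper, not part of Base: append `x` to `col`, widening the
# element type with Base.promote_typejoin only when `x` does not fit.
function push_widen(col::Vector{T}, x) where {T}
    if x isa T
        push!(col, x)
        return col
    end
    S = Base.promote_typejoin(T, typeof(x))
    widened = Vector{S}(undef, length(col))
    copyto!(widened, col)   # copy the existing values into the wider vector
    push!(widened, x)
    return widened
end

# Hypothetical helper, not part of Base: collect an iterator of equal-length
# tuples into a tuple of column vectors.
function collect_columns(itr)
    y = iterate(itr)
    y === nothing && throw(ArgumentError("cannot infer columns from an empty iterator"))
    first_row, state = y
    # One vector per field, typed from the first row.
    cols = map(x -> [x], first_row)
    y = iterate(itr, state)
    while y !== nothing
        row, state = y
        # Push each field of the row into its own column.
        cols = map(push_widen, cols, row)
        y = iterate(itr, state)
    end
    return cols
end

cols = collect_columns((i, i > 1 ? missing : 1.0) for i in 1:3)
```

Because each column is widened on its own, the row tuple type never has to be joined, which is the step this thread identifies as the problem. Whether this is fast enough in practice is not something the sketch demonstrates.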
Simple types like
Someone else is welcome to explore this, and optimize some of these cases during v1.x. But as the guy who said "we can have fast small unions" (and did so, mostly), I'm also acutely aware that it works best in cases where the union is of completely unrelated items (like |
Ok, closing this for now; I've found a workaround (#25924 (comment)), and this is a duplicate of #25553 |