add vcat with source; deprecate indicator in joins in favor of source #2649
CI error is unrelated and is fixed in #2648
Should this be called
Co-authored-by: Eric Hanson <[email protected]>
I am OK with
Consistency with joins sounds like a good reason to use
dfs′ = Vector{AbstractDataFrame}(undef, len)
for (i, (v, df)) in enumerate(zip(vals, dfs))
    dfs′[i] = insertcols!(copy(df, copycols=false), 1, col => Ref(v))
end
Probably not a big deal, but wouldn't it be more efficient (and not really more complex) to create the column only after concatenating?
This is a design decision related to an empty data frame. As you can see in the tests and examples, if you pass `DataFrame()` to `vcat`, a single row is now created with all `missing` values except in the indicator column. Otherwise we would drop such a data frame.
But maybe dropping it is preferable. What do you think?
I have thought about it and concluded that it is better to drop it. I will change the implementation.
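For readers following the empty-data-frame question, here is a minimal sketch of the "drop it" behaviour, assuming the `source` keyword of `vcat` as it shipped in DataFrames.jl 1.0. The data frames `df1` and `df2` and the column name `:src` are made up for illustration.

```julia
using DataFrames

df1 = DataFrame(a=1:2)
df2 = DataFrame(a=3:4)

# `source=:src` adds a column identifying which argument each row came from.
# With a plain name, the identifier is the position of the data frame in the call.
res = vcat(df1, DataFrame(), df2; source=:src)

# The empty `DataFrame()` in position 2 contributes no rows,
# so identifier 2 never appears in `res.src`.
println(res)
```

The identifiers still reflect argument positions, so the second non-empty data frame is labelled 3 rather than 2.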
Regarding the API, have you considered something like
I analyzed it before making the PR. We could allow:
but I am hesitant to add it as currently:
(and this makes sense and is expected). So adding a kwarg would significantly affect the produced result, which I think is not desirable.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Not doing this in
Which would all have to be produced, which I thought would be more confusing than clarifying. Similarly, for two columns passed we have three values (left, right, and both), so it is not the same. But of course the function of the kwarg is similar (clearly indicate the source data frame), so I would not oppose syncing it between Given these considerations, what do you think? Should we make them consistent?
and I will change the kwarg to
Looks good! I kind of wish we would have used `source` instead of `indicator` for joins, as that sounds more explicit, but well...
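As a point of comparison, this is a small sketch of the join-side keyword under its new name, assuming the `source` keyword argument of the join functions in DataFrames.jl 1.0 (where `indicator` is deprecated). The tables `people` and `jobs` and the column name `:origin` are hypothetical.

```julia
using DataFrames

people = DataFrame(id=[1, 2], name=["Ann", "Bob"])
jobs = DataFrame(id=[2, 3], job=["Chef", "Pilot"])

# `source=:origin` adds a column saying whether each row was found
# only in the left table, only in the right table, or in both.
joined = outerjoin(people, jobs; on=:id, source=:origin)

# Sort by the key so the row order does not depend on join internals.
joined_sorted = sort(joined, :id)
println(joined_sorted)
```

Unlike `vcat`, where the identifier names one source data frame per row, the join column takes one of three values ("left_only", "right_only", "both").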
Same with me. Maybe we can:
I have no recollection.
Sorry for pinging then. Do you have an opinion though 😄?
I also don't recall this discussion. I'm also in favour of
My only comment is that
OK. I will change it. (it is the place where it goes in joins)
OK - I have traced the original issue here #1412. Thank you all for responding.
Pair{<:SymbolOrString, <:AbstractVector}}=nothing) =
    reduce(vcat, dfs; cols=cols, source=source)

"""
@nalimilan - I have added a docstring for `reduce` to make this option more discoverable. Could you please have a look at it before I merge? Thank you!
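A short sketch of the `reduce` form may help here, assuming the `source` keyword accepts a `name => identifiers` pair as in the signature fragment above. The vector `dfs` and the file names are invented for the example.

```julia
using DataFrames

# Hypothetical per-file tables collected in a vector.
dfs = [DataFrame(x=[1, 2]), DataFrame(x=[3])]

# A `name => identifiers` pair labels the rows of each data frame
# with the corresponding custom value instead of its position.
combined = reduce(vcat, dfs; source=:file => ["a.csv", "b.csv"])

println(combined)
```

The identifier vector must have one entry per data frame in `dfs`.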
Thank you!
Fixes #659
I was prompted by https://discourse.julialang.org/t/would-it-help-to-have-a-tool-that-automatically-determines-which-issues-should-be-closed/56274/16 :).
The PR is relatively simple so we can add it in 1.0 release I think.