Add flatten to DataAPI.jl #39

Open
bkamins opened this issue Jul 4, 2021 · 6 comments

Comments

@bkamins
Member

bkamins commented Jul 4, 2021

Both SplitApplyCombine.jl and DataFrames.jl export flatten. I would add it to DataAPI.jl. The question is what docstring it should have? Maybe something like:

Flatten collection of collections into a single collection

Is enough?

@andyferris - after this is established maybe you could add DataAPI.jl to SplitApplyCombine.jl as a dependency and make innerjoin and flatten implement this interface? Then SplitApplyCombine.jl and DataFrames.jl could be used together more easily.

@andyferris
Member

Cool - this is an interesting package. I can see how this could remove friction for users.

Just a thought - for functions that are widely useful, are present in other languages' standard libraries, and have an unambiguous definition for Vector (like mapmany and flatten), could we first attempt to put them into Base?

@bkamins
Member Author

bkamins commented Jul 5, 2021

I think it might be possible, but adding things to Julia Base has recently been restricted a lot. Also, even if we added them, they would most likely not go into the next Julia LTS, which means that we would wait several years to be sure everyone has them in Julia Base. I think the benefit of DataAPI.jl is that it allows for a much quicker development cycle, as we would only add e.g.:

function flatten end

here so we do not have to promise any specific API (except for specification of a general meaning of the function).
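A minimal sketch of how such a shared stub could work. The modules `MiniDataAPI`, `PkgA` and `PkgB` and the type `ToyTable` are hypothetical stand-ins for illustration only; they are not the actual code of DataAPI.jl, SplitApplyCombine.jl or DataFrames.jl:

```julia
# Hypothetical stand-in for DataAPI.jl: it owns the name but defines no methods.
module MiniDataAPI
"""
    flatten(x)

Flatten collection of collections into a single collection.
"""
function flatten end
end

# Hypothetical package A: adds a method for vectors of vectors.
module PkgA
import ..MiniDataAPI: flatten
export flatten
flatten(x::AbstractVector{<:AbstractVector}) = reduce(vcat, x)
end

# Hypothetical package B: adds a method for its own toy table type.
module PkgB
import ..MiniDataAPI: flatten
export flatten
struct ToyTable
    a::Vector{Int}
    b::Vector{Vector{Int}}
end
# Expand column `b`, repeating the matching `a` value for each element.
flatten(t::ToyTable) = [(t.a[i], v) for i in eachindex(t.b) for v in t.b[i]]
end

# Both packages export the same function object, so loading both does not clash.
using .PkgA, .PkgB

flat_vec = flatten([[1, 2], [3]])
flat_tbl = flatten(PkgB.ToyTable([1, 2], [[10, 20], [30]]))
```

Because both modules extend the one generic function owned by the shared module, `flatten` stays unambiguous when both are loaded, and dispatch picks the method from the argument type.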

@nalimilan
Member

Am I right that SplitApplyCombine.flatten(x) is equivalent to collect(Iterators.flatten(x))?
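For reference, this is what the Base expression in the question does on a small nested vector. The sketch uses only Base and makes no claim about the SplitApplyCombine.jl implementation; the variable names are illustrative:

```julia
# A vector of vectors, including an empty inner vector.
nested = [[1, 2], [3], Int[]]

# Iterators.flatten is lazy: it iterates the inner collections one after another.
lazy = Iterators.flatten(nested)

# collect materializes the lazy iterator into a flat Vector{Int}.
flat = collect(lazy)
```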

DataFrames.flatten(df, cols) is a bit different I would say. In particular, if we consider data frames as collections of rows, SplitApplyCombine.flatten(df) should return a (flat) collection of all cells in df. The flatten(df, cols) method doesn't fit very well in that approach -- though it's not incompatible either.
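The difference between the two readings can be sketched with plain Base types. Here a vector of NamedTuple rows is a hypothetical stand-in for a data frame, and `flatten_col` is an illustrative helper written for this sketch, not the DataFrames.jl implementation:

```julia
# Hypothetical stand-in for a table: a vector of NamedTuple rows (not a DataFrame).
rows = [(a = 1, b = [10, 20]), (a = 2, b = [30])]

# "Collection of rows" reading: flattening visits every cell of every row.
cells = collect(Iterators.flatten(rows))

# "Flatten a column" reading: expand the elements of column `col` into separate
# rows and repeat the values of the other columns alongside them.
flatten_col(rows, col::Symbol) =
    [merge(row, NamedTuple{(col,)}((v,))) for row in rows for v in getproperty(row, col)]

expanded = flatten_col(rows, :b)
```

Under the first reading the nested vectors in `b` stay as single cells, while the second reading yields one row per element of `b`.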

@bkamins
Member Author

bkamins commented Jul 24, 2021

Indeed. The issue is to avoid name clashes when DataFrames.jl and SplitApplyCombine.jl are both loaded in a session (which is relatively common in advanced usage scenarios). What would you do in such a case?

@andyferris
Member

That sounds right.

I assumed you used flatten(gdf) for nested (grouped) data frames?

The cols version “feels” to me a lot like some flavour of a SplitApplyCombine.mapmany call, which is what flatten is ultimately defined in terms of. You are automatically keeping (broadcasting?) the columns which aren’t mentioned, right?

@bkamins
Member Author

bkamins commented Jul 24, 2021

> I assumed you used flatten(gdf) for nested (grouped) data frames?

It is just the DataFrame constructor.

> You are automatically keeping (broadcasting?) the columns which aren’t mentioned, right?

Right


3 participants