Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add method for iterating metadata #51

Closed
Tokazama opened this issue Sep 23, 2022 · 8 comments · Fixed by #56
Closed

Add method for iterating metadata #51

Tokazama opened this issue Sep 23, 2022 · 8 comments · Fixed by #56

Comments

@Tokazama
Copy link

Any type that supports metadata will likely need a way to iterate over metadata for propagating metadata to new instances or merging metadata. The generic way of iterating through metadata right now is using accessing each key form the iterable returned by metadatakeys. Depending on how metadata is stored this may not be the most efficient way to accomplish this. If the method that is propagating metadata internally is managing a single collection of metadata then a unique implementation can be written for whatever method of iteration is best. However, when combining multiple sets of metadata it's not practical to have a unique method for every combination of metadata storage patterns.

A couple things to consider in supporting generic iteration over metadata:

  • How do we manage to including the style information. This may be as simple as having two methods such as metapairs and styledmetapairs.
  • Do we want a default for this method that functions like the current heuristic using metadatakeys?
@bkamins
Copy link
Member

bkamins commented Sep 23, 2022

This (and your earlier comment in the PR) raise the following issue (which I initially wanted to avoid to ensure simplicity).

We could add a method metadatadict(obj; style::Union{Symbol, Nothing}=nothing) that would be required to return an AbstractDictionary supporting its interface - and ensuring that all functions from this interface are fast. If style is not passed then all key-value pairs are returned. If style is passed then key-value pairs are filtered to only expose those whose style matches the style kwarg.

The reason why I have not added this requirement initially was that I was afraid that package maintainers would find it difficult to correctly implement this custom type and metadatadict function (AbstractDict interface is more complex than it seems on a surface and it is not properly documented). But maybe we should go for this and have metadatadict return nothing by default so that packages can easily detect that some obj does not have this method implemented.

The alternative, as you propose, would be to add metadatapairs function (on which we would need to think in detail how to specify its API, but I think it is doable).

(the same for column-level metadata, but let us leave this discussion for later)

CC @nalimilan

@Tokazama
Copy link
Author

The reason why I have not added this requirement initially was that I was afraid that package maintainers would find it difficult to correctly implement this custom type and metadatadict function (AbstractDict interface is more complex than it seems on a surface and it is not properly documented). But maybe we should go for this and have metadatadict return nothing by default so that packages can easily detect that some obj does not have this method implemented.

It's definitely tempting to have a way of directly interacting with an AbstractDict type so we don't have to ever implement something that could just fall back to a dictionary. It's also nice to have a fairly clear idea of how an object behaves when trying to write quick code.

On the other hand, I have similar apprehensions about the cost of an increasingly strict interface at this point. I recall some trial and error to get interfaces like AbstractDict to work when first learning Julia. Perhaps we could have an extra package that provides a handful of useful types that support the dictionary interface that also support features we require for the metadata interface.

@bkamins
Copy link
Member

bkamins commented Sep 23, 2022

Let us wait for what @nalimilan thinks

@nalimilan
Copy link
Member

Given that we have metadatakeys, having metadatapairs would be kind of logical.

metadatadict is a more tricky issue. If not all packages implement it, how useful would it be in practice? Would we need a fallback that allocates a new Dict? But in that case mutating it wouldn't affect the original object.

Perhaps we could have an extra package that provides a handful of useful types that support the dictionary interface that also support features we require for the metadata interface.

What would this provide compared with simply using Dict (which is what DataFrames.jl does for example)?

@bkamins
Copy link
Member

bkamins commented Sep 23, 2022

Would we need a fallback that allocates a new Dict?

It could be a dynamic dict being a view, but I agree - as commented above - that it is tricky.

In summary - do we think adding metadatapairs makes sense now, or we add it at some later point when someone needs it?

@nalimilan
Copy link
Member

As usual, I'd rather wait until somebody really needs it. ;-)

@nathanrboyer
Copy link

nathanrboyer commented Nov 2, 2022

I couldn't follow all the discussion in #53, but it would make sense to me to just have metadata(df) dump all the pairs (rather than a new function). I am currently doing this which is a pain:

julia> Dict(key => metadata(df, key) for key in metadatakeys(df))
Dict{String, Float64} with 2 entries:
  "RMSE" => 0.10915
  "t₀"   => 3.974

@bkamins
Copy link
Member

bkamins commented Nov 2, 2022

and the same for colmetadata(df, col) and colmetadata(df)?

CC @nalimilan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants