_metadata has bizarre behaviour in Python 3.3 #9317

Closed
Skorpeo opened this issue Jan 20, 2015 · 10 comments

Comments

@Skorpeo

Skorpeo commented Jan 20, 2015

See this post: http://stackoverflow.com/questions/28041762/pandas-metadata-of-dataframe-persistence-error?noredirect=1#comment44479369_28041762

Users on Python 3.3 report that attributes not listed in _metadata propagate to the copy when a dataframe is copied. Furthermore, after persisting to JSON, they can access both _metadata and non-_metadata attributes on the dataframe loaded via the read_json method, even though the JSON file has no contents, which leads me to believe the data is floating around somewhere. I can't replicate this on my system: I use Python 2.7.5, where the _metadata behaviour is consistent with my expectations (see the post). Assuming there are no other issues, I can think of easy ways to write helper functions for JSON and HDF5 to persist the _metadata; however, it now appears to me that there may be other things to look into. This is my first ever bug report, am I a real programmer now?

@shoyer
Member

shoyer commented Jan 20, 2015

Responded on StackOverflow, but in brief, _metadata is not part of the public API so there isn't any bug here unless you can reproduce it without using private methods.

@Skorpeo
Author

Skorpeo commented Jan 21, 2015

I responded on StackOverflow (keep in mind I can't reproduce EdChum's behaviour on my system):
"You are right with respect to the _metadata piece. In this case, the behaviour of df.badmeta that EdChum reports above is very strange, and it does not use _metadata. I am not sure where the values are coming from or where they are stored. Being a novice, I would be worried that somehow the class is being appended to, since how else can he retrieve the values of badmeta? It should not exist, since he is loading an empty JSON, and yet the dataframe has an attribute that is not in the file. An artifact of some sort..."

@kay1793

kay1793 commented Jan 21, 2015

_metadata is not part of the public API so there isn't any bug here

are you sure? #8572.

@shoyer
Member

shoyer commented Jan 21, 2015

@kay1793 I think we would like to have some sort of public metadata documentation/API, but it's clearly not there yet. It's completely undocumented and requires using private methods. But contributions would be very welcome!

@kay1793

kay1793 commented Jan 21, 2015

It looked to me as if that issue (plus the connected SO reply) is about documenting what exists already, so experimental maybe, but not "private". It's not at all clear what the status is, and I think "no bug could be here" is too strong.

@TomAugspurger
Contributor

So I think we're all agreed that it would be nice to have this well defined and documented (and bug free!). No use debating whether something that's partially implemented can be buggy! (if a tree falls in a forest...)

@kay1793

kay1793 commented Jan 22, 2015

first it was private and now it's partially implemented. you're bound to run out of labels sooner or later :)
maybe you want to add to it, maybe it hasn't been documented yet, but this wasn't added "by mistake", right? it's there as basic support for metadata. so it's not impossible that it has bugs.

if it works on 2.7 but not on py3, that looks like a proper bug report to me. they are not asking for new features, just that what exists should work. calling it names instead is not useful.

@Skorpeo
Author

Skorpeo commented Jan 22, 2015

I personally am more interested in understanding the sorcery behind where the value of the attributes is being stored.

Just for fun and to indulge in the conversation above:

In this SO post: link

A gentleman by the name of "Jeff" wrote the following:

"I think something like this will work (and if not, pls file a bug report as this, while supported, is a bit bleeding edge, iow it IS possible that the join methods don't call this all the time. That is a bit untested)"

Anyway, my personal, inelegant workaround for the metadata issue is to just have a column (i.e. a Series) in the dataframe that holds the metadata, and I use dropna() when retrieving its values.
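
A rough sketch of what that column-based workaround could look like. The column name "meta", the example values, and the JSON round trip are assumptions made for illustration and are not taken from the original post:

```python
import io

import pandas as pd

df = pd.DataFrame({"price": [10.0, 11.5, 12.25]})

# Hypothetical metadata entries. The Series is shorter than the frame,
# so the remaining rows of the "meta" column become missing values
# when pandas aligns it on the index.
df["meta"] = pd.Series(["source=exchange", "units=USD"])

# Round-trip through JSON text; the metadata column travels with the data.
restored = pd.read_json(io.StringIO(df.to_json()))

# dropna() discards the padding rows and leaves only the metadata entries.
meta = restored["meta"].dropna().tolist()
print(meta)
```

One limitation of this approach is that the frame needs at least as many rows as there are metadata entries, and the extra column has to be dropped again before doing real work on the data.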

@shoyer
Member

shoyer commented Jan 22, 2015

@Skorp1 FYI "Jeff" on SO is @jreback (lead pandas maintainer)

@jreback
Contributor

jreback commented Jan 22, 2015

@Skorp1 the _metadata attribute is not propagated explicitly when serializing. I recall some discussions about this in the past, but it is not implemented for JSON or HDF. You can do your own if you want.
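
For anyone who wants to roll their own, here is a minimal sketch of one possible approach for JSON: wrap the data and the metadata in a single envelope. The function names and the envelope layout are hypothetical and are not part of pandas; the sample string stands in for whatever `DataFrame.to_json()` would return.

```python
import json


def pack_with_metadata(data_json, metadata):
    """Wrap a JSON payload and a metadata dict in one JSON document."""
    envelope = {"metadata": metadata, "data": json.loads(data_json)}
    return json.dumps(envelope)


def unpack_with_metadata(text):
    """Return (data_json, metadata) from a document built by pack_with_metadata."""
    envelope = json.loads(text)
    return json.dumps(envelope["data"]), envelope["metadata"]


# Stand-in for the string that DataFrame.to_json() would produce.
data_json = '{"a": {"0": 1, "1": 2}}'

packed = pack_with_metadata(data_json, {"source": "sensor-7"})
restored_json, restored_meta = unpack_with_metadata(packed)
print(restored_meta)
```

With pandas, `data_json` would come from `df.to_json()`, and `restored_json` could be handed back to `pd.read_json` wrapped in an `io.StringIO`. The metadata values have to be JSON-serializable for this to work.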

This is an implementation detail. The reason for documenting this is to enable people who wish to sub-class and/or implement something on top of it. It is not for the casual user.
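
To illustrate the sub-classing use case, here is a minimal sketch, assuming a recent pandas version. `TaggedFrame`, `source`, and `scratch` are hypothetical names chosen for the example; `_metadata` and `_constructor` are the hooks pandas looks at on subclasses.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass carrying a `source` attribute."""

    # Attribute names listed here are carried over to results of
    # pandas operations such as copy().
    _metadata = ["source"]

    @property
    def _constructor(self):
        # Make pandas operations return TaggedFrame instead of DataFrame.
        return TaggedFrame


df = TaggedFrame({"a": [1, 2, 3]})
df.source = "sensor-7"    # declared in _metadata
df.scratch = "temporary"  # plain instance attribute, not declared

copied = df.copy()
print(type(copied).__name__)
print(copied.source)
print(hasattr(copied, "scratch"))
```

On a recent pandas, the copy should keep `source` and should not have `scratch`, which matches the behaviour described for Python 2.7 at the top of this issue. Nothing here persists `source` through `to_json` or HDF; that still has to be handled separately.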

jreback closed this as completed Jan 22, 2015