improper types when using new DataFrames #30
Comments
This will be due to the fact that your first 2 columns don't have any null values, so when they get deserialized, they will be regular |
I see. It seems like a problem that deserialized dataframes have significantly different behavior from the serialized ones they result from. It seems to me that the responsibility falls to the |
I don't really like the idea that the type changes depending on whether there are null values or not in the data (there's the same issue in RCall). Since Feather only support nullable arrays, maybe the default should be to always return a |
This is definitely a big issue. Currently, however, this issue is sort of baked into I think what needs to happen is that there need to be two different types down graph from Also, I'm wondering under just what circumstances |
Ok, so the new behavior (as of 2135bf6) is that all columns will be returned as |
That seems ideal. I think we need something analogous for converting the |
Can you give me cases where you ran into problems with WeakRefStrings? I'd definitely be interested to hear about cases where they're causing problems. |
To be honest, I don't have a record of the exact cases where I encountered an issue. If I encounter more I'll keep track of them and let you know, but frankly I've been converting them for a while, so I haven't seen more. As I recall, there weren't any problems that seemed like bugs to me, so I'm not sure there'd be anything to actually fix; it was more of an interface issue. If I think of anything really specific I'll let you know. |
I just ran into this problem. If you do a join with two DataFrames on a column where in one DataFrame it is a column of I guess your solution to this is to use WeakRefStrings in the second DataFrame. |
This issue is actually a really significant performance barrier. As things are right now, deserialization into normal strings is much faster in the Python implementation, which seems like a problem. Why should that be the case? Does it have something to do with the way Julia strings work? Having a different string type for this is definitely not ideal; there is always the potential for them to work differently than |
Is the problem with deserialization at the time of reading the file, or at any time? Is it possible to deserialize the file into WeakRefStrings and then later bulk convert them to regular strings faster than the time it takes to read from the file into regular strings? Or is this a problem with the performance of regular strings in general? |
I think the use of
There are no |
It seems likely to me that since CPython is written in C, internally Python strings are null-terminated. If that's the case it means that what @dmbates told us shouldn't mean that the Julia deserialization of strings is necessarily slower. |
No, that's incorrect. The use of WeakRefStrings is purely a performance optimization. Currently, a Julia
And we really need to squash the myth that's been going around that Julia |
@quinnj Okay. I was wrong. |
That makes it sound like there's really no choice but to wait for Julia to improve its Anyway, do you know if the |
Yes, we'd need JuliaLang/julia#18632 to be cleaned up/merged (probably non-trivial). Ultimately, we'd also want JuliaLang/julia#12447 to ensure that the Julia |
When using DataFrames with NullableArrays (currently their master), a DataFrame that is saved and reloaded with Feather has the wrong eltypes.
For example, the dataframe with these eltypes
becomes a dataframe with these eltypes
So, as far as I can tell, the only problems are that all types wind up not being in NullableArrays except for strings, which have the ("wrong") type WeakRefString{UInt8}.