pd.eval division operation upcasts float32 to float64 #12388
Comments
(This is different from what you are saying, but we should probably handle it nonetheless.) I would do this test/casting in
is much easier to write than:
Also, if someone that didn't read the
Therefore, should we mimic
As for
Thought about it some more. We could look at the whole expression and come up with an output datatype:

This still has corner cases, like adding two int32 arrays resulting in float64. It is unclear what the output of adding two int32 arrays should be: if the numbers are small, an int32 output array is OK, but if the numbers are big you need int64 arrays. A way around this would be to let the user specify an output dtype. So, the proposal now becomes:
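A rough sketch of what that expression-wide resolution could look like, leaning on numpy's promotion rules (`resolve_output_dtype` and its `out_dtype` argument are made-up names for illustration, not pandas API):

```python
import numpy as np

def resolve_output_dtype(terms, out_dtype=None):
    """Pick a single output dtype for a whole expression.

    ``terms`` holds the array operands and scalar constants. An explicit
    ``out_dtype`` wins; otherwise numpy's promotion rules decide, and plain
    Python scalars do not force an upcast on their own.
    """
    if out_dtype is not None:
        return np.dtype(out_dtype)
    return np.result_type(*terms)

a = np.ones(3, dtype=np.float32)
b = np.ones(3, dtype=np.float32)

print(resolve_output_dtype([a, b, 2.5]))                # float32
print(resolve_output_dtype([a, np.ones(3)]))            # float64
print(resolve_output_dtype([a, b], out_dtype='int64'))  # int64, user override for corner cases
```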
I don't recall why we are casting in the first place. I would ideally like to defer this entirely to the engine. If not, then I would be OK with passing a
Should we go with that? So, what we'd have to do here is to downcast constants from float64 to float32, if and only if all arrays are float32. E.g.:
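A minimal sketch of that rule with a hypothetical helper (illustration only, not pandas internals): Python float constants are narrowed to float32 only when every array operand is already float32, so float64 data is never silently truncated.

```python
import numpy as np

def downcast_constants(arrays, constants):
    """Narrow Python float constants to float32 iff all arrays are float32."""
    if arrays and all(arr.dtype == np.float32 for arr in arrays):
        return [np.float32(c) if isinstance(c, float) else c for c in constants]
    return constants

a32 = np.ones(3, dtype=np.float32)
a64 = np.ones(3, dtype=np.float64)

print(type(downcast_constants([a32], [2.5])[0]))       # <class 'numpy.float32'>
print(type(downcast_constants([a32, a64], [2.5])[0]))  # <class 'float'>, left alone
```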
I think you have to upcast by default; the only way I wouldn't would be if the user indicated (with
But wouldn't this result in inconsistent behavior between normal pandas binary operations (like
@jennolsen84 hmm, that is a good point. I am just trying to avoid having pandas do any casting here. What if we remove that and just let the engine do it? (I don't really recall why this is special-cased here.) Or, if we are forced to do it, then I guess you are right, we would have to do a lowest-common-denominator cast (maybe use
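As a point of reference, numpy's own promotion already behaves like such a lowest-common-denominator rule when the extra operand is a plain Python scalar; `np.result_type` is one candidate for doing the cast:

```python
import numpy as np

a = np.ones(2, dtype=np.float32)
b = np.ones(2, dtype=np.float32)

# A bare Python float does not drag float32 arrays up to float64...
print(np.result_type(a, b, 2.5))      # float32
# ...but a genuine float64 operand still wins.
print(np.result_type(a, np.ones(2)))  # float64
```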
How about this as a start? jennolsen84@c82819f. I manually tested it, and the behavior is now consistent with the non-numexpr code path. I am trying to avoid casting unnecessarily, as you recommended, and letting the lower-level libraries take care of a lot of things. I ran the nosetests, and they all pass on the existing tests. If the commit looks good to you, I can add in some tests, add to docs, etc. and submit a PR.
@jreback can you please take another look at the commit? I addressed your comment, and I am not sure if you missed it.
@jennolsen84 yeah, just getting back to this. Your solution seems fine. However, I still don't understand why it is necessary to upcast (and only for division); what does numexpr do if you don't upcast? Is it wrong?
We're casting to float32 in all ops (not just division). The division thing was another case where
The reason why the cast happens at all is
for some reason
I will submit a PR (with whatsnew and tests).
Thanks @jennolsen84, why don't you submit and we'll go from there.
The current behavior is inconsistent with normal Python division of two `DataFrame`s (see code sample). Pandas upcasts both terms to 64-bit floats when it detects a division, see:
https://github.com/pydata/pandas/blob/528108bba4104b939bcfe6923677ddacc916ff00/pandas/computation/ops.py#L453

I think numexpr can handle different types too, and upcast automatically, though I am not 100% sure. I can submit a PR, but how do you recommend fixing this? Something like the following?

The downside is that if someone does `2 + df`, they'll probably still end up upcasting it. But this proposal is still better than what we have today. I might re-write the above using `filter` too, but at this time I just wanted to discuss the general approach.

Code Sample, a copy-pastable example if possible
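A minimal reproduction in the spirit of the report (a sketch; the exact dtypes you see depend on the installed pandas and numexpr versions):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 2), dtype=np.float32), columns=['a', 'b'])
df2 = pd.DataFrame(np.full((3, 2), 2.0, dtype=np.float32), columns=['a', 'b'])

# Plain binary division keeps float32.
print((df1 / df2).dtypes)

# pd.eval (numexpr engine) is reported to upcast to float64;
# the expected output is float32 here as well.
print(pd.eval('df1 / df2').dtypes)
```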
Expected Output
Output of pd.show_versions()