`predict` bug with mismatched data types #731
Thanks for reporting this, Ivan (@ivanhigueram)! I have to think about this one for a bit. My intuition would be that keys of `10.0` and `10` should be treated equally? In any case, I think adding a warning would be a good start!
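One way to treat `10.0` and `10` as the same key would be to normalize fixed-effect levels before they are used as dictionary keys. The sketch below assumes, as the `10.0` vs `10` keys in this report suggest, that levels end up compared as strings; `normalize_fe_levels` is a hypothetical helper, not part of the pyfixest API.

```python
import pandas as pd


def normalize_fe_levels(values: pd.Series) -> pd.Series:
    """Render fixed-effect levels as strings, writing integral floats without '.0'.

    Hypothetical helper (not in pyfixest): maps 10.0 -> "10" so that it matches
    a level that was stored as "10".
    """

    def to_key(v):
        # Floats such as 10.0 or 2010.0 become "10" / "2010";
        # NaN and non-integral floats fall through to plain str().
        if isinstance(v, float) and v.is_integer():
            return str(int(v))
        return str(v)

    return values.map(to_key)


print(normalize_fe_levels(pd.Series([10.0, 10, "a"])).tolist())  # ['10', '10', 'a']
```

Applied to both the estimation data and `newdata`, this would make integer-valued floats match their integer counterparts. The trade-off is that it silently merges levels such as `10` and `10.0` that a user might consider distinct, which is one argument for only warning instead.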
If you had to choose, what would be your preferred behavior?
I'd say a warning would be enough. It's definitely easier to do a …
Yes, I was also afraid that handling this in pyfixest would be a lot of work 😅 I'll add a warning then (or would you be up for opening a PR? But of course, no pressure 😄). I think it could be as simple as adding

```python
# assumes `import warnings` at module level in feols_.py
if self._has_fixef:
    fixef = self._fixef.split("+")
    mismatched_fixef_types = [x for x in fixef if newdata[x].dtypes != self._data[x].dtypes]
    if mismatched_fixef_types:
        warnings.warn(f"Data types of fixed effects {mismatched_fixef_types} do not match the model data. This leads to mismatched keys in the fixed effect dictionary, and as a result, to NaN predictions for rows with mismatched keys.")
```

around here (pyfixest/pyfixest/estimation/feols_.py, line 1811 in 611e5cc) and adding a test that triggers the warning in `test_errors.py`.
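For anyone who wants to try the check outside the `Feols` class, here is the same logic written as a standalone function. The name `warn_on_fixef_dtype_mismatch` and its signature are hypothetical; it assumes `fixef` is a `+`-separated string of column names and does not handle interacted fixed effects such as `f1^f2`.

```python
import warnings

import pandas as pd


def warn_on_fixef_dtype_mismatch(
    model_data: pd.DataFrame, newdata: pd.DataFrame, fixef: str
) -> list[str]:
    """Warn if fixed-effect columns in newdata have a different dtype than in model_data.

    Hypothetical helper. `fixef` is a "+"-separated string of column names,
    e.g. "unit+year". Returns the list of mismatched column names.
    """
    fixef_vars = [x.strip() for x in fixef.split("+")]
    mismatched = [x for x in fixef_vars if newdata[x].dtype != model_data[x].dtype]
    if mismatched:
        warnings.warn(
            f"Data types of fixed effects {mismatched} do not match the model data. "
            "This leads to mismatched keys in the fixed effect dictionary, and as a "
            "result, to NaN predictions for rows with mismatched keys.",
            UserWarning,
        )
    return mismatched
```

In a test, the warning could be asserted with `pytest.warns(UserWarning)` around a call to `predict()` with mismatched `newdata`.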
I'd love to open the PR. I'll ping you if I run into any problems with testing and all that.
Hello there,
I am trying to create predictions with a dataset outside my model data. I found that if there's any type mismatch in `newdata` compared with the `data` we used to estimate the model, the `predict()` method returns an array of `nan`. Here's a reproducible example:
This error is coming from the definition of `df_fe` in `predict()` and from `_apply_fixef_numpy`, as the dictionary keys will be saved as `10.0` rather than `10` (pyfixest/pyfixest/estimation/feols_.py, line 1832 in 10a8fb8).
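A minimal sketch of the failure mode, assuming (as described above) that fixed-effect levels are looked up via their string representation. Note that a plain Python dict with numeric keys would not have this problem, since `10 == 10.0` and both hash identically; the mismatch only appears once the keys are strings. The data and the `fe_by_year` dict are made up for illustration and are not pyfixest internals.

```python
import pandas as pd

# Hypothetical estimated fixed effects, keyed by the string form of each level
# as it appeared in the (integer-typed) estimation data.
fe_by_year = {str(year): effect for year, effect in [(2010, 0.5), (2011, -0.2)]}

# The same years, but stored as floats in the prediction data.
new_years = pd.Series([2010.0, 2011.0])

# Converting to str yields "2010.0" / "2011.0", which are not keys of fe_by_year,
# so Series.map returns NaN for every row.
looked_up = new_years.astype(str).map(fe_by_year)
print(looked_up.tolist())  # [nan, nan]

# Casting back to the estimation dtype first restores the match.
fixed = new_years.astype("int64").astype(str).map(fe_by_year)
print(fixed.tolist())  # [0.5, -0.2]
```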
Not sure if this constitutes a bug in itself, or if this is a skill issue, but it would be nice to get a warning maybe? This is no problem for unit FEs if the ID is a `str`, but with dates we sometimes get `2010.0` rather than `2010` when we do time operations and numpy preserves the data type.

I am running version `0.26.2` in Python 3.10.
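Until a warning (or a fix) lands, a user-side workaround is to cast the fixed-effect columns in `newdata` to the dtypes they had in the estimation data before calling `predict()`. A minimal sketch; `align_dtypes` is a hypothetical helper, and the column names are illustrative.

```python
import pandas as pd


def align_dtypes(
    newdata: pd.DataFrame, model_data: pd.DataFrame, cols: list[str]
) -> pd.DataFrame:
    """Return a copy of newdata with `cols` cast to the dtypes they have in model_data.

    Hypothetical helper, not part of pyfixest.
    """
    target = {col: model_data[col].dtype for col in cols}
    # DataFrame.astype accepts a {column: dtype} mapping and returns a new frame.
    return newdata.astype(target)
```

Usage would then look like `fit.predict(newdata=align_dtypes(newdata, data, ["unit", "year"]))`, where `fit` and `data` are your fitted model and estimation data. Casting a float column that contains `NaN` to `int64` raises an error, so such rows would need to be handled first.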