Does not handle NaN #218
Comments
It wasn't on my radar to support NAs/missing values. Typically, imputing would work (mean/median, or min/max), or otherwise creating an indicator variable [0, 1] that records whether the value is missing. Is there any reason why such a method wouldn't be applicable to your situation?
Imputing wouldn’t work. I have NaNs or missing values in cases where there isn’t enough data to calculate a feature; for instance, the first few values of a feature would be NaN or missing. I’m afraid I’m not familiar with indicator variables. Could you please elaborate? My features are continuous values. I hope we can come up with a solution for this: I’m quite impressed with the performance and am looking forward to using this package. Thanks!
By indicator variable, I mean something similar to:

```julia
julia> using DataFrames

julia> df = DataFrame(v1 = [missing, rand(3)...])
4×1 DataFrame
 Row │ v1
     │ Float64?
─────┼─────────────────
   1 │ missing
   2 │ 0.777368
   3 │ 0.0461273
   4 │ 0.71682

julia> transform!(df, "v1" => ByRow(ismissing) => "v1_flag")
4×2 DataFrame
 Row │ v1         v1_flag
     │ Float64?   Bool
─────┼──────────────────────────
   1 │ missing       true
   2 │ 0.777368     false
   3 │ 0.0461273    false
   4 │ 0.71682      false
```
Then, you need to impute the original variable, using 0, the mean, or another relevant value:

```julia
julia> transform!(df, "v1" => ByRow(x -> ismissing(x) ? 0.0 : x) => "v1")
4×2 DataFrame
 Row │ v1         v1_flag
     │ Float64    Bool
─────┼────────────────────
   1 │ 0.0           true
   2 │ 0.777368     false
   3 │ 0.0461273    false
   4 │ 0.71682      false
```

This kind of approach should cover most use cases.
Ah, I understand, thanks for this. I will give this a shot, although I will have to think more about the replacement values. I am working with financial data, so 0s are valid values, and imputing with the mean, median, etc. would be misleading. Thanks!
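One detail matters for this thread: the values described here are NaN rather than `missing`, and `ismissing(NaN)` returns `false`, so the `ByRow(ismissing)` flag in the example above would not mark them. The sketch below is a Base-only Julia variant, assuming a plain vector of feature values. It flags both `missing` and NaN, and because 0 is a valid value in this data, it fills absent entries with a sentinel strictly below the observed minimum instead of 0 or the mean. The helpers `isabsent`, `flag_and_fill`, and `sentinel_below` are hypothetical names (not part of EvoTrees or DataFrames), and the sentinel rule is just one illustrative choice.

```julia
# Hypothetical helper: treat both `missing` and NaN as "absent",
# since ismissing(NaN) is false and isnan(missing) returns missing.
isabsent(v) = ismissing(v) || (v isa AbstractFloat && isnan(v))

# Hypothetical helper: return (filled, flag), where flag[i] is true when x[i]
# is missing or NaN, and filled[i] replaces such entries with `fillvalue`.
function flag_and_fill(x::AbstractVector; fillvalue::Real)
    flag = Bool[isabsent(v) for v in x]
    filled = Float64[isabsent(v) ? fillvalue : v for v in x]
    return filled, flag
end

# Hypothetical sentinel rule: a value strictly below every observed value,
# so it cannot collide with genuine values such as 0.0.
function sentinel_below(x::AbstractVector)
    present = [v for v in x if !isabsent(v)]
    isempty(present) && return -1.0        # arbitrary fallback: nothing observed
    lo, hi = extrema(present)
    return lo - max(1.0, hi - lo)          # step below the minimum by at least 1
end

# Usage
x = [missing, NaN, 0.0, 1.5, -2.0]
s = sentinel_below(x)                      # → -5.5
filled, flag = flag_and_fill(x; fillvalue = s)
# filled → [-5.5, -5.5, 0.0, 1.5, -2.0]
# flag   → [true, true, false, false, false]
```

With the flag kept as its own feature, a tree can split on it directly. Because the sentinel sits below every real value, a single threshold split on the filled column can also isolate the absent rows without capturing genuine zeros. This reasoning applies to tree models; for linear models, a far-out sentinel would distort the fitted coefficients.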
Hi,
I am trying out EvoTrees for a binary classification task. It turns out, however, that it does not support NaNs. Is there a reason why it doesn't? I currently use XGBoost for my model, and it handles NaNs. The primary issue is that I have missing data in my dataset, and some of the features are NaN for some observations (legitimately so).
Is there a plan for EvoTrees to handle NaNs? I am also curious how others handle NaNs: imputing with the mean or median? Those approaches won't work for me, I am afraid. Is there anything else I can try?
ta!