You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I don't think this is a bug, I'm just creating this issue so that we can observe a different behaviour I just saw.
Here is a series where the outlier detection is slightly off. This is an integer series comprised of {0, 80, 100}. Here, the values are actually categorical because they just distinguish states, and the choice of the state values 0 and 80 is arbitrary (it could have been 0, 1, 2, but for some reason it's 0, 80, 100).
Unless we tune the heuristic a little bit to avoid counting outliers when the cardinality is "very low", I don't see an easy improvement.
thanks @Vincent-Maladiere . I had considered turning off outlier detection when there are few unique values but actually having a wide range of values is still a problem if you have few. imagine you had values 1, 2, 3, and -1000 say to indicate some invalid value. if you don't cut the axis due to the -1000 1, 2, and 3 will all get squished together and you will lose the information.
what we would like in this case is to realize that the actual values don't matter and treat the variable as categorical as you say. but I'm not sure that its' reasonable to assume that is the case whenever there are few unique values 🤔
Yes, that's tricky I agree. The best scenario would be for the user to notice that and convert to string or category dtypes. The main difference between my screenshot and yours is that on mine the outlier segment is bigger than the category displayed on the left, so the outliers are not really outliers, if that makes sense?
what we would like in this case is to realize that the actual values don't matter and treat the variable as categorical as you say. but I'm not sure that its' reasonable to assume that is the case whenever there are few unique values 🤔
Let's wait a bit to get feedback and decide about this.
Describe the bug
I don't think this is a bug, I'm just creating this issue so that we can observe a different behaviour I just saw.
Here is a series where the outlier detection is slightly off. This is an integer series comprised of {0, 80, 100}. Here, the values are actually categorical because they just distinguish states, and the choice of the state values 0 and 80 is arbitrary (it could have been 0, 1, 2, but for some reason it's 0, 80, 100).
Unless we tune the heuristic a little bit to avoid counting outliers when the cardinality is "very low", I don't see an easy improvement.
WDYT?
The column dataset:
state.csv
Steps/Code to Reproduce
Expected Results
No outliers
Actual Results
Some outliers
Versions
The text was updated successfully, but these errors were encountered: