I had not thought about how it is implemented. So it seems the most reasonable solution is to avoid computing some of the statistics when the sample size is really small (< 10 rows)?
Not computing the associations below a certain sample size makes sense.
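A minimal sketch of such a guard, assuming the associations are pairwise Cramér's V values computed from a contingency table of each pair of columns. `column_associations` and `MIN_ROWS_FOR_ASSOCIATIONS` are hypothetical names, the threshold of 10 is just the figure floated above, and the Cramér's V itself comes from SciPy's `scipy.stats.contingency.association` (available since SciPy 1.7).

```python
import itertools

import pandas as pd
from scipy.stats.contingency import association

# Hypothetical threshold; "< 10 rows" is the figure floated in the thread.
MIN_ROWS_FOR_ASSOCIATIONS = 10


def column_associations(df: pd.DataFrame, min_rows: int = MIN_ROWS_FOR_ASSOCIATIONS):
    """Pairwise Cramér's V, or None when there are too few rows to be meaningful."""
    if len(df) < min_rows:
        return None  # e.g. a report built from X.head(): show no associations at all
    result = {}
    for a, b in itertools.combinations(df.columns, 2):
        # Integer contingency table of the two columns.
        table = pd.crosstab(df[a], df[b]).to_numpy()
        if min(table.shape) < 2:
            result[(a, b)] = 0.0  # a constant column carries no association
        else:
            result[(a, b)] = association(table, method="cramer")
    return result
```

Returning `None` rather than an empty dict lets the display distinguish "not computed" from "no associations found".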
Or we could change the conditions under which we show the red "warning". Cramér's V is an estimate of an effect size, but it says nothing about significance. By computing it, we also get a chi-square statistic and thus a p-value. I wouldn't show the p-value to the user because it is not reliable (the assumptions of the test are not verified, etc.), but I guess we could still rely on it to decide whether it is worth calling the user's attention to this pair of columns.
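A sketch of that idea, assuming Cramér's V is computed from a chi-square test on the `pd.crosstab` of the two columns. `association_with_flag`, `v_threshold` and `alpha` are hypothetical, and their default values are placeholders. The p-value stays internal and only gates the warning.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def association_with_flag(x: pd.Series, y: pd.Series,
                          v_threshold: float = 0.5, alpha: float = 0.01):
    """Return (cramer_v, highlight) for two categorical columns.

    The chi-square p-value is used only to decide whether to highlight the
    pair; it is deliberately not returned, because the test's assumptions
    (e.g. large enough expected counts) are often not met.
    """
    table = pd.crosstab(x, y).to_numpy()
    r, k = table.shape
    if min(r, k) < 2:
        return 0.0, False  # a constant column carries no association
    chi2, p_value, _, _ = chi2_contingency(table, correction=False)
    v = float(np.sqrt(chi2 / table.sum() / (min(r, k) - 1)))
    # Highlight only pairs with a large effect size AND a small p-value.
    highlight = bool(v >= v_threshold and p_value < alpha)
    return v, highlight
```

With five rows of all-distinct values, V is 1 but the p-value is well above usual significance levels, so the pair is not highlighted; the same perfect pattern repeated over many rows gets a tiny p-value and is highlighted.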
Also, we may want to implement the bias correction for the Cramér's V statistic (see Wikipedia).
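For reference, a sketch of the bias-corrected Cramér's V as described on Wikipedia (Bergsma, 2013): subtract the expected value of φ² under independence and shrink the row and column counts accordingly. `cramer_v_bias_corrected` is a hypothetical helper; it assumes a contingency table (e.g. from `pd.crosstab`) with no all-zero rows or columns.

```python
import numpy as np
from scipy.stats import chi2_contingency


def cramer_v_bias_corrected(table) -> float:
    """Bias-corrected Cramér's V of an r x k contingency table."""
    table = np.asarray(table)
    n = table.sum()
    r, k = table.shape
    if n < 2 or min(r, k) < 2:
        return 0.0
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    phi2 = chi2 / n
    # Remove the expected value of phi^2 under independence; clip at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Bias-corrected numbers of rows and columns.
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return float(np.sqrt(phi2_corr / denom)) if denom > 0 else 0.0
```

On a 5 x 5 table from five rows of all-distinct values, the uncorrected V is 1 while this corrected estimate is 0, which is the `.head()` situation described in the issue.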
I was a bit surprised when I produced the following display.

It took some time to understand that the reason was `X.head()`, and that in this case the result made sense. I'm wondering whether we should avoid computing all the different values when one calls `X.head()`, instead of showing statistics computed on a few lines. It can be misleading.

An alternative is to compute the statistics on the full dataset, even if a user requests `.head()`. However, if you call `.head()`, it might only be because you are interested in seeing the first couple of lines of the dataframe without checking any other statistics. @jeromedockes WDYT?
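To make the `.head()` effect concrete, here is a minimal reproduction, assuming the report shows plain (uncorrected) Cramér's V computed from a chi-square test; how it is actually implemented is not stated here, and the data below are made up. The two columns are perfectly balanced on the full frame (V = 0), yet the first five rows, where every value is distinct, form a permutation table and give V = 1.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramer_v(x: pd.Series, y: pd.Series) -> float:
    """Plain (uncorrected) Cramér's V between two categorical columns."""
    table = pd.crosstab(x, y).to_numpy()
    r, k = table.shape
    if min(r, k) < 2:
        return 0.0
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    return float(np.sqrt(chi2 / table.sum() / (min(r, k) - 1)))


cities = ["Lille", "Lyon", "Nantes", "Nice", "Paris"]
colors = ["black", "blue", "green", "red", "white"]
n = 1000
# Every (city, color) pair occurs equally often over the full frame,
# but the first five rows pair each city with a different color.
X = pd.DataFrame({
    "city": [cities[i % 5] for i in range(n)],
    "color": [colors[(i % 5 + i // 5) % 5] for i in range(n)],
})

v_full = cramer_v(X["city"], X["color"])
head = X.head()
v_head = cramer_v(head["city"], head["color"])
print(f"full: {v_full:.3f}, head: {v_head:.3f}")  # prints: full: 0.000, head: 1.000
```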