from kmodes.kprototypes import KPrototypes
import pandas as pd
df = pd.read_csv("train.csv", usecols=['Sex', 'Age', 'Embarked'])
model = KPrototypes(n_clusters=2)
clusters = model.fit_predict(df)
This raises NotImplementedError: No categorical data selected, effectively doing k-means. Present a list of categorical columns, or use scikit-learn's KMeans instead.
The Sex and Embarked variables are categorical. Doing something like df["Sex"] = df["Sex"].astype('category') gives the same result. KModes has no problems with the same data. Am I doing something wrong here and this is expected behaviour, or is something up?
Ah, I figured out where I had been going wrong by looking at the source code. For the benefit of anyone else who finds this via Google: fit_predict needs a categorical argument listing the positional indices of the categorical columns.
The last line above should be changed to clusters = model.fit_predict(df.values, categorical=[0,2]) for the example dataset.
Note that .values was added to df because I ran into #40.
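For anyone who would rather not hard-code [0, 2], here is a minimal sketch that looks the indices up by column name. The helper names categorical_indices and cluster_mixed are made up for illustration and are not part of kmodes. The only kmodes call used is the fit_predict(..., categorical=...) form from the fix above.

```python
def categorical_indices(columns, categorical_names):
    """Return the positional index of each named categorical column."""
    positions = {name: i for i, name in enumerate(columns)}
    missing = [name for name in categorical_names if name not in positions]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [positions[name] for name in categorical_names]


def cluster_mixed(df, categorical_names, n_clusters=2):
    """Hypothetical wrapper: run KPrototypes with indices derived from names."""
    # Imported here so categorical_indices can be used without kmodes installed.
    from kmodes.kprototypes import KPrototypes

    cat_idx = categorical_indices(list(df.columns), categorical_names)
    model = KPrototypes(n_clusters=n_clusters)
    # .values is passed instead of the DataFrame, as in the fix above.
    return model.fit_predict(df.values, categorical=cat_idx)


# Column order as loaded in the example: Sex, Age, Embarked.
columns = ["Sex", "Age", "Embarked"]
cat_idx = categorical_indices(columns, ["Sex", "Embarked"])
print(cat_idx)  # [0, 2]
```

With the DataFrame from the example, cluster_mixed(df, ["Sex", "Embarked"]) should be equivalent to the corrected fit_predict line. Age and Embarked have missing values in the Titanic data, so those rows may need to be dropped or filled first; that step is not shown here.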
To reproduce, the example above uses the Titanic dataset from https://www.kaggle.com/c/titanic/data