KPrototypes wrongly identifies categorical data as non-categorical #71

mikeyford · 2018-05-11T14:52:04Z

To reproduce: the example below uses the Titanic dataset from https://www.kaggle.com/c/titanic/data

from kmodes.kprototypes import KPrototypes
import pandas as pd

df = pd.read_csv("train.csv", usecols=['Sex', 'Age', 'Embarked'])

model = KPrototypes(n_clusters=2)
clusters = model.fit_predict(df)

This raises NotImplementedError: No categorical data selected, effectively doing k-means. Present a list of categorical columns, or use scikit-learn's KMeans instead.

The Sex and Embarked variables are categorical. Converting them explicitly, e.g. df["Sex"] = df["Sex"].astype('category'), gives the same error. KModes handles the same data without problems. Am I doing something wrong and this is expected behaviour, or is this a bug?


mikeyford · 2018-05-11T15:42:11Z

Ah, I figured out where I had been going wrong by looking at the source code. For the benefit of anyone else who finds this via Google: fit_predict needs a categorical argument containing the indices of the categorical columns.
For the example dataset, the last line above should be changed to:
clusters = model.fit_predict(df.values, categorical=[0, 2])
Here 0 and 2 are the positions of Sex and Embarked in df (the columns are Sex, Age, Embarked).

Note that .values was added to df because of #40.
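The indices passed in categorical=[0, 2] are column positions, not column names. Below is a minimal sketch, assuming pandas, of deriving those positions from the DataFrame's dtypes instead of hard-coding them. The helper name categorical_indices and the sample rows are hypothetical, standing in for the Titanic CSV, and treating every non-numeric column as categorical is an assumption of this sketch.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def categorical_indices(df: pd.DataFrame) -> list:
    """Hypothetical helper: positions of all non-numeric columns in df.

    Assumes every non-numeric column (object, string, category, ...)
    should be treated as categorical; numeric columns are left out.
    """
    return [i for i, dtype in enumerate(df.dtypes) if not is_numeric_dtype(dtype)]


# Hypothetical rows with the same three columns, in the same order, as
# pd.read_csv("train.csv", usecols=['Sex', 'Age', 'Embarked']).
df = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, 26.0, 35.0],
        "Embarked": ["S", "C", "S", "S"],
    }
)

cat_idx = categorical_indices(df)
print(cat_idx)  # [0, 2]
```

The resulting list can then be passed as model.fit_predict(df.values, categorical=cat_idx). Because the helper checks dtypes rather than names, it also picks up columns converted with .astype('category'). Columns that hold categories as numeric codes would still need to be added to the list by hand.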

mikeyford closed this as completed May 11, 2018
