Very Slow with greater than 10k obs #45
This is not due to the number of observations, but more likely to your specific data. Note that Cao's init method can be very slow when n_clusters is large, so that may be the cause. You can run the benchmark.py script in the examples directory (which uses 10k observations) to see whether that works for you. Also, with this little information about the specifics of your problem, you're leaving me guessing...
Same here. I was able to run DBSCAN, but I'm struggling to run k-prototypes. I'm not sure what strategy to follow to get it to run.
@paulaceccon, what are the dimensions of your problem? Also, please provide sample output from a run in verbose mode.
Any insight as to why kprototypes is a lot slower than kmodes for similar datasets? I'm just trying to understand the algorithm better. I've got only 3 numerical variables in my dataset (out of 12 total), and if I get rid of one and turn the other two into categories, kmodes runs instantaneously for n=1-10.
I've had some time to analyze this problem by profiling my code. To determine the cluster centroids in the k-means part of the algorithm, we need to divide the sums of attribute values by the number of points in the cluster. I realized I was caching the sums of the attributes, but not the sums of the cluster memberships, so the membership counts were being recomputed every time. The following commit resolves this: I'm seeing very significant speedups as a result. :) Thanks to everyone for pointing it out.
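To illustrate the idea described above, here is a minimal sketch of caching both the per-cluster attribute sums and the membership counts, so that a centroid is just a cached sum divided by a cached count. It is not the code from the commit; `ClusterStats` and `move_point` are hypothetical names made up for this example, and it only covers the numerical (k-means) part.

```python
from typing import List, Sequence


class ClusterStats:
    """Hypothetical helper: cached attribute sums and member count for one cluster."""

    def __init__(self, n_attrs: int) -> None:
        self.sums: List[float] = [0.0] * n_attrs  # running sum per numerical attribute
        self.count: int = 0                       # running number of member points

    def add(self, point: Sequence[float]) -> None:
        # Update both caches when a point joins the cluster.
        for j, value in enumerate(point):
            self.sums[j] += value
        self.count += 1

    def remove(self, point: Sequence[float]) -> None:
        # Update both caches when a point leaves the cluster.
        for j, value in enumerate(point):
            self.sums[j] -= value
        self.count -= 1

    def centroid(self) -> List[float]:
        # No pass over the data: cached sums divided by the cached count.
        if self.count == 0:
            raise ValueError("empty cluster has no centroid")
        return [s / self.count for s in self.sums]


def move_point(point: Sequence[float], src: ClusterStats, dst: ClusterStats) -> None:
    """Reassign one point; the cost depends on the number of attributes only."""
    src.remove(point)
    dst.add(point)


a, b = ClusterStats(2), ClusterStats(2)
for p in [(1.0, 2.0), (3.0, 4.0)]:
    a.add(p)
b.add((10.0, 10.0))

move_point((3.0, 4.0), a, b)
print(a.centroid(), b.centroid())
```

With only the sums cached, each centroid update would still need to count the members of the cluster again, which scales with the number of observations. Caching the count as well makes a reassignment independent of the dataset size.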
Oh, and I've included a benchmark script specifically for k-prototypes in the examples folder.
It does not get past this point:
Init: initializing centroids
Init: initializing clusters
Starting iterations...
The same happens even on a gcs instance.