Batch size must be factor of total dataset size #21

Closed
MacDaddio opened this issue Jun 15, 2022 · 2 comments · Fixed by #22
Comments

@MacDaddio

The mini-batch part of this repository works great! However, when the batch size is not a factor of the total dataset size, the code throws an error. Is there any way to make it so that any batch size can be used? Below is a minimal working example of what I am talking about. Essentially, if batch_size = 1000, everything works fine and the mini-batch procedure runs over all 10 batches. However, when batch_size = 999, the last batch (of size 10) causes an error. Thanks!

from pycave.bayes import GaussianMixture
import torch

# Set seed
seed = 0
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)

# Inputs
n = 10000
p = 200
k = 5
batch_size = 999  # 1000 works; 999 triggers the error

# Make some random data
X = torch.randn(n, p)

# Fit PyCave GMM
gmm = GaussianMixture(
    num_components=k,
    covariance_type='full',
    init_strategy='kmeans++',
    batch_size=batch_size,
    trainer_params={'gpus': 1, 'enable_progress_bar': False},
    covariance_regularization=1e-3,
)
gmm = gmm.fit(X)
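
For reference, here is a short sketch (not part of PyCave itself) of the batch arithmetic behind the failure, plus a possible interim workaround of picking a batch size that evenly divides the dataset size; the variable names are only for illustration:

# Batch arithmetic for the example above: with n = 10000 and batch_size = 999,
# the data splits into 10 full batches plus one trailing batch of size 10.
n, batch_size = 10000, 999
full_batches, remainder = divmod(n, batch_size)
print(full_batches, remainder)  # -> 10 10

# Hypothetical workaround until the fix lands: use the largest batch size
# no bigger than the desired one that divides n evenly.
desired = 999
safe_batch_size = next(b for b in range(desired, 0, -1) if n % b == 0)
print(safe_batch_size)  # -> 625, since 10000 % 625 == 0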

@borchero
Owner

Thanks for the example code, I’ll have a look later!

@borchero
Owner

Thanks a lot for this issue! It's an incredibly easy fix but has pretty big implications for mini-batch K-Means++ 😄
