Batch size must be factor of total dataset size #21

Closed
MacDaddio opened this issue Jun 15, 2022 · 2 comments · Fixed by #22
Comments

@MacDaddio

The mini-batch part of this repository works great! However, when the batch size is not a factor of the total dataset size, the code throws an error. Is there any way to make it so that any batch size can be used? Below is a minimal working example of what I am talking about. Essentially, if batch_size = 1000, everything works fine and the mini-batch procedure runs over all 10 batches. However, when batch_size = 999, the last batch (of size 10) causes an error. Thanks!

from pycave.bayes import GaussianMixture
import torch

# Set seed
seed = 0
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)

# Inputs
n = 10000
p = 200
k = 5
batch_size = 999  # 1000 works; 999 triggers the error

# Make some random data
X = torch.randn(n, p)

# Fit PyCave GMM
gmm = GaussianMixture(
    num_components=k,
    covariance_type='full',
    init_strategy='kmeans++',
    batch_size=batch_size,
    trainer_params={'gpus': 1, 'enable_progress_bar': False},
    covariance_regularization=1e-3,
)
gmm = gmm.fit(X)
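
For reference, here is a short sketch (not part of PyCave itself) of the batch arithmetic behind the failure, plus a possible interim workaround of picking a batch size that evenly divides the dataset size; the variable names are only for illustration:

# Batch arithmetic for the example above: with n = 10000 and batch_size = 999,
# the data splits into 10 full batches plus one trailing batch of size 10.
n, batch_size = 10000, 999
full_batches, remainder = divmod(n, batch_size)
print(full_batches, remainder)  # -> 10 10

# Hypothetical workaround until the fix lands: use the largest batch size
# no bigger than the desired one that divides n evenly.
desired = 999
safe_batch_size = next(b for b in range(desired, 0, -1) if n % b == 0)
print(safe_batch_size)  # -> 625, since 10000 % 625 == 0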

@borchero
Owner

Thanks for the example code, I’ll have a look later!

@borchero
Owner

Thanks a lot for this issue! It's an incredibly easy fix but has pretty big implications for mini-batch K-Means++ 😄
