Embarrassingly parallel linear regression over 1MM datasets #136

Open
hrishikeshvganu opened this issue Dec 1, 2016 · 0 comments

Hi,
I have a use case where I need to build a demand regression model for each of a retailer's products. There are more than 5 million products. I plan to fit a linear model per product, with separate parameters for each product.

So the computation consists of one {data, model} pair per product, and there are more than 1MM such pairs. Since the data for each product is small (around 1000 instances), I was thinking of using a minibatch size of 1000 and training in a loop over the products.
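
To make the shape of the problem concrete, here is a minimal sketch in plain NumPy (not BIDMach) of what I mean by one small, independent fit per product. It assumes every product has the same number of rows and features, and it solves the normal equations in closed form rather than doing minibatch training. The names `fit_all_products` and `ridge` and the synthetic data are made up purely for illustration.

```python
import numpy as np


def fit_all_products(X, y, ridge=1e-8):
    """Fit one independent least-squares model per product.

    X: array of shape (P, n, d), features for P products with n rows each.
    y: array of shape (P, n), targets for each product.
    Returns an array of shape (P, d) with one coefficient vector per product.
    """
    d = X.shape[2]
    # Per-product normal equations: (X^T X + ridge * I) w = X^T y
    xtx = np.einsum("pni,pnj->pij", X, X) + ridge * np.eye(d)
    xty = np.einsum("pni,pn->pi", X, y)
    # np.linalg.solve broadcasts over the leading (product) axis,
    # so all P small systems are solved in a single call.
    return np.linalg.solve(xtx, xty[..., None])[..., 0]


# Synthetic stand-in data: 4 products, 1000 rows each, 3 features (all assumed).
rng = np.random.default_rng(0)
P, n, d = 4, 1000, 3
true_w = rng.normal(size=(P, d))
X = rng.normal(size=(P, n, d))
y = np.einsum("pnd,pd->pn", X, true_w) + 0.01 * rng.normal(size=(P, n))

w_hat = fit_all_products(X, y)
print(w_hat.shape)
```

With millions of products, the same function could presumably be applied to chunks of products that fit in memory, since no chunk depends on any other.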

Is there a better approach, or built-in functionality in BIDMach, for such embarrassingly parallel tasks?
