
Force dtype np.float64 for optimization while data dtype is np.float32 #843

Open · mlondschien opened this issue Sep 20, 2024 · 1 comment


@mlondschien (Contributor)

We're running into quite a few problems with optimization in float32. On small batches, these go away if we .astype(np.float64) our data before calling glum.GeneralizedLinearRegressor.fit; this also makes the algorithm much faster for some reason. However, we cannot afford the float32 -> float64 conversion of the entire dataset due to memory constraints.

Is there an option to do the optimization in glum (i.e., probably, coef and the current hessian estimate) in float64 even if the data itself is float32?
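
For concreteness, a minimal sketch of the workaround described above (shapes are made up for illustration):

import numpy as np
from glum import GeneralizedLinearRegressor

# Data is kept in float32 to fit in memory
X = np.random.rand(1_000_000, 100).astype(np.float32)
y = np.random.rand(1_000_000)

model = GeneralizedLinearRegressor()
# Upcasting fixes the optimization problems and is faster,
# but temporarily doubles the memory footprint of X
model.fit(X.astype(np.float64), y)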

@stanmart (Collaborator) commented Sep 20, 2024

I don't think tabmat can handle a mismatch in dtypes. For example,

import tabmat as tm
import numpy as np

# float32 matrix
X = tm.DenseMatrix(
    np.random.rand(1000, 10).astype(np.float32),
)
# float64 vector: dtype does not match X
d = np.random.rand(1000).astype(np.float64)

# X.T @ diag(d) @ X -- raises because of the dtype mismatch
X.sandwich(d)

fails with

[...]
File src/tabmat/ext/dense.pyx:29, in tabmat.ext.dense.dense_sandwich()
ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
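
For comparison, the same call goes through once the dtypes match (a sketch continuing the snippet above; it assumes sandwich only requires the vector dtype to equal the matrix dtype):

# Casting d down to X's dtype makes the sandwich product work
out = X.sandwich(d.astype(np.float32))
print(out.shape)  # (10, 10)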

So while it would be a nice feature, I'm afraid it would require non-trivial changes to tabmat -- in particular, to the parts written in C++.


Edit: fix dimensions
