Use dtype dependent precision #844

Draft: wants to merge 4 commits into main

@mlondschien (Contributor)

xref #843

@jtilly (Member) commented Sep 20, 2024

It would be very cool to have float32 support that "just works". I would expect that you will run into a couple more issues.

In 653d6f1, I'm now running the test suite on a float32 dataset. This actually looks pretty good; it's just that on the inference side we're still expecting doubles in a lot of places.

================================================================================== short test summary info ===================================================================================
FAILED tests/glm/test_glm.py::test_solver_equivalence[float32-False-solver=lbfgs, alpha=1.0] - AssertionError: 
FAILED tests/glm/test_glm.py::test_solver_equivalence[float32-True-solver=lbfgs, alpha=1.0] - AssertionError: 
FAILED tests/glm/test_glm.py::test_alpha_search[float32] - AssertionError: 
FAILED tests/glm/test_glm.py::test_ols_std_errors[float32] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-poisson-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-poisson-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-normal-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-normal-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-binomial-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-binomial-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_sparse_std_errors[float32] - TypeError: self and d need to be of same dtype, either np.float64
FAILED tests/glm/test_glm.py::test_inputtype_std_errors[float32-False-False-False] - AssertionError: 
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-poisson-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-poisson-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-normal-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-normal-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-binomial-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-binomial-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-poisson-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-poisson-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-normal-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-normal-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-binomial-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-binomial-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-poisson-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-poisson-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-normal-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-normal-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-binomial-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-binomial-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-poisson-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-poisson-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-normal-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-normal-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-binomial-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-binomial-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_public[float32-single] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_public[float32-multiple_vars] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_public[float32-multiple_constraints] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_public[float32-rhs_not_zero] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_fixed_cov[float32-single] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_fixed_cov[float32-multiple_vars] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_fixed_cov[float32-multiple_constraints] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_fixed_cov[float32-rhs_not_zero] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_information_criteria_raises_correct_warnings_and_errors[float32] - RuntimeWarning: overflow encountered in cast
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-robust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-robust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-robust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-robust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-nonrobust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-nonrobust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-nonrobust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-nonrobust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_errors[float32] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-robust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-robust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-robust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-robust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-nonrobust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-nonrobust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-nonrobust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-nonrobust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-robust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-robust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-robust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-robust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-nonrobust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-nonrobust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-nonrobust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-nonrobust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
======================================================================= 70 failed, 1220 passed, 342 warnings in 17.42s =======================================================================

@stanmart (Collaborator)

Here is an example fix for one of the bugs causing the errors on @jtilly's branch:

--- a/src/glum/_glm.py
+++ b/src/glum/_glm.py
@@ -2128,7 +2128,7 @@ class GeneralizedLinearRegressorBase(BaseEstimator, RegressorMixin):
             )

         if (
-            np.linalg.cond(_safe_toarray(X.sandwich(np.ones(X.shape[0]))))
+            np.linalg.cond(_safe_toarray(X.sandwich(np.ones(X.shape[0], dtype=X.dtype))))
             > 1 / sys.float_info.epsilon**2
         ):
             raise np.linalg.LinAlgError(

There are a bunch of similar ones in the functions used for calculating the covariance matrix.
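Generalizing the diff above, a hedged sketch of the pattern: create auxiliary arrays in `X.dtype`, and take machine epsilon from the same dtype via `np.finfo` instead of `sys.float_info.epsilon`, which is always the float64 value. The helper name is hypothetical, and a dense NumPy array stands in for the tabmat matrix, with `X.T @ (w[:, None] * X)` playing the role of `X.sandwich(w)`.

```python
import sys

import numpy as np


def check_gram_conditioning(X: np.ndarray) -> np.ndarray:
    """Hypothetical helper mirroring the check in the diff, with dtype-aware pieces."""
    # Weights in X's dtype, so a float32 design matrix is not promoted to float64.
    weights = np.ones(X.shape[0], dtype=X.dtype)
    # Dense stand-in for X.sandwich(weights), i.e. X^T diag(weights) X.
    gram = X.T @ (weights[:, None] * X)
    # Machine epsilon of X's dtype (about 1.2e-07 for float32, 2.2e-16 for float64).
    eps = np.finfo(X.dtype).eps
    if np.linalg.cond(gram) > 1 / eps**2:
        raise np.linalg.LinAlgError("The Gram matrix is ill-conditioned.")
    return gram


# sys.float_info.epsilon is the float64 epsilon regardless of the data's dtype.
print(sys.float_info.epsilon == np.finfo(np.float64).eps)  # True
print(np.finfo(np.float32).eps)

rng = np.random.default_rng(0)
X32 = rng.random((100, 3), dtype=np.float32)
print(check_gram_conditioning(X32).dtype)  # float32
```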

@mlondschien (Contributor, Author) commented Sep 20, 2024

> This actually looks pretty good, it's just that on the inference side, we're still expecting doubles in a lot of places.

I think there are also quite a few teething problems that the tests don't cover. For example, when run on real data, enet_coordinate_descent_gram sometimes returns coefficients containing ±inf, which causes an error further down. So far, I haven't been able to reproduce this with simulated data. We also see a lot of

 ConvergenceWarning: Line search failed. Next iteration will be very close to current iteration. Might result in more convergence issues.

and

 ConvergenceWarning: Coordinate descent did not converge. You might want to increase the number of iterations. Minimum norm subgradient: nan, tolerance: 9.999999747378752e-05

These are probably due to the fixed (dtype-independent) convergence tolerances. Setting gradient_tol = 1e-3 makes them go away.
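The tolerance in the second warning above (`9.999999747378752e-05`) is 1e-4 rounded to float32, and `0.0010000000474974513` in the log further down is 1e-3 rounded the same way. A quick check, followed by one hypothetical way to derive a dtype-dependent default; `default_gradient_tol` and the `sqrt(eps)` rule are assumptions for illustration, not glum's behavior.

```python
import numpy as np

# The tolerances shown in the warnings are float32 roundings of 1e-4 and 1e-3.
print(float(np.float32(1e-4)))  # 9.999999747378752e-05
print(float(np.float32(1e-3)))  # 0.0010000000474974513


def default_gradient_tol(dtype, user_tol=None):
    """Hypothetical dtype-aware default: sqrt of machine epsilon unless overridden."""
    if user_tol is not None:
        return user_tol
    return float(np.sqrt(np.finfo(dtype).eps))


print(default_gradient_tol(np.float64))  # 1.4901161193847656e-08
print(default_gradient_tol(np.float32))  # roughly 3.5e-04
print(default_gradient_tol(np.float32, user_tol=1e-3))  # 0.001
```

The point is only that the default can follow the data's dtype rather than being one constant; whether `sqrt(eps)` suits glum's stopping rules would need testing.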

@jtilly (Member) commented Sep 20, 2024

Yes, this is a bit of a rabbit hole.

We looked into this when we built glum originally and decided it wasn't worth the effort. However, we never had a real need to go to float32.

I think we'll also have to do a bit of work in tabmat:

 import tabmat as tm
 import numpy as np
 
 X = tm.DenseMatrix(
     np.random.rand(1000, 10).astype(np.float32),
 )
 beta = np.random.rand(10).astype(np.float32)
 
 print(X.sandwich(beta))
 # Output:
 # [[nan nan nan nan nan nan nan nan nan nan]
 #  [nan nan nan nan nan nan nan nan nan nan]
 #  [nan nan nan nan nan nan nan nan nan nan]
 #  [nan nan nan nan nan nan nan nan nan nan]
 #  [nan nan nan nan nan nan nan nan nan nan]
 #  [nan nan nan nan nan nan nan nan nan nan]
 #  [nan nan nan nan nan nan nan nan nan nan]
 #  [nan nan nan nan nan nan nan nan nan nan]
 #  [nan nan nan nan nan nan nan nan nan nan]
 #  [nan nan nan nan nan nan nan nan nan nan]]

Works fine with float64. I guess we're running into overflow issues somewhere.
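To illustrate how float32 overflow can turn into NaNs (the `RuntimeWarning: overflow encountered in cast` in the test log above belongs to the same family of problem), here is a minimal NumPy-only sketch. It does not use tabmat and makes no claim about where the sandwich product actually overflows.

```python
import warnings

import numpy as np

print(np.finfo(np.float32).max)  # largest finite float32, about 3.4e38

with warnings.catch_warnings():
    # NumPy emits RuntimeWarnings for these; silenced here to keep the output short.
    warnings.simplefilter("ignore", RuntimeWarning)
    x32 = np.float32(2e19)
    sq32 = x32 * x32  # 4e38 does not fit in float32 -> inf
    nan32 = sq32 - sq32  # inf - inf -> nan, which then spreads through later sums
    cast32 = np.array([1e39]).astype(np.float32)  # float64 -> float32 cast overflows to inf

x64 = np.float64(2e19)
sq64 = x64 * x64  # the same product is finite in float64

print(sq32, nan32, cast32, sq64)
```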

Edit: reproducer here: https://github.com/Quantco/tabmat/compare/test-float32?expand=1

@stanmart (Collaborator)

I'm having trouble finding an X, beta combination that triggers the problem. @jtilly, if you have one, could you please pickle it and attach it to this thread?
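A possible way to share such a case, sketched under the assumption that pickling the underlying NumPy arrays (rather than the tabmat object) is enough to reproduce it. The file name is hypothetical; the shapes mirror the snippet above, and a fixed seed means the same arrays can also be regenerated instead of attached.

```python
import pickle

import numpy as np

path = "float32_sandwich_repro.pkl"  # hypothetical file name
rng = np.random.default_rng(0)  # fixed seed so the arrays are reproducible
X_arr = rng.random((1000, 10), dtype=np.float32)  # the array wrapped by tm.DenseMatrix
beta = rng.random(10, dtype=np.float32)  # same shape as in the snippet above

# Pickle plain NumPy arrays so loading the file needs only NumPy.
with open(path, "wb") as f:
    pickle.dump({"X": X_arr, "beta": beta}, f)

# Round trip: dtype and exact values are preserved.
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded["X"].dtype, np.array_equal(loaded["X"], X_arr))  # float32 True
```

On the receiving end, `tm.DenseMatrix(loaded["X"]).sandwich(loaded["beta"])` reruns the call from the snippet above.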

@mlondschien (Contributor, Author) commented Sep 21, 2024

Two questions about the convergence criteria:

  • I know that even with scale_predictors=True, the data and coefficients are kept unscaled internally, while the lasso / ridge penalties are reweighted by the inverse of the features' standard deviations / variances. Is the same logic applied to gradient_tol and step_size_tol, or are they compared against the "raw" coefficients / gradients / steps?
  • Since gradient_tol and step_size_tol are compared against the norms of gradients / step sizes, would it make sense to scale them with sqrt(n_features)?
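The intuition behind the second question: if every coordinate of the gradient has the same small magnitude, its Euclidean norm grows like sqrt(n_features), so a fixed tolerance gets harder to meet as features are added, while a max-norm does not grow. Which norm glum actually compares gradient_tol and step_size_tol against is not stated in this thread; the sketch is plain NumPy and `scaled_tol` is a hypothetical helper.

```python
import numpy as np


def gradient_norms(n_features: int, per_coordinate: float):
    """2-norm and max-norm of a gradient with the same entry in every coordinate."""
    g = np.full(n_features, per_coordinate)
    return np.linalg.norm(g), np.linalg.norm(g, ord=np.inf)


def scaled_tol(tol: float, n_features: int) -> float:
    """Hypothetical rule from the question: scale a 2-norm tolerance by sqrt(n_features)."""
    return tol * np.sqrt(n_features)


for p in (10, 4000):
    l2, linf = gradient_norms(p, 1e-4)
    # l2 equals 1e-4 * sqrt(p) and grows with p; linf stays at 1e-4.
    print(p, l2, linf, scaled_tol(1e-4, p))
```

With 4k features, a per-coordinate gradient of 1e-4 already has a 2-norm of about 6.3e-3.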

Do you have a reference on how to improve convergence? For a reasonable alpha and a large (but not huge) dataset of roughly 10M x 4k, glum does not converge (in the coefficients) even with gradient_tol=1e-3.

Iteration 0: |          | 0/? [s/it, gradient norm=1.196613084175624e-07]
Alpha: 0.08150364458560944, Iterations: 0, Time: 0.8982693669968285
Iteration 2:  63%|██████▎   | 1.26/2.0 [7.77s/it, gradient norm=0.005485215689986944]
Alpha: 0.05019387602806091, Iterations: 3, Time: 24.94159863999812
Iteration 1:  34%|███▍      | 0.68/2.0 [8.89s/it, gradient norm=0.02099936455488205]]
Alpha: 0.03091180883347988, Iterations: 2, Time: 17.32267956000578
Iteration 1:  67%|██████▋   | 1.34/2.0 [9.77s/it, gradient norm=0.004562998190522194]
Alpha: 0.01903698220849037, Iterations: 2, Time: 25.582762536010705
Iteration 1:  70%|██████▉   | 1.39/2.0 [11.25s/it, gradient norm=0.004092678427696228]
Alpha: 0.011723890900611877, Iterations: 2, Time: 28.499169431001064
Iteration 1:  74%|███████▍  | 1.49/2.0 [10.71s/it, gradient norm=0.003259933553636074]
Alpha: 0.007220137398689985, Iterations: 2, Time: 27.16131660400424
Iteration 1:  82%|████████▏ | 1.64/2.0 [10.39s/it, gradient norm=0.0023090492468327284]
Alpha: 0.004446508828550577, Iterations: 2, Time: 29.888066792991594
Iteration 1:  87%|████████▋ | 1.74/2.0 [13.20s/it, gradient norm=0.001813305076211691]]
Alpha: 0.0027383745182305574, Iterations: 2, Time: 32.22748096199939
Iteration 1:  87%|████████▋ | 1.74/2.0 [13.20s/it, gradient norm=0.0018133050/cluster/customapps/biomed/grlab/users/lmalte/mambaforge/envs/icufm/lib/python3.10/site-packages/glum/_solvers.py:58: ConvergenceWarning: Coordinate descent did not converge. You might want to increase the number of iterations. Minimum norm subgradient: nan, tolerance: 0.0010000000474974513
  new_coef, gap, _, _, n_cycles = enet_coordinate_descent_gram(1036083102226]
/cluster/customapps/biomed/grlab/users/lmalte/mambaforge/envs/icufm/lib/python3.10/site-packages/glum/_solvers.py:819: ConvergenceWarning: Line search failed. Next iteration will be very close to current iteration. Might result in more convergence issues.
  warnings.warn(
Iteration 99:  20%|██        | 0.4/2.0 [20.86s/it, gradient norm=0.040047112852334976]
/cluster/customapps/biomed/grlab/users/lmalte/mambaforge/envs/icufm/lib/python3.10/site-packages/glum/_solvers.py:345: ConvergenceWarning: IRLS failed to converge. Increase the maximum number of iterations max_iter (currently 100)%|██        | 0.4/2.0 [11.51s/it, gradient norm=0.040109891444444656]
  warnings.warn(
Alpha: 0.0016864229692146182, Iterations: 100, Time: 2164.3713447759947
Iteration 99:   7%|▋         | 0.14/2.0 [22.36s/it, gradient norm=0.07259562611579895]
Alpha: 0.0010385805508121848, Iterations: 100, Time: 2299.8850959279807
Iteration 99:   0%|          | 0.01/2.0 [22.62s/it, gradient norm=0.09704066812992096]
Alpha: 0.0006396080134436488, Iterations: 100, Time: 2326.857085836993
Iteration 99:  31%|███▏      | 0.94/3.0 [22.24s/it, gradient norm=0.11388653516769409]
Alpha: 0.0003939014277420938, Iterations: 100, Time: 2324.6465414399863
Iteration 99:  30%|███       | 0.9/3.0 [22.22s/it, gradient norm=0.12463512271642685]
Alpha: 0.0002425834973109886, Iterations: 100, Time: 2289.6218947519956
Iteration 99:  29%|██▉       | 0.88/3.0 [23.25s/it, gradient norm=0.1313784420490265]
Alpha: 0.00014939460379537195, Iterations: 100, Time: 2325.5795365330123
Iteration 99:  29%|██▉       | 0.87/3.0 [23.35s/it, gradient norm=0.13564461469650269]
Alpha: 9.200440399581566e-05, Iterations: 100, Time: 2270.3260740459955
Iteration 99:  29%|██▊       | 0.86/3.0 [22.42s/it, gradient norm=0.13827645778656006]
Alpha: 5.6660748668946326e-05, Iterations: 100, Time: 2301.0685696779983
Iteration 99:  28%|██▊       | 0.85/3.0 [22.37s/it, gradient norm=0.13990382850170135]
Alpha: 3.489441951387562e-05, Iterations: 100, Time: 2305.937522252003
Iteration 99:  28%|██▊       | 0.85/3.0 [22.54s/it, gradient norm=0.14089839160442352]
Alpha: 2.1489666323759593e-05, Iterations: 100, Time: 2327.7936690200004
Iteration 1:  28%|██▊       | 0.85/3.0 [13.11s/it, gradient norm=0.1409554928541183
