Use dtype dependent precision #844

mlondschien · 2024-09-20T07:16:02Z

jtilly · 2024-09-20T07:47:54Z

It would be very cool to have float32 support that "just works". I would expect that you will run into a couple more issues.

In 653d6f1 I'm now running the test suite on a float32 dataset. This actually looks pretty good, it's just that on the inference side, we're still expecting doubles in a lot of places.

================================================================================== short test summary info ===================================================================================
FAILED tests/glm/test_glm.py::test_solver_equivalence[float32-False-solver=lbfgs, alpha=1.0] - AssertionError: 
FAILED tests/glm/test_glm.py::test_solver_equivalence[float32-True-solver=lbfgs, alpha=1.0] - AssertionError: 
FAILED tests/glm/test_glm.py::test_alpha_search[float32] - AssertionError: 
FAILED tests/glm/test_glm.py::test_ols_std_errors[float32] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-poisson-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-poisson-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-normal-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-normal-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-binomial-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_array_std_errors[float32-binomial-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_sparse_std_errors[float32] - TypeError: self and d need to be of same dtype, either np.float64
FAILED tests/glm/test_glm.py::test_inputtype_std_errors[float32-False-False-False] - AssertionError: 
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-poisson-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-poisson-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-normal-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-normal-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-binomial-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-single-binomial-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-poisson-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-poisson-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-normal-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-normal-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-binomial-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_vars-binomial-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-poisson-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-poisson-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-normal-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-normal-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-binomial-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-multiple_constraints-binomial-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-poisson-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-poisson-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-normal-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-normal-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-binomial-True] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix[float32-rhs_not_zero-binomial-False] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_public[float32-single] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_public[float32-multiple_vars] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_public[float32-multiple_constraints] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_public[float32-rhs_not_zero] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_fixed_cov[float32-single] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_fixed_cov[float32-multiple_vars] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_fixed_cov[float32-multiple_constraints] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_wald_test_matrix_fixed_cov[float32-rhs_not_zero] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_information_criteria_raises_correct_warnings_and_errors[float32] - RuntimeWarning: overflow encountered in cast
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-robust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-robust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-robust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-robust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-nonrobust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-nonrobust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-nonrobust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix[float32-nonrobust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_errors[float32] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-robust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-robust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-robust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-robust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-nonrobust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-nonrobust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-nonrobust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_alpha_search[float32-nonrobust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-robust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-robust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-robust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-robust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-nonrobust-opg-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-nonrobust-opg-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-nonrobust-oim-clustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
FAILED tests/glm/test_glm.py::test_store_covariance_matrix_cv[float32-nonrobust-oim-nonclustered] - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
======================================================================= 70 failed, 1220 passed, 342 warnings in 17.42s =======================================================================

stanmart · 2024-09-20T08:36:56Z

This is an example fix for one of the mistakes causing the errors on @jtilly's branch.

--- a/src/glum/_glm.py
+++ b/src/glum/_glm.py
@@ -2128,7 +2128,7 @@ class GeneralizedLinearRegressorBase(BaseEstimator, RegressorMixin):
             )

         if (
-            np.linalg.cond(_safe_toarray(X.sandwich(np.ones(X.shape[0]))))
+            np.linalg.cond(_safe_toarray(X.sandwich(np.ones(X.shape[0], dtype=X.dtype))))
             > 1 / sys.float_info.epsilon**2
         ):
             raise np.linalg.LinAlgError(

There are a bunch of similar ones in the functions used for calculating the covariance matrix.
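As a side note on the threshold in that hunk: `sys.float_info.epsilon` is always the float64 machine epsilon, regardless of `X.dtype`. A minimal sketch of a dtype-aware cutoff, assuming one wanted the same `1 / eps**2` rule per dtype, could look like the following. The helper name `cond_threshold` is hypothetical and not part of glum; whether the PR changes this particular check is not stated in the thread.

```python
import sys

import numpy as np


def cond_threshold(dtype):
    """Hypothetical helper: 1 / eps**2 using the machine epsilon of `dtype`."""
    eps = float(np.finfo(dtype).eps)
    return 1.0 / eps**2


# For float64 this matches the expression used in the diff above.
print(cond_threshold(np.float64) == 1 / sys.float_info.epsilon**2)

# For float32 the cutoff is many orders of magnitude smaller.
print(cond_threshold(np.float32), cond_threshold(np.float64))
```

With a float32 design matrix, a condition number far below the float64 cutoff can already exceed what float32 arithmetic resolves, which is why a fixed float64 epsilon may be too lenient there.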

mlondschien · 2024-09-20T11:27:21Z

> This actually looks pretty good, it's just that on the inference side, we're still expecting doubles in a lot of places.

I think there are also quite a few teething problems ("Kinderkrankheiten") that are not covered by the tests. E.g., when run on "real data", enet_coordinate_descent_gram sometimes returns coef containing ±inf values, resulting in an error further down. So far, I haven't been able to reproduce this with simulated data. Also, we see a lot of

 ConvergenceWarning: Line search failed. Next iteration will be very close to current iteration. Might result in more convergence issues.

and

 ConvergenceWarning: Coordinate descent did not converge. You might want to increase the number of iterations. Minimum norm subgradient: nan, tolerance: 9.999999747378752e-05

probably due to fixed convergence tolerances. Setting gradient_tol = 1e-3 fixes this.
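The odd-looking tolerance in the warning (9.999999747378752e-05) is simply 1e-4 after rounding to float32. A short NumPy sketch, independent of glum, shows this along with how coarse float32 is relative to float64:

```python
import numpy as np

# 1e-4 is not exactly representable; rounding to float32 and printing the
# result as a Python float gives the value seen in the warning.
tol32 = float(np.float32(1e-4))
print(tol32)

# Machine epsilon: spacing between 1.0 and the next representable number.
eps32 = float(np.finfo(np.float32).eps)
eps64 = float(np.finfo(np.float64).eps)
print(eps32, eps64)

# How many float32 epsilons fit into the tolerance.
print(tol32 / eps32)
```

A tolerance of 1e-4 is under a thousand float32 epsilons, so rounding error accumulated over many operations can plausibly be of the same order as the tolerance itself. That is consistent with the guess that fixed tolerances are the cause, though the thread does not confirm it.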

jtilly · 2024-09-20T11:32:05Z

Yes, this is a bit of a rabbit hole.

We looked into this when we built glum originally and decided it wasn't worth the effort. However, we never had a real need to go to float32.

I think we'll also have to do a bit of work in tabmat:

 import tabmat as tm
 import numpy as np
 
 X = tm.DenseMatrix(
     np.random.rand(1000, 10).astype(np.float32),
 )
 beta = np.random.rand(10).astype(np.float32)
 
 print(X.sandwich(beta))
 
 [[nan nan nan nan nan nan nan nan nan nan]
  [nan nan nan nan nan nan nan nan nan nan]
  [nan nan nan nan nan nan nan nan nan nan]
  [nan nan nan nan nan nan nan nan nan nan]
  [nan nan nan nan nan nan nan nan nan nan]
  [nan nan nan nan nan nan nan nan nan nan]
  [nan nan nan nan nan nan nan nan nan nan]
  [nan nan nan nan nan nan nan nan nan nan]
  [nan nan nan nan nan nan nan nan nan nan]
  [nan nan nan nan nan nan nan nan nan nan]]

Works fine with float64. I guess we're running into overflow issues somewhere.

Edit: reproducer here: https://github.com/Quantco/tabmat/compare/test-float32?expand=1
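For orientation, the sandwich product is X.T @ diag(d) @ X. Below is a NumPy-only sketch of that product, under the assumption that d holds one weight per row of X. It does not call tabmat and does not reproduce the NaN output above; it only gives a float64 reference to compare a float32 result against.

```python
import numpy as np


def sandwich(X, d):
    """Plain NumPy sandwich product X.T @ diag(d) @ X (d has one entry per row)."""
    return X.T @ (d[:, None] * X)


rng = np.random.default_rng(0)
X64 = rng.random((1000, 10))
d64 = rng.random(1000)  # assumption: length equals the number of rows

ref = sandwich(X64, d64)
low = sandwich(X64.astype(np.float32), d64.astype(np.float32))

print(low.dtype, low.shape, np.isfinite(low).all())
print(np.max(np.abs(low - ref) / np.abs(ref)))  # relative float32 error
```

With inputs in [0, 1) the entries of the product stay far below the float32 overflow limit in this sketch, so a float32 result that deviates strongly from the float64 reference would point at something other than plain overflow.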

stanmart · 2024-09-20T11:44:23Z

I'm having issues finding an X, beta combination for which we run into the problem. @jtilly, if you have one, could you please pickle and attach it to this thread?

mlondschien · 2024-09-21T12:47:28Z

Two questions about the convergence criteria:

1. I know that even if scale_predictors = True, internally, data and coefficients are kept without rescaling, but the lasso / ridge penalties are reweighted by the inverse of the features' standard deviations / variances. Is this logic also applied to gradient_tol and step_size_tol? Or are they compared to the "raw" coefficients / gradients / steps?
2. Since gradient_tol and step_size_tol are compared against the norms of gradients / step sizes, would it make sense to scale them with sqrt(n_features)?
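To illustrate the second question: if the norm in question is Euclidean (an assumption; the thread does not say which norm glum uses), a gradient whose entries all have the same magnitude has a norm that grows with sqrt(n_features). The function name and the per-coordinate value below are made up for the illustration.

```python
import numpy as np


def l2_norm_of_constant_gradient(n_features, per_coordinate):
    """Euclidean norm of a gradient whose entries all equal `per_coordinate`."""
    grad = np.full(n_features, per_coordinate)
    return float(np.linalg.norm(grad))


per_coordinate = 1e-4  # hypothetical size of each gradient entry
for n_features in (10, 1_000, 4_000):
    norm = l2_norm_of_constant_gradient(n_features, per_coordinate)
    # Dividing by sqrt(n_features) recovers the per-coordinate magnitude.
    print(n_features, norm, norm / np.sqrt(n_features))
```

Under that assumption, a fixed tolerance on the norm is effectively stricter per coordinate for 4k features than for 10, which is the motivation for scaling it.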

Do you have a reference on how to improve convergence? For reasonable alpha and a large (but not super large) dataset ~10M x 4k, glum does not converge (in coefficient) with gradient_tol=1e-3.

Iteration 0: |          | 0/? [s/it, gradient norm=1.196613084175624e-07]
Alpha: 0.08150364458560944, Iterations: 0, Time: 0.8982693669968285
Iteration 2:  63%|██████▎   | 1.26/2.0 [7.77s/it, gradient norm=0.005485215689986944]
Alpha: 0.05019387602806091, Iterations: 3, Time: 24.94159863999812
Iteration 1:  34%|███▍      | 0.68/2.0 [8.89s/it, gradient norm=0.02099936455488205]]
Alpha: 0.03091180883347988, Iterations: 2, Time: 17.32267956000578
Iteration 1:  67%|██████▋   | 1.34/2.0 [9.77s/it, gradient norm=0.004562998190522194]
Alpha: 0.01903698220849037, Iterations: 2, Time: 25.582762536010705
Iteration 1:  70%|██████▉   | 1.39/2.0 [11.25s/it, gradient norm=0.004092678427696228]
Alpha: 0.011723890900611877, Iterations: 2, Time: 28.499169431001064
Iteration 1:  74%|███████▍  | 1.49/2.0 [10.71s/it, gradient norm=0.003259933553636074]
Alpha: 0.007220137398689985, Iterations: 2, Time: 27.16131660400424
Iteration 1:  82%|████████▏ | 1.64/2.0 [10.39s/it, gradient norm=0.0023090492468327284]
Alpha: 0.004446508828550577, Iterations: 2, Time: 29.888066792991594
Iteration 1:  87%|████████▋ | 1.74/2.0 [13.20s/it, gradient norm=0.001813305076211691]]
Alpha: 0.0027383745182305574, Iterations: 2, Time: 32.22748096199939
Iteration 1:  87%|████████▋ | 1.74/2.0 [13.20s/it, gradient norm=0.0018133050/cluster/customapps/biomed/grlab/users/lmalte/mambaforge/envs/icufm/lib/python3.10/site-packages/glum/_solvers.py:58: ConvergenceWarning: Coordinate descent did not converge. You might want to increase the number of iterations. Minimum norm subgradient: nan, tolerance: 0.0010000000474974513
  new_coef, gap, _, _, n_cycles = enet_coordinate_descent_gram(1036083102226]
/cluster/customapps/biomed/grlab/users/lmalte/mambaforge/envs/icufm/lib/python3.10/site-packages/glum/_solvers.py:819: ConvergenceWarning: Line search failed. Next iteration will be very close to current iteration. Might result in more convergence issues.
  warnings.warn(
Iteration 99:  20%|██        | 0.4/2.0 [20.86s/it, gradient norm=0.040047112852334976]
/cluster/customapps/biomed/grlab/users/lmalte/mambaforge/envs/icufm/lib/python3.10/site-packages/glum/_solvers.py:345: ConvergenceWarning: IRLS failed to converge. Increase the maximum number of iterations max_iter (currently 100)%|██        | 0.4/2.0 [11.51s/it, gradient norm=0.040109891444444656]
  warnings.warn(
Alpha: 0.0016864229692146182, Iterations: 100, Time: 2164.3713447759947
Iteration 99:   7%|▋         | 0.14/2.0 [22.36s/it, gradient norm=0.07259562611579895]
Alpha: 0.0010385805508121848, Iterations: 100, Time: 2299.8850959279807
Iteration 99:   0%|          | 0.01/2.0 [22.62s/it, gradient norm=0.09704066812992096]
Alpha: 0.0006396080134436488, Iterations: 100, Time: 2326.857085836993
Iteration 99:  31%|███▏      | 0.94/3.0 [22.24s/it, gradient norm=0.11388653516769409]
Alpha: 0.0003939014277420938, Iterations: 100, Time: 2324.6465414399863
Iteration 99:  30%|███       | 0.9/3.0 [22.22s/it, gradient norm=0.12463512271642685]
Alpha: 0.0002425834973109886, Iterations: 100, Time: 2289.6218947519956
Iteration 99:  29%|██▉       | 0.88/3.0 [23.25s/it, gradient norm=0.1313784420490265]
Alpha: 0.00014939460379537195, Iterations: 100, Time: 2325.5795365330123
Iteration 99:  29%|██▉       | 0.87/3.0 [23.35s/it, gradient norm=0.13564461469650269]
Alpha: 9.200440399581566e-05, Iterations: 100, Time: 2270.3260740459955
Iteration 99:  29%|██▊       | 0.86/3.0 [22.42s/it, gradient norm=0.13827645778656006]
Alpha: 5.6660748668946326e-05, Iterations: 100, Time: 2301.0685696779983
Iteration 99:  28%|██▊       | 0.85/3.0 [22.37s/it, gradient norm=0.13990382850170135]
Alpha: 3.489441951387562e-05, Iterations: 100, Time: 2305.937522252003
Iteration 99:  28%|██▊       | 0.85/3.0 [22.54s/it, gradient norm=0.14089839160442352]
Alpha: 2.1489666323759593e-05, Iterations: 100, Time: 2327.7936690200004
Iteration 1:  28%|██▊       | 0.85/3.0 [13.11s/it, gradient norm=0.1409554928541183

Use dtype dependent precision.

18ba452

mlondschien and others added 3 commits September 25, 2024 11:27

Compare to tol.

a94dbb1

Merge branch 'main' into use-correct-dtype-precision

4d29ffa

Wrong way round.

0fcdb66
