[LogisticRegressionMG] Support standardization with no data modification #5724
Conversation
Hey @lijinf2. Your changes look great for the most part. I'd still like to see the mg standardization pieces pulled out eventually. Most of my feedback is on the quality of the pytests, but I think the fixes should be straightforward.
cpp/src/glm/qn/mg/glm_base_mg.cuh (outdated):
true,
raft::mul_op(),
stream);
raft::resource::sync_stream(*(this->handle_p));
We shouldn't need to sync here since we're not copying anything to host to be read directly after.
Good point. Revised.
SimpleDenseMat<T> mean_mat(mean_vector, 1, D);

// calculate mean
rmm::device_uvector<T> ones(num_rows, stream);
It would be really nice if we could encapsulate this normalization computation so that it can be reused in RAFT. I understand now the complexities involved in refactoring. For example, I often forget about the SimpleMat because it's buried so deep in cuML's other APIs (and only used in the qn solvers).
Still, we are going to start moving some of the mnmg primitives over to RAFT, hopefully in the not-so-distant future. This also includes k-means and whatnot.
Sounds good. Seems we need a PR to RAFT, then a PR to cuML, to revise this part.
Thinking of getting it done in the next release. Created an issue ticket to track this: #5739.
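For context, the mean computation being discussed reduces to a ones-vector GEMV over the data matrix. A minimal numpy sketch of the idea (a host-side stand-in; the actual code runs on device through SimpleDenseMat, and the MG path would additionally allreduce the per-worker sums):

```python
import numpy as np

def column_means_via_ones(X):
    # ones^T @ X / n gives per-column means in a single GEMV,
    # mirroring the device-side ones-vector trick in the diff
    n_rows = X.shape[0]
    ones = np.ones(n_rows, dtype=X.dtype)
    return (ones @ X) / n_rows

X = np.arange(12, dtype=np.float32).reshape(4, 3)
assert np.allclose(column_means_via_ones(X), X.mean(axis=0))
```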
"max_iter": max_iter, | ||
} | ||
|
||
X_origin = np.array( |
This is going to lead to tests which are brittle and hard to maintain. Please generate this data and compute the naive version of the expected result (you can do the standardization up front and use single-GPU logistic regression). Please do this instead of hardcoding values. We seldom resort to hardcoding, and only when a reasonable way to write a naive test doesn't exist.
Tried generating a random sparse matrix of any size for classification. Please check!
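A sketch of the naive-reference pattern being suggested (names like make_naive_reference are illustrative, and sklearn stands in for the single-GPU reference; the real test would fit the dask-based LogisticRegressionMG with standardization enabled on the raw data and compare against this):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def make_naive_reference(n_rows=1000, n_cols=100, n_classes=10, seed=0):
    # generate data instead of hardcoding it
    X, y = make_classification(
        n_samples=n_rows,
        n_features=n_cols,
        n_informative=n_cols // 2,
        n_classes=n_classes,
        random_state=seed,
    )
    X = X.astype(np.float32)
    # do the standardization up front ...
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # ... and fit a single-process model on it as the expected result
    ref = LogisticRegression().fit(X_std, y)
    return X, y, ref
```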
"max_iter": max_iter, | ||
} | ||
|
||
X = np.array( |
Please see below for comments on hardcoding these values and generating larger test cases.
Revised.
)
@pytest.mark.parametrize("datatype", [np.float32])
@pytest.mark.parametrize("delayed", [False])
@pytest.mark.parametrize("ncol_and_nclasses", [(2, 2), (6, 4)])
Can we test a few different variations here please? Even just a couple of higher numbers like (100, 10) would help.
Sure! Just added (100, 10).
datatype,
)

X_origin = np.ascontiguousarray(X_origin.T)
Please generate larger arrays for testing, especially when sparse. It doesn't have to be massive, but larger than 4x5 would be helpful (like 100x25 or 1000x100).
Yeah, added a function to generate a sparse matrix of any size for multi-class classification.
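A possible shape for such a generator (hypothetical helper name and defaults; the PR's actual function may differ):

```python
import numpy as np
import scipy.sparse

def make_sparse_classification(n_rows, n_cols, n_classes, density=0.1, seed=42):
    # random CSR matrix of any requested size plus multi-class labels
    rng = np.random.RandomState(seed)
    X = scipy.sparse.random(
        n_rows, n_cols, density=density, format="csr",
        dtype=np.float32, random_state=rng,
    )
    y = rng.randint(n_classes, size=n_rows).astype(np.float32)
    return X, y

X, y = make_sparse_classification(1000, 100, n_classes=10)
```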
LGTM! Thanks for creating the issue to pull out the normalization pieces.
Force-pushed:
…uous coef, support fit_intercept=False adaptation
…ed and all tests passed with and without standardization
The key idea is to modify coefficients in linearFwd to get the same predictions, and modify the gradients in linearBwd to get the same gradients.
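In other words, for standardized features x' = (x - mu) / sigma, the same logits and gradients can be produced from the raw data by rescaling the coefficients and the gradient. A small numpy check of that algebra (illustrative only; the PR implements this on device in linearFwd/linearBwd):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(2.0, 3.0, size=(8, 4))  # raw, unstandardized data
w = rng.normal(size=4)
b = 0.5
mu, sigma = X.mean(axis=0), X.std(axis=0)

# Forward: w @ x' + b == (w / sigma) @ x + (b - (w / sigma) @ mu),
# so rescaling the coefficients reproduces the standardized logits
z_standardized = ((X - mu) / sigma) @ w + b
z_no_copy = X @ (w / sigma) + (b - (w / sigma) @ mu)
assert np.allclose(z_standardized, z_no_copy)

# Backward: for upstream gradient g, the coefficient gradient on
# standardized data is X'^T g = (X^T g - mu * sum(g)) / sigma,
# so the raw-data gradient just needs the same rescaling
g = rng.normal(size=8)
grad_standardized = ((X - mu) / sigma).T @ g
grad_no_copy = (X.T @ g - mu * g.sum()) / sigma
assert np.allclose(grad_standardized, grad_no_copy)
```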