-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Logistic Regression] Support fit on two classes #343
Conversation
build |
) -> Callable[[FitInputType, Dict[str, Any]], Dict[str, Any],]: | ||
num_workers = self.num_workers | ||
array_order = self._fit_array_order() | ||
num_classes = dataset.select(alias.label).distinct().count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems num_classes's not been used.
|
||
# filter only supported params | ||
init_parameters = { | ||
k: v for k, v in init_parameters.items() if k in supported_params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems supported_params is always an empty list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, working on C++/Cython support: rapidsai/cuml#5516
Will address this in the next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you be integrating the init params in this PR, now that the cuml init params PR is merged? It would help flesh out the param mapping and the compat tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. not sure actually.
The init PR will update a certain amount of the codes in multiple places of this PR. May make this PR too long and introduce extra reviewing overhead.
Thinking of getting this one merged. Then I will create a new PR for transform and init and update all the tests in test_logistic_regression.py.
@@ -0,0 +1,111 @@ | |||
from typing import Any, Dict, Tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have a 'compat' test similar to other classes with branches for non-supported apis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'compat' test requires vector input type. Currently spark rapids ml converts vector input type into float64 that has no support in cuml C++/Cython 23.08 yet.
Do we want to work out a workaround?
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One typo (I think), otherwise LGTM
feature_type, | ||
) = super()._pre_process_data(dataset) | ||
|
||
# if input format is vectorUDT, convert data type from float32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to float32"?
Signed-off-by: Jinfeng <[email protected]>
Signed-off-by: Jinfeng <[email protected]>
Signed-off-by: Jinfeng <[email protected]>
Signed-off-by: Jinfeng <[email protected]>
Signed-off-by: Jinfeng <[email protected]>
Signed-off-by: Jinfeng <[email protected]>
build |
Require installing cuml 23.08 nightly to run the tests.