Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Logistic Regression] Support fit on two classes #343

Merged
merged 7 commits into from
Aug 9, 2023

Conversation

lijinf2
Copy link
Collaborator

@lijinf2 lijinf2 commented Jul 25, 2023

Require installing cuml 23.08 nightly to run the tests.

@lijinf2
Copy link
Collaborator Author

lijinf2 commented Jul 25, 2023

build

) -> Callable[[FitInputType, Dict[str, Any]], Dict[str, Any],]:
num_workers = self.num_workers
array_order = self._fit_array_order()
num_classes = dataset.select(alias.label).distinct().count()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems num_classes's not been used.


# filter only supported params
init_parameters = {
k: v for k, v in init_parameters.items() if k in supported_params
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems supported_params is always an empty list?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, working on C++/Cython support: rapidsai/cuml#5516
Will address this in the next PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you be integrating the init params in this PR, now that the cuml init params PR is merged? It would help flesh out the param mapping and the compat tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. not sure actually.

The init PR will update a certain amount of the codes in multiple places of this PR. May make this PR too long and introduce extra reviewing overhead.

Thinking of getting this one merged. Then I will create a new PR for transform and init and update all the tests in test_logistic_regression.py.

@@ -0,0 +1,111 @@
from typing import Any, Dict, Tuple
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have a 'compat' test similar to other classes with branches for non-supported apis.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'compat' test requires vector input type. Currently spark rapids ml converts vector input type into float64 that has no support in cuml C++/Cython 23.08 yet.

Do we want to work out a workaround?

@lijinf2
Copy link
Collaborator Author

lijinf2 commented Aug 5, 2023

build

Copy link
Collaborator

@leewyang leewyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One typo (I think), otherwise LGTM

feature_type,
) = super()._pre_process_data(dataset)

# if input format is vectorUDT, convert data type from float32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to float32"?

@lijinf2
Copy link
Collaborator Author

lijinf2 commented Aug 8, 2023

build

@lijinf2 lijinf2 merged commit e6f95f7 into NVIDIA:branch-23.08 Aug 9, 2023
1 check passed
@lijinf2 lijinf2 deleted the lr_fit branch March 6, 2024 05:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants