umap fit() implementation and tests #321
rishic3 commented Jul 6, 2023 (edited)
- Future additions:
  - OOM detection for the data subsample (requires running a mini Spark job to query GPU memory on the node)
  - Support for the "convert_dtype" fit() param
Signed-off-by: Rishi <[email protected]>
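For the OOM detection item, a minimal sketch of the check itself, assuming the free GPU memory has already been obtained by some other means (the mini Spark job mentioned above is not shown). The function name, the `headroom` factor, and the dense-array size estimate are illustrative assumptions, not part of this PR.

```python
def subsample_fits_in_gpu(n_rows, n_cols, itemsize, free_gpu_bytes, headroom=0.5):
    """Hypothetical helper: estimate whether a dense subsample fits on the GPU.

    n_rows, n_cols  -- shape of the subsample
    itemsize        -- bytes per element (4 for float32, 8 for float64)
    free_gpu_bytes  -- free device memory, assumed to be queried elsewhere
    headroom        -- assumed fraction of free memory the data may occupy,
                       leaving the rest for the algorithm's working buffers
    """
    required_bytes = n_rows * n_cols * itemsize
    return required_bytes <= free_gpu_bytes * headroom


# Example: a 1,000 x 100 float32 subsample against 16 GiB of free memory.
small_ok = subsample_fits_in_gpu(1_000, 100, 4, 16 * 1024**3)
# Example: a 1,000,000,000 x 100 float32 subsample against the same budget.
huge_ok = subsample_fits_in_gpu(1_000_000_000, 100, 4, 16 * 1024**3)
```

The estimate only counts the input array, so a real check would likely need a larger safety margin.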
It would be great if another test can be added to test the UMAP estimator persistence.
Idea to avoid overriding _call_cuml_fit_func(): currently, the override of _call_cuml_fit_func is necessitated only by the yield statement. Alternatively, add a class method. Open to other (cleaner?) suggestions.
I would suggest the 2nd way, which can re-use code.
Is it possible to set a PySpark configuration to increase this serialization limit? One option is to check the expected serialization size and hint the user to increase the PySpark configuration parameter. Another option is to return row by row in core.py for all algorithms (any downside?).
There is spark.driver.maxResultSize, which can be set to unlimited, but I don't think there's a way to raise the JVM's 2 GB byte-array limit. As for returning row by row for all algos, I don't think this would work as-is: UMAP does not implement the typical get_cuml_fit_func and instead has its own generator function, so we would have to make this change for the other algos too.
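To make the trade-off concrete, here is a rough sketch of the two ideas discussed above: estimating whether a result would exceed the roughly 2 GB JVM array cap if returned as a single record, and yielding it one row at a time instead. The helper names and the `embedding_` field name are assumptions for illustration and are not the code in core.py.

```python
from typing import Dict, Iterator, List, Sequence

# JVM arrays are indexed by a 32-bit int, so a single byte array is
# capped at roughly 2**31 - 1 bytes regardless of Spark configuration.
JVM_MAX_ARRAY_BYTES = 2**31 - 1


def exceeds_single_record_limit(n_rows: int, n_cols: int, itemsize: int = 4) -> bool:
    """Hypothetical check: would the whole array exceed the cap as one record?"""
    return n_rows * n_cols * itemsize > JVM_MAX_ARRAY_BYTES


def iter_embedding_rows(
    embedding: Sequence[Sequence[float]],
) -> Iterator[Dict[str, List[float]]]:
    """Hypothetical generator: yield one small record per row.

    Each yielded record holds a single row, so no individual record
    approaches the byte-array cap even when the full array does.
    """
    for row in embedding:
        yield {"embedding_": list(row)}


# Usage: 10M x 64 float32 values is about 2.56 GB, which is over the cap.
too_big = exceeds_single_record_limit(10_000_000, 64, 4)
rows = list(iter_embedding_rows([[0.0, 1.0], [2.0, 3.0]]))
```

Row-wise records trade one oversized payload for many small ones, which is why the change would touch every algorithm that currently returns a single record.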
Some additional comments. Looks good overall.
build
build
LGTM