[REVIEW] Feature: pairwise edit distance for each string on a given nvstrings object #2803
Codecov Report
@@ Coverage Diff @@
## branch-0.10 #2803 +/- ##
============================================
Coverage 86.51% 86.51%
============================================
Files 48 48
Lines 9013 9013
============================================
Hits 7798 7798
Misses 1215 1215
Nice job Ayush. I like how you minimized the compute buffer too.
Thanks David!
Algorithm looks fine, just adjustments to match rest of cuDF style.
The strings API is being changed as part of the rewrite for cudf::column support. The first PR is #2811. Introducing broad cuDF style changes into code that is being deprecated may be a waste of time, in my opinion. Regardless, I agree with most of these suggestions to make the API clearer until deprecation occurs.
If this code is being deprecated, why have the PR at all?
Because it probably will not be replaced until the next release, and we need to keep adding features and fixing defects in the existing working functions until the new code is in place.
I don't think anything I suggested is onerous. If Ayush wants to skip the naming changes that's fine, but all the API, documentation, CUDF_EXPECTS etc. changes are relevant and not tedious.
Minor comments, everything else looks good to me.
Python looks good
This PR implements the edit_distance_matrix() function, which computes the pairwise edit distance over a list of strings (i.e. an nvstrings object) and returns a distance matrix. Recently, I was working on a strings-related use case where I felt the need for this function...
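To make the idea of a pairwise distance matrix concrete, here is a minimal CPU sketch in plain Python. It assumes the distance is the Levenshtein distance and is not the nvstrings GPU implementation from this PR; the names `levenshtein` and `edit_distance_matrix_sketch` are hypothetical and exist only for illustration.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows of the DP table."""
    # Keep the shorter string on the inner loop so the row buffers stay small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_distance_matrix_sketch(strings: Sequence[str]) -> List[List[int]]:
    """Return an n x n symmetric matrix of pairwise edit distances."""
    n = len(strings)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = levenshtein(strings[i], strings[j])
            matrix[i][j] = d  # the distance is symmetric, so fill both halves
            matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    for row in edit_distance_matrix_sketch(["dog", "cat", "hat"]):
        print(row)
```

The matrix is symmetric with a zero diagonal, so the sketch computes only the upper triangle and mirrors it. Whether the PR's kernel exploits that symmetry is not stated in this thread.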
Example:
cc @davidwendt