-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sorting in Prediction type for multiclass classification and add stronger tests #213
Conversation
…d some extra tests
…multiclass classification
Codecov Report
@@ Coverage Diff @@
## master #213 +/- ##
==========================================
+ Coverage 86.36% 86.37% +0.01%
==========================================
Files 310 310
Lines 10136 10137 +1
Branches 351 548 +197
==========================================
+ Hits 8754 8756 +2
+ Misses 1382 1381 -1
Continue to review full report at Codecov.
|
|
||
// Need to make sure we sort the keys by their final index, which comes after an underscore in the apply function | ||
private def keysStartsWith(name: String): Array[String] = value.keys.filter(_.startsWith(name)).toArray | ||
.sortBy(s => s.substring(s.lastIndexOf("_") + 1).toInt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s.split('_').last.toInt
val numClasses = 5 | ||
val numRows = 10 | ||
val vectors = Seq.fill[OPVector](numRows)(Array.fill(numClasses)(4.2).toOPVector) | ||
println(s"vectors: $vectors") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove prints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or replace with log
Sorting condition did not make it? |
|
||
// Need to make sure we sort the keys by their final index, which comes after an underscore in the apply function | ||
private def keysStartsWith(name: String): Array[String] = value.keys.filter(_.startsWith(name)).toArray | ||
.sortBy(_.split('_').last.toInt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depending how wide the keys are, but it would be more efficient to scan keys backwards to look for _
with lastIndexOf
to map to a substring
. It would also produce less transient garbage: a single string instead of an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too late for the party :)
Related issues
None
Describe the proposed solution
Sorts keys by their final index - the actual index corresponding to the class, rather than the default string sort which jumbles classes together. Also adds many tests relevant to multiclass classification and threshold metrics.
Describe alternatives you've considered
N/A
Additional context
Previous threshold metrics (only those, not the metrics from Spark itself) were out of order when there were more that 10 classes due to this issue.