-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
made case class to deal with model selector metadata #39
Conversation
Codecov Report
@@ Coverage Diff @@
## master #39 +/- ##
==========================================
+ Coverage 83.49% 84.18% +0.68%
==========================================
Files 296 298 +2
Lines 9380 9749 +369
Branches 344 559 +215
==========================================
+ Hits 7832 8207 +375
+ Misses 1548 1542 -6
Continue to review full report at Codecov.
|
BinaryClassEvalMetrics.withNameInsensitiveOption(name) | ||
.orElse(MultiClassEvalMetrics.withNameInsensitiveOption(name)) | ||
.orElse(RegressionEvalMetrics.withNameInsensitiveOption(name)) | ||
.orElse(OpEvaluatorNames.withNameInsensitiveOption(name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it weird to have evaluator names to be an evaluator metrics. perhaps lets just have evaluation metric and drop the OpEvaluatorNames
completely. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a change I introduced it was always the case because there are metrics that are grouped by the evaluator. if we want to restructure it it should be in a separate PR
} | ||
|
||
object EvalMetric { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs please
* | ||
* @param json encoded metrics | ||
*/ | ||
def fromJson(className: String, json: String): EvaluationMetrics = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return type should be Try[EvaluationMetrics]
def error(c: Class[_]) = throw new IllegalArgumentException( | ||
s"Could not extract metrics of type $c from ${json.mkString(",")}" | ||
) | ||
className match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its better to compare the full class instead of simple class name, i.e.
val metricsClass = ReflectionUtils.classForName(className).asInstanceOf[Class[_ <: EvaluationMetrics]]
metricsClass match {
case n if n == classOf[MultiMetrics] =>
case n if n == classOf[BinaryClassificationMetrics] =>
// etc
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is totally unnecessary - I am going to delete this companion object and make this a private method inside the metadata so it is more clear how it is used and prevent people from trying to reuse it
@@ -521,15 +362,33 @@ case class Insights | |||
case object ModelInsights { | |||
@transient protected lazy val log = LoggerFactory.getLogger(this.getClass) | |||
|
|||
val SerFormats: Formats = Serialization.formats(FullTypeHints(List( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit more readable version:
val SerializationFormats: Formats = {
val typeHints = FullTypeHints(List(
classOf[Continuous], classOf[Discrete],
classOf[DataBalancerSummary], classOf[DataCutterSummary], classOf[DataSplitterSummary],
classOf[SingleMetric], classOf[MultiMetrics], classOf[BinaryClassificationMetrics], classOf[ThresholdMetrics],
classOf[MultiClassificationMetrics], classOf[RegressionMetrics]
))
val evalMetricsSerializer = new CustomSerializer[EvalMetric](_ =>
( { case JString(s) => EvalMetric.withNameInsensitive(s) },
{ case x: EvalMetric => JString(x.entryName) }
)
)
Serialization.formats(typeHints) +
EnumEntrySerializer.json4s[ValidationType](ValidationType) +
EnumEntrySerializer.json4s[ProblemType](ProblemType) +
new SpecialDoubleSerializer +
evalMetricsSerializer
}
|
||
private[op] object SplitterSummary { | ||
val ClassName: String = "className" | ||
def fromMap(map: Map[String, Any]): SplitterSummary = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
below is slightly modified and a better version (pros: returns Try, uses class comparison and handles the default case):
def fromMap(map: Map[String, Any]): Try[SplitterSummary] = Try {
val summaryClass = ReflectionUtils.classForName(map(ClassName).toString).asInstanceOf[Class[_ <: SplitterSummary]]
summaryClass match {
case s if s == classOf[DataSplitterSummary] => DataSplitterSummary()
case s if s == classOf[DataBalancerSummary] => DataBalancerSummary(
positiveLabels = map(ModelSelectorBaseNames.Positive).asInstanceOf[Long],
negativeLabels = map(ModelSelectorBaseNames.Negative).asInstanceOf[Long],
desiredFraction = map(ModelSelectorBaseNames.Desired).asInstanceOf[Double],
upSamplingFraction = map(ModelSelectorBaseNames.UpSample).asInstanceOf[Double],
downSamplingFraction = map(ModelSelectorBaseNames.DownSample).asInstanceOf[Double]
)
case s if s == classOf[DataCutterSummary] => DataCutterSummary(
labelsKept = map(ModelSelectorBaseNames.LabelsKept).asInstanceOf[Array[Double]],
labelsDropped = map(ModelSelectorBaseNames.LabelsDropped).asInstanceOf[Array[Double]]
)
case s =>
throw new IllegalArgumentException(s"Unrecognised splitter summary class: $s")
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again this does not have general use it is only for metadata deser - that is why it is private i will move it and make it a private method
} | ||
} | ||
|
||
implicit val doubleOptEquality = new Equality[Option[Double]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! how useful are these? do you think we should have them in a trait somewhere in com.salesforce.op.test
in our test kit?
decoded.bestModelName shouldEqual summary.bestModelName | ||
decoded.bestModelType shouldEqual summary.bestModelType | ||
decoded.validationResults shouldEqual summary.validationResults | ||
decoded.trainEvaluation.toJson() shouldEqual summary.trainEvaluation.toJson() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this toJson() to avoid NaN issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trainEvaluation doesn't contain NaN
, no?
Does scala know how to compare two complex case class objects?
@@ -107,7 +107,8 @@ class OpValidatorTest extends FlatSpec with TestSparkContext { | |||
assertFractions(Array(1 - p, p), train) | |||
assertFractions(Array(1 - p, p), validate) | |||
} | |||
balancer.get.metadataBuilder.build() should not be new MetadataBuilder().build() | |||
println(balancer.get.summary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove println
@@ -45,7 +45,8 @@ import org.apache.spark.sql.types.MetadataBuilder | |||
import com.salesforce.op.utils.spark.RichDataset._ | |||
|
|||
@RunWith(classOf[JUnitRunner]) | |||
class OpValidatorTest extends FlatSpec with TestSparkContext { | |||
class | |||
OpValidatorTest extends FlatSpec with TestSparkContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove redundant end line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!!
def areEqual(a: Option[Double], b: Any): Boolean = b match { | ||
case None => a.isEmpty | ||
case s: Option[Double] => (a.exists(_.isNaN) && s.exists(_.isNaN)) || | ||
(a.nonEmpty && a.toSeq.zip(s.toSeq).forall{ case (n, m) => n == m }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe (a.exists(_.isNaN) && s.exists(_.isNaN)) || (a == b)
?
Related issues
metadata is terrible
Describe the proposed solution
we should have concrete classes to deal with it instead of nasty nested maps
Describe alternatives you've considered
leave the nasty nested maps and deal with them