Aggregate LOCOs of DateToUnitCircleTransformer. #349

Merged
merged 15 commits into from
Jul 11, 2019

sanmitra
Contributor

Related issues
Aggregated LOCOs of DateToUnitCircleTransformer outputs.

Describe the proposed solution
For each time period, we aggregate all the LOCOs derived from the same date feature by taking their mean.
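A minimal sketch of the aggregation described above, assuming each LOCO score is tagged with its raw feature name and time period. `LocoScore` and `LocoMeanAggregation` are hypothetical names for illustration and are not types from this PR.

```scala
// Hypothetical, simplified record of one LOCO score (not a TransmogrifAI type).
final case class LocoScore(rawFeature: String, timePeriod: String, value: Double)

object LocoMeanAggregation {
  // Group scores by (raw feature, time period) and reduce each group to its mean.
  def aggregateByMean(scores: Seq[LocoScore]): Map[(String, String), Double] =
    scores
      .groupBy(s => (s.rawFeature, s.timePeriod))
      .map { case (key, group) => key -> group.map(_.value).sum / group.size }
}
```

With this sketch, two columns derived from the same date feature and the same time period collapse into a single entry whose value is the mean of their scores.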

@salesforce-cla

Thanks for the contribution! It looks like @ijeri is an internal user so signing the CLA is not required. However, we need to confirm this.

@salesforce-cla

Thanks for the contribution! Before we can merge this, we need @sanmitra to sign the Salesforce.com Contributor License Agreement.

@codecov

codecov bot commented Jun 28, 2019

Codecov Report

Merging #349 into master will increase coverage by 0.03%.
The diff coverage is 96.87%.

@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
+ Coverage    86.8%   86.83%   +0.03%     
==========================================
  Files         336      336              
  Lines       10865    10873       +8     
  Branches      367      576     +209     
==========================================
+ Hits         9431     9442      +11     
+ Misses       1434     1431       -3
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...e/op/stages/impl/insights/RecordInsightsLOCO.scala | 96.66% <96.87%> (+1.54%) | ⬆️ |
| ...es/src/main/scala/com/salesforce/op/OpParams.scala | 89.79% <0%> (+4.08%) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 28eac0c...cd1df55.


private def isMapFeature(featureType: String): Boolean = {
  val featureTypeTag = FeatureType.featureTypeTag(featureType)
  featureTypeTag.tpe <:< weakTypeOf[OPMap[_]]
}
Collaborator

This check is very slow to perform at runtime. Let's just check whether history.grouping is present, i.e.:

private def getRawFeatureName(history: OpVectorColumnHistory): Option[String] = history.grouping match {
    case Some(grouping) => history.parentFeatureOrigins.headOption.map(_ + "_" + grouping)
    case None => history.parentFeatureOrigins.headOption
}
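To see what the suggested helper returns, here is a self-contained sketch. `ColumnHistory` is a hypothetical stand-in that carries only the two fields the helper reads; the real `OpVectorColumnHistory` has more fields and is not reproduced here.

```scala
// Hypothetical stand-in for OpVectorColumnHistory with only the fields used below.
final case class ColumnHistory(
  parentFeatureOrigins: Seq[String],
  grouping: Option[String]
)

object RawFeatureName {
  // Map-derived columns carry a grouping (the map key), so its presence is used
  // instead of a reflective type check.
  def getRawFeatureName(history: ColumnHistory): Option[String] = history.grouping match {
    case Some(grouping) => history.parentFeatureOrigins.headOption.map(_ + "_" + grouping)
    case None => history.parentFeatureOrigins.headOption
  }
}
```

In this sketch a column with origin `dateMap` and grouping `k1` yields `Some("dateMap_k1")`, a column without a grouping yields its first origin unchanged, and a column without any origin yields `None`.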

for {name <- getRawFeatureName(history)} {
  // Update the aggregation map for each (rawFeatureName, timePeriod) in case of date features.
  val key = if (isUnitCircleDateFeature) {
    name + "_" + history.descriptorValue.flatMap(convertToTimePeriod).map(_.entryName).getOrElse("")
Collaborator

Let's replace the isUnitCircleDateFeature val and the if/else with pattern matching.

Contributor Author

Pattern match on what exactly? history?

Contributor Author

Btw, isUnitCircleDateFeature is referenced multiple times, hence I created a val.

Collaborator

yes, you can probably pattern match on history and it should make the code more readable as you would have clearly separated cases. give it a try.

Collaborator

and remember that you can pattern match together with if conditions.

Contributor Author

done.
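For readers following this thread, a pattern match with an `if` guard could look roughly like the sketch below. It is not the code that was merged. `DateColumnHistory`, `UnitCircleStage`, `timePeriodOf`, and the `<x|y>_<TimePeriod>` descriptor layout are all assumptions made for illustration.

```scala
// Hypothetical, simplified stand-in for the column history discussed in this thread.
final case class DateColumnHistory(
  parentFeatureStages: Seq[String], // names of the stages that produced the column
  descriptorValue: Option[String]   // assumed layout: "<x|y>_<TimePeriod>"
)

object AggregationKey {
  // Assumption: a unit-circle date column has a parent stage whose name contains this marker.
  private val UnitCircleStage = "DateToUnitCircleTransformer"

  // Extract the time period from a descriptor such as "x_HourOfDay" (assumed layout).
  private def timePeriodOf(descriptor: String): Option[String] =
    descriptor.split("_").lastOption.filter(_.nonEmpty)

  // The guard separates the date case from the default case without a boolean val.
  def keyFor(name: String, history: DateColumnHistory): String = history match {
    case DateColumnHistory(stages, descriptor) if stages.exists(_.contains(UnitCircleStage)) =>
      name + "_" + descriptor.flatMap(timePeriodOf).getOrElse("")
    case _ => name
  }
}
```

Each case states its own condition, so adding another feature kind later means adding one more `case` rather than nesting another `if`.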

@@ -466,4 +427,167 @@ class RecordInsightsLOCOTest extends FlatSpec with TestSparkContext {
assertAggregatedTextMap(textAreaMap, "k1")

}
it should "aggregate values for date, datetime, dateMap and dateTimeMap derived features" in {
Collaborator

Sorry, I am not able to follow these tests anymore.

Can we please make them more readable by either splitting them into smaller test cases, extracting a shared behavior (http://www.scalatest.org/user_guide/sharing_tests), or at least adding clues with withClue (http://www.scalatest.org/user_guide/using_assertions)?

Contributor Author

Sure, will refactor.
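As an illustration of the `withClue` suggestion, the sketch below wraps each assertion in a clue naming the time period being checked, so a failure message says which case broke. It assumes ScalaTest is on the classpath; `ClueExample` and its values are hypothetical and unrelated to the actual test data in this PR.

```scala
import org.scalatest.Assertions._

object ClueExample {
  // Hypothetical expected and actual aggregated values, keyed by time period.
  val expected: Map[String, Double] = Map("HourOfDay" -> 0.5, "DayOfWeek" -> -0.25)
  val actual: Map[String, Double] = Map("HourOfDay" -> 0.5, "DayOfWeek" -> -0.25)

  // Check every expected entry; the clue is prepended to the message of a failed assertion.
  def check(): Unit =
    expected.foreach { case (period, value) =>
      withClue(s"Aggregated LOCO for time period $period: ") {
        assert(actual.get(period).contains(value))
      }
    }
}
```

The same idea applies inside a spec: one loop over the cases with a clue per case tends to read better than a long run of bare assertions.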

@sanmitra
Contributor Author

sanmitra commented Jul 5, 2019

@tovbinm please take a look at the refactoring of the tests. Thanks.

Collaborator

leahmcguire left a comment

LGTM

leahmcguire merged commit 87aca8d into master on Jul 11, 2019
leahmcguire deleted the san/dateLOCOAggregation branch on July 11, 2019 19:06
@salesforce-cla

Thanks for the contribution! Unfortunately we can't verify the commit author(s): leahmcguire <l***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.
