Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loco test bug fix #423

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

AdamChit
Copy link
Collaborator

Related issues
test refactor made in #412 causes one of the test cases to fail.

Specially this test assumed that at-least 1 of the 4 features would be selected https://github.com/salesforce/TransmogrifAI/blob/master/core/src/test/scala/com/salesforce/op/stages/impl/insights/RecordInsightsLOCOTest.scala#L236

Describe the proposed solution
Test should allow the case where none of the features are picked

Additional context
example of the test failing:
Screen Shot 2019-10-15 at 9 24 36 PM

@codecov
Copy link

codecov bot commented Oct 16, 2019

Codecov Report

Merging #423 (b4a2116) into master (ac83ad7) will decrease coverage by 60.42%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #423       +/-   ##
===========================================
- Coverage   86.94%   26.52%   -60.43%     
===========================================
  Files         340      337        -3     
  Lines       11388    11131      -257     
  Branches      363      593      +230     
===========================================
- Hits         9901     2952     -6949     
- Misses       1487     8179     +6692     
Impacted Files Coverage Δ
...main/scala/com/salesforce/op/dsl/RichFeature.scala 0.00% <0.00%> (-100.00%) ⬇️
...main/scala/com/salesforce/op/filters/Summary.scala 0.00% <0.00%> (-100.00%) ⬇️
.../scala/com/salesforce/op/cli/gen/ProblemKind.scala 0.00% <0.00%> (-100.00%) ⬇️
...n/scala/com/salesforce/op/dsl/RichSetFeature.scala 0.00% <0.00%> (-100.00%) ⬇️
.../scala/com/salesforce/op/dsl/RichListFeature.scala 0.00% <0.00%> (-100.00%) ⬇️
.../scala/com/salesforce/op/stages/impl/package.scala 0.00% <0.00%> (-100.00%) ⬇️
...cala/com/salesforce/op/cli/gen/FileInProject.scala 0.00% <0.00%> (-100.00%) ⬇️
...scala/org/apache/spark/util/SparkThreadUtils.scala 0.00% <0.00%> (-100.00%) ⬇️
...cala/com/salesforce/op/OpWorkflowModelWriter.scala 0.00% <0.00%> (-100.00%) ⬇️
...cala/com/salesforce/op/evaluators/Evaluators.scala 0.00% <0.00%> (-100.00%) ⬇️
... and 201 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac83ad7...b6829fe. Read the comment docs.

Copy link
Collaborator

@tovbinm tovbinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it not pick any features?

it("should pick between 1 and 4 of the features") {
all(parsed.map(_.size)) should (be >= 1 and be <= 4)
it("should pick between 0 and 4 features") {
all(parsed.map(_.size)) should (be >= 0 and be <= 4)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like it is not checking very much... can you do a check for the conditions described above and then do the size check?

@AdamChit
Copy link
Collaborator Author

Why would it not pick any features?

So in the test the labels are generated based on the PickList feature:

        p.value match {
          case Some("A") | Some("B") | Some("C") => RealNN(1.0)
          case _ => RealNN(0.0)
        }
      )

And all the features have a chance of being empty:

      val countryData: Seq[Country] = RandomText.countries.withProbabilityOfEmpty(0.3).take(numRows).toList
      val pickListData: Seq[PickList] = RandomText.pickLists(domain = List("A", "B", "C", "D", "E", "F", "G"))
        .withProbabilityOfEmpty(0.1).limit(numRows)
      val currencyData: Seq[Currency] = RandomReal.logNormal[Currency](mean = 10.0, sigma = 1.0)
        .withProbabilityOfEmpty(0.3).limit(numRows)

One case that could happen:
2) All the feature are set to empty
3) LOCO changes the pickList from empty to non-empty and because it is not in the set {A,B,C} it is NOT a strong of a predictor
=> label wouldn't change => pickList would not have a strong insight => No feature is selected

Note: the chance of this happening is very low which explains why it doesn't always appear in the CI build.

@AdamChit AdamChit changed the title Achit/loco test bug fix loco test bug fix Oct 19, 2019
@tovbinm
Copy link
Collaborator

tovbinm commented Oct 19, 2019

@AdamChit are you still working on adding a more robust test?

@tovbinm
Copy link
Collaborator

tovbinm commented Jan 9, 2020

any updates? @AdamChit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants