Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

Add a DagOptimizer test #745

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from
Open

Conversation

johnynek
Copy link
Collaborator

We are using the DagOptimizer at stripe before planning to reduce the size of some online graphs (went from 115 storm bolts or so to 69 in one example).

However, even in the case where we reach 69, there are rules that don't seem to be fully applied and I have not yet found out why.

Anyway, more test coverage never hurts.

@johnynek johnynek requested a review from ttim August 19, 2017 21:29
@johnynek
Copy link
Collaborator Author

@ttim can you take a look?

@codecov-io
Copy link

codecov-io commented Aug 19, 2017

Codecov Report

Merging #745 into develop will increase coverage by 0.1%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           develop     #745     +/-   ##
==========================================
+ Coverage    72.23%   72.34%   +0.1%     
==========================================
  Files          154      154             
  Lines         3742     3742             
  Branches       209      209             
==========================================
+ Hits          2703     2707      +4     
+ Misses        1039     1035      -4
Impacted Files Coverage Δ
...witter/summingbird/scalding/ScaldingPlatform.scala 76.19% <0%> (+0.59%) ⬆️
.../main/scala/com/twitter/summingbird/Producer.scala 77.27% <0%> (+1.51%) ⬆️
...a/com/twitter/summingbird/scalding/LoopState.scala 75% <0%> (+25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f06d22...39fa04c. Read the comment docs.

@johnynek
Copy link
Collaborator Author

okay, this now fails for me (both the fanOut and the idempotency test).

cc @non

@johnynek
Copy link
Collaborator Author

actually, I can't get idempotency to fail now... maybe it is just fanOut being broken being the issue, which some of the rules use...

@johnynek
Copy link
Collaborator Author

Okay, I don't see a bug actually. It was an issue with the test actually, and an inconsistency between how fanOut was defined in Dependants and ExpressionDag.

This test actually seems to pass for me now. I'll try to find any counter example next week, but so far, I guess everything looks correct still.

@johnynek
Copy link
Collaborator Author

okay, idempotency failure:

 arg0 = Summer(IdentityKeyedProducer(NamedProducer(IdentityKeyedProducer(MergedProducer(IdentityKeyedProducer(FlatMappedProducer(LeftJoinedProducer(IdentityKeyedProducer(NamedProducer(IdentityKeyedProducer(NamedProducer(IdentityKeyedProducer(OptionMappedProducer(IdentityKeyedProducer(FlatMappedProducer(Source(List(-483916215, 1947511647, -1, 2147483647, -1094725117, -1107074050, 1316607071, -894421428, 438074163, -1618870436, 2147483647, -927761159, 1000599281, 1, 2147483647, 1, -941762984, 1, -2147483648, 1134838592, -2147483648, -1360150793, -1774434453, -1968350011, -1999846660, 1, 940697649, -561677987, 1326923701, 734987219, 0, -1731750202, 782845389, 0, 158805678, -1, 1, -560692419, -1, 879491512, 1, 685151277, 333970011, -2147483648, 961509966, -2147483648, -1, 870342926, 2051572436, 2147483647, -178556721, 0, -1028446598, -2147483648, 1, 2147483647, 1, -1915898801, 1834110173, 519019579, 2147483647, 0, 1514822104, -1, -877887704, 2147483647, -224941626, -675798454, 1, 2092027473, -487281773, 638669148, -2147483648, 1, 2147483647, -2147483648, 1, -567098335, -1, 795549827, -1995849024, -577239589, 1867920629, 1, 0, -1839762855, 0, 1650235957, -385664255, 1676297418, 2147483647, -1701772912, -1, -1, 2147483647, 330749634, 1, 2147483647, -2147483648, 1462163691)),org.scalacheck.GenArities$$Lambda$3183/1815517983@3d533ae3)),org.scalacheck.GenArities$$Lambda$3183/1815517983@368308bf)),tjiposzOlkplcu)),tvpwpdyScehGnwcaVjjWvlfuwxatxhdjhozscucpbq)),Map(1122506458 -> -422595330, 985940285 -> 1773535903, -1012102957 -> 577191710, 1865284784 -> 0, 1133011483 -> -1, -1571538327 -> -106754684, -883421086 -> -1843831487, -1186741108 -> -1297188435, 511615026 -> 892838107, 2147483647 -> 0, -1 -> 0, 654842167 -> 2147483647, 766280486 -> 0, 46050994 -> -2147483648, -146388742 -> -1080181663, 1883141387 -> 2147483647, -1170327451 -> 41927314, -1265823639 -> 2038185509, -1531812404 -> -1, -1426771473 -> 0, 107420900 -> 1, -778321474 -> -613855929, 730121132 -> -1824061783, -193486826 -> 0, -1603389086 -> -1, -1663776702 -> 2025651619, 1791509260 -> -1, -249195648 -> 0, 1795996912 -> 470882984, -1635961338 -> 995590274, -1315404816 -> 1, 303482491 -> 1, -1120629917 -> -913091299, 1135859794 -> 0, 1 -> 431843073, -1055837163 -> -428420920, 1920175445 -> -2147483648, -248886982 -> -2003982440, 1518754224 -> -1, 796959542 -> 1, -1658050660 -> -1870329784, 1129081426 -> -1878450200, 992361458 -> -1, -315984592 -> 1918575397, -1137239383 -> 1, -102255667 -> -1, -734906869 -> -611180863, -473252927 -> 2147483647, 886258153 -> 1, -916284450 -> 1, 1559234564 -> -1740554658, -2147483648 -> -1363669838, 0 -> -1)),org.scalacheck.GenArities$$Lambda$3183/1815517983@2bb51e67)),IdentityKeyedProducer(OptionMappedProducer(IdentityKeyedProducer(FlatMappedProducer(Source(List(-483916215, 1947511647, -1, 2147483647, -1094725117, -1107074050, 1316607071, -894421428, 438074163, -1618870436, 2147483647, -927761159, 1000599281, 1, 2147483647, 1, -941762984, 1, -2147483648, 1134838592, -2147483648, -1360150793, -1774434453, -1968350011, -1999846660, 1, 940697649, -561677987, 1326923701, 734987219, 0, -1731750202, 782845389, 0, 158805678, -1, 1, -560692419, -1, 879491512, 1, 685151277, 333970011, -2147483648, 961509966, -2147483648, -1, 870342926, 2051572436, 2147483647, -178556721, 0, -1028446598, -2147483648, 1, 2147483647, 1, -1915898801, 1834110173, 519019579, 2147483647, 0, 1514822104, -1, -877887704, 2147483647, -224941626, -675798454, 1, 2092027473, -487281773, 638669148, -2147483648, 1, 2147483647, -2147483648, 1, -567098335, -1, 795549827, -1995849024, -577239589, 1867920629, 1, 0, -1839762855, 0, 1650235957, -385664255, 1676297418, 2147483647, -1701772912, -1, -1, 2147483647, 330749634, 1, 2147483647, -2147483648, 1462163691)),org.scalacheck.GenArities$$Lambda$3183/1815517983@3d533ae3)),org.scalacheck.GenArities$$Lambda$3183/1815517983@368308bf)))),ncn)),Map(),com.twitter.algebird.IntRing$@4deb204c),

 arg1 = com.twitter.summingbird.planner.DagOptimizer$RemoveNames$@7d710ac9.orElse(com.twitter.summingbird.planner.DagOptimizer$RemoveIdentityKeyed$@2a6e2829).orElse(com.twitter.summingbird.planner.DagOptimizer$FlatMapFusion$@728d18e6).orElse(com.twitter.summingbird.planner.DagOptimizer$OptionMapFusion$@5dba444a).orElse(com.twitter.summingbird.planner.DagOptimizer$OptionToFlatMap$@28e82a76).orElse(com.twitter.summingbird.planner.DagOptimizer$KeyFlatMapToFlatMap$@731c6022).orElse(com.twitter.summingbird.planner.DagOptimizer$FlatMapKeyFusion$@69cc1c8f).orElse(com.twitter.summingbird.planner.DagOptimizer$ValueFlatMapToFlatMap$@66db9de8).orElse(com.twitter.summingbird.planner.DagOptimizer$FlatMapValuesFusion$@f09a0df).orElse(com.twitter.summingbird.planner.DagOptimizer$FlatThenOptionFusion$@1f04dc6c).orElse(com.twitter.summingbird.planner.DagOptimizer$DiamondToFlatMap$@16258367).orElse(com.twitter.summingbird.planner.DagOptimizer$MergePullUp$@2c2b7896).orElse(com.twitter.summingbird.planner.DagOptimizer$AlsoPullUp$@6ee63a94)

)

I'll try to reproduce that failure and fix.

There was a subtle bug in ExpressionDag.fanOut which caused
non-idempotency, and underapplication of rules in some cases. The
problem was in computing fanOut in Id space, where there is actually no
1:1 function between Ids and N nodes. The solution is to compute fanOut
directly in N space which is what is meaningful anyway.

This seems to fix the bug even with rather large numbers of trials
@johnynek
Copy link
Collaborator Author

This seems to fix the issue.

The bug was with fanOut. It was working in the Expr[_, _] and Id[_] space, which is not meaningful to the rules, which operate on the original N[_] space. The problem is that rewrites can give the same node two different Ids. When that happens, it looks like the fanOut is greater than it is, and this can prevent rules from applying.

If you jump back to the N[_] space, and then back into the optimizer, you reset the Ids and you may have a chance that the rule would apply then. This seems to be the problem.

Along the way I cleaned things up slightly.

Can you take a look @ttim

@johnynek
Copy link
Collaborator Author

had a storm flake. Restarted a 2.12 build.

@johnynek
Copy link
Collaborator Author

I'd like to cherry pick this when we merge and publish a version of 0.10.1 which includes this fix. We are using the optimizer and it would be helpful to us.

Also, I'm considering breaking this code out into a standalone library. I copied it into scalding, but it is inconvenient to have to publish these large projects to update this totally generic DAG rewriting tool.

Copy link
Collaborator

@piyushnarang piyushnarang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me but I'm not super familiar with the code, might be nice to get another pair of eyes on this.


implicit val generatorDrivenConfig =
PropertyCheckConfig(minSuccessful = 1000, maxDiscarded = 1000) // the producer generator uses filter, I think
//PropertyCheckConfig(minSuccessful = 100, maxDiscarded = 1000) // the producer generator uses filter, I think
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?


}

test("test some idempotency specific past failures") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe nice to add a comment above on the details of the past failure? might be hard for folks reading the code to know?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what to say. This was a hand minimized example (it took me about an hour) from a failure case found by scalacheck. Since the failures were quite rare, it took a long time to even find a failure with scalacheck, so once I found it, I wanted to test that failure every time.

That's what I mean by "specific past failures".

I understand (somewhat) why this failed now, but I couldn't easily generate another that would also show the bug.

Can you suggest some specific text you would like to see me add?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess reading the title it doesn't give you a sense of what the issue is and what it's testing. Would it make sense to add a tldr of your understanding of why it failed?

@CLAassistant
Copy link

CLAassistant commented Nov 16, 2019

CLA assistant check
All committers have signed the CLA.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants