-
Notifications
You must be signed in to change notification settings - Fork 265
Add performance tests for flatMap operation in OperationContainer #740
base: develop
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #740 +/- ##
==========================================
- Coverage 72.21% 70.1% -2.11%
==========================================
Files 153 158 +5
Lines 3736 3824 +88
Branches 204 205 +1
==========================================
- Hits 2698 2681 -17
- Misses 1038 1143 +105
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like composition in these operations is costly.
Note, the optimizer rules could have composed these things for us. I don't know why we wouldn't want to use them.
The main issue was that we didn't work out tracking the naming through the compositions correctly before. But if we can use the optimizer rules we can solve these problems once and use it in scalding, memory, concurrent memory, storm, etc...
} | ||
} | ||
while (left != 0) { | ||
Thread.sleep(TickDelay.toMillis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty skeptical we are comparing apples to apples when we compare this to non OperationContainers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm skeptical as well, but I wanted to see at least some number for non OperationContainer
execution. In the end I don't think it should be anyhow actionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thread.sleep
adds noise to the result. The blackhole provides a method to simulate CPU usage. I think it's consumeCpu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I've just noticed that this is actually to check when it's completed. Maybe you could use a CountDownLatch
for it?
None | ||
|
||
override def execute(state: S,data: I): TraversableOnce[(Chain[S], Try[TraversableOnce[O]])] = | ||
Some((Chain.single(state), Success(f(data)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might put Try(f(data))
here just to catch any issues with the f
that could exist since we have try anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I'm fine with merging this, but should we add a test that the SimpleFlatMap is identical in performance to the IntermediateFlatMap? Someone might use it as is, and currently it is untested. Is this a candidate for running in our actual planner, or just a limit to shoot for? |
@johnynek According to tests |
Sorry, I meant identical in function, I realize it is much faster. |
@johnynek yeah =) issue is - it's not identical because our |
|
||
/** | ||
* Command to run (should be executed from sbt with project summingbird-online-perf): | ||
* jmh:run -i 10 -wi 10 -f1 -t1 ComposedFlatMapPerformance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest adding these options:
-prof gc
to enable the GC profiler
-gc
to force a GC between runs. Without it, 10 warmup and actual iterations might not be enough to exclude GC noise
-foe true
to fail the benchmark if the code fails. I've seen benchmarks that only test that the benchmark is wrong :)
-jvmArgs "-Xms1G -Xmx1G"
to make sure that the allocated memory is the same across different dev environments
@BenchmarkMode(Array(Mode.AverageTime)) | ||
@OutputTimeUnit(TimeUnit.MILLISECONDS) | ||
def measureDirect(bh: Blackhole): Unit = | ||
input.foreach(composed.apply(_).foreach(bh.consume)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function allocation here (composed.apply(_)
) introduces some noise. Maybe it's worth extracting it to a class val
var left = inputs.length | ||
for ( | ||
input <- inputs; | ||
emitted <- container.execute(InputState[Int](1), input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a sense of how expensive container.execute
is? I ask because testOperationContainer
introduces some noise so its overhead could dominate the final result.
@ttim we are starting to really push on summingbird performance. Any more work here? Would it make sense to have a video sync sometime soon as we are currently investing in summingbird at Stripe (but also considering if that makes sense or to shift investment somewhat). If we can share effort, that is a big win. If we each are just going to work privately on our forks maybe each should make their own call. Thoughts @pankajroark ? |
@johnynek Great to hear about Summingbird performance push. Happy to set up a video sync. As you've seen we've made some optimizations and we wanted to make more. Unfortunately we have to focus resources on other projects for the time being, but we plan to come back and make more optimizations. It will be great to learn about what you've planned. Let me sync up with you offline to set up a time for video sync. |
@pankajroark sounds good. I replied offline about a meeting time. |
This PR adds tests for
flatMap
/flatMap
composition in context ofOperationContainer
. Tests are implemented using JMH.Tests are not ideal, but they give us some estimation of costs our abstractions introduce.
Most interesting overheads:
IntermediateFlatMap
withFlatMapOperation
introduces about 1s per 1KK operation overhead.FlatMapOperation
composition adds another ~1-2s per 1KK.