Performance Optimizations #104

benjchristensen · 2013-01-21T22:18:05Z

Very large compositions have a performance hit as they increase the size of the stack and call onNext at each level of composition.

Pursue an initial round of optimizations that collapses the composition into a sequence of functions that get applied at subscribe() rather than subsequent calls of onNext.

UPDATED: See the comments below for where this evolved to. Chaining was shown through testing to not be the issue.

benjchristensen · 2013-01-21T22:18:44Z

Simple performance test to demonstrate performance issue:

package testing;

import java.util.concurrent.atomic.AtomicInteger;

import rx.observables.Observable;
import rx.observables.Observer;
import rx.util.Func1;

public class PerformanceTest {

    public static void main(String[] args) {
        PerformanceTest test = new PerformanceTest();
        Integer[] values = new Integer[100001];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }

        for (int i = 0; i < 100; i++) {
            System.out.println("-------------------------------");
            //            test.runFunctionExecutionTest(values);
            test.runCompositionTest(values);
            test.runNonCompositionalTestWithDirectLoop(values);
            test.runNonCompositionalTestWithArrayOfFunctions(values);
        }
    }

    public void runCompositionTest(Integer[] values) {
        System.out.println("runCompositionTest");

        final AtomicInteger onNextSum = new AtomicInteger(0);
        final long start = System.nanoTime();

        MathFunction m = new MathFunction();
        // 50 levels of composition (same function so that's not the cost)
        Observable.from(values)
                .map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m)
                .map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m)
                .map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m)
                .map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m)
                .map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m).map(m)
                .subscribe(new TestObserver(onNextSum, start));
    }

    public void runNonCompositionalTestWithDirectLoop(Integer[] values) {
        System.out.println("runNonCompositionalTestWithDirectLoop");

        final AtomicInteger onNextSum = new AtomicInteger(0);
        final long start = System.nanoTime();
        final MathFunction m = new MathFunction();

        Observable.from(values).map(new Func1<Integer, Integer>() {

            @Override
            public Integer call(Integer t1) {
                // iterate the 50 times here in a loop rather than via composition
                for (int i = 0; i < 50; i++) {
                    t1 = m.call(t1);
                }
                return t1;
            }

        }).subscribe(new TestObserver(onNextSum, start));

    }

    public void runNonCompositionalTestWithArrayOfFunctions(Integer[] values) {
        System.out.println("runNonCompositionalTestWithArrayOfFunctions");

        final AtomicInteger onNextSum = new AtomicInteger(0);
        final long start = System.nanoTime();
        final MathFunction m = new MathFunction();
        final Func1[] functionCalls = new Func1<?, ?>[50];
        for (int i = 0; i < 50; i++) {
            functionCalls[i] = m;
        }

        Observable.from(values).map(new Func1<Integer, Integer>() {

            @Override
            public Integer call(Integer t1) {
                // iterate the 50 times here in a loop rather than via composition
                for (Func1<Integer, Integer> f : functionCalls) {
                    t1 = f.call(t1);
                }
                return t1;
            }

        }).subscribe(new TestObserver(onNextSum, start));

    }

    private static final class TestObserver implements Observer<Integer> {
        private final AtomicInteger onNextSum;
        private final long start;

        private TestObserver(AtomicInteger onNextSum, long start) {
            this.onNextSum = onNextSum;
            this.start = start;
        }

        @Override
        public void onNext(Integer i) {
            onNextSum.addAndGet(i);
        }

        @Override
        public void onError(Exception e) {
            e.printStackTrace();
        }

        @Override
        public void onCompleted() {
            long end = System.nanoTime();
            System.out.println("Sum: " + onNextSum.get() + " Time: " + ((double) (end - start)) / 1000 / 1000 + "ms");
        }
    }

    private static class MathFunction implements Func1<Integer, Integer> {

        @Override
        public Integer call(Integer t1) {
            return t1 + 1;
        }

    }

}

benjchristensen · 2013-01-21T22:19:44Z

On existing code the composition of 50 'map' calls takes ~370ms whereas a single call with a list of the same 50 functions takes ~23ms.

runCompositionTest
Sum: 710082754 Time: 371.601ms
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 23.814ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 23.925ms
-------------------------------
runCompositionTest
Sum: 710082754 Time: 383.391ms
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 24.577ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 24.65ms
-------------------------------
runCompositionTest
Sum: 710082754 Time: 376.443ms
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 24.587ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 26.131ms

benjchristensen · 2013-01-21T22:20:59Z

Playing with different things I got the following numbers:


>>> Statically typed <<<

Without chaining:

Sum: 710082754 Time: 130.683ms
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 21.011ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 20.84ms


With chaining (composition collapsing):

Sum: 710082754 Time: 28.787ms
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 19.525ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 19.175ms


>>> Dynamically typed <<<

When going via generic Functions.execute even with chained sequence:

runCompositionTest
Sum: 710082754 Time: 577.3ms <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< very bad when dynamic
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 31.591ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 38.093ms
runCompositionTest


With Function memoization so we only pay dynamic price during sequence setup, not in onNext:


Using ArrayList

runCompositionTest
Sum: 710082754 Time: 27.078ms
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 18.911ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 18.852ms


Using LinkedBlockingQueue

runCompositionTest
Sum: 710082754 Time: 46.532ms
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 18.946ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 18.746ms

benjchristensen · 2013-01-21T22:23:06Z

By adding chaining (to collapse composition into a single observable with chain of functions) and improving the handling of dynamic function calls through memoization I get this:

runCompositionTest
Sum: 710082754 Time: 47.184ms
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 17.869ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 26.528ms
-------------------------------
runCompositionTest
Sum: 710082754 Time: 47.578ms
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 21.692ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 18.177ms
-------------------------------
runCompositionTest
Sum: 710082754 Time: 47.93ms
runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 18.821ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 20.064ms

I can get the composed version down to around 25-28ms if I use ArrayList but I need to be thread-safe in this case so am using ConcurrentLinkedQueue which adds a little time, but it's still far faster than before.

Same code now takes ~48ms instead of ~370ms.

benjchristensen · 2013-01-22T03:14:50Z

Pretty significant performance improvements came from improved handling of function execution - in particular memoizing the logic for constructing a function, particularly dynamic ones.

The long if/then conditional block for handling the many different Func/Action/language functions was very expensive in sequence with lots of data.

This was fixed in #106

I'm still testing the potential gains from chaining.

benjchristensen · 2013-01-22T18:03:17Z

While playing with chaining I found that the actual performance issues were the dynamic function construction/lookup and concurrency/synchronization costs - not the depth of stack.

Here is code that demonstrates that the stack depth is not a problem for performance. In fact, the composed version performs better than the loops:

package testing;

import java.util.ArrayList;
import java.util.concurrent.Callable;

import rx.util.Func1;

public class TestChainPerformance {

    public static void main(String[] args) {
        TestChainPerformance test = new TestChainPerformance();
        Integer[] values = new Integer[100001];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }

        try {
            for (int i = 0; i < 100; i++) {
                System.out.println("-------------------------------");
                test.runChained(values);
                test.runComposed(values);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    public void runComposed(final Integer[] values) throws Exception {
        long start = System.nanoTime();

        Callable<Integer> c = null;
        for (int i = 0; i < 250; i++) {
            final Callable<Integer> previousC = c;
            c = new Callable<Integer>() {

                @Override
                public Integer call() throws Exception {
                    MathFunction f = new MathFunction();
                    int sum = 0;
                    for (int v : values) {
                        sum += f.call(v);
                    }
                    if (previousC != null) {
                        sum += previousC.call();
                    }
                    return sum;
                }

            };
        }

        int sum = c.call();

        long end = System.nanoTime();
        System.out.println("Composed => Sum: " + sum + " Time: " + ((double) (end - start)) / 1000 / 1000 + "ms");
    }

    public void runChained(Integer[] values) {
        long start = System.nanoTime();
        int sum = 0;

        ArrayList<Func1<Integer, Integer>> functions = new ArrayList<Func1<Integer, Integer>>();
        for (int i = 0; i < 250; i++) {
            functions.add(new MathFunction());
        }

        for (int v : values) {
            for (Func1<Integer, Integer> f : functions) {
                sum += f.call(v);
            }
        }

        long end = System.nanoTime();
        System.out.println("Iterative => Sum: " + sum + " Time: " + ((double) (end - start)) / 1000 / 1000 + "ms");
    }

    private static class MathFunction implements Func1<Integer, Integer> {

        @Override
        public Integer call(Integer t1) {
            return t1 + 1;
        }

    }
}

The performance numbers on my machine are

Iterative => Sum: 202017114 Time: 227.288ms
Composed => Sum: 202017114 Time: **172.856ms**
-------------------------------
Iterative => Sum: 202017114 Time: 230.235ms
Composed => Sum: 202017114 Time: **173.917ms**
-------------------------------
Iterative => Sum: 202017114 Time: 226.337ms
Composed => Sum: 202017114 Time: **172.932ms**

That is with JDK 7. JDK 6 is slower but similar difference between the two.

Due to this I'm abandoning pursuit of chaining as a performance enhancement and focusing on functions and synchronization.

benjchristensen · 2013-01-22T23:45:48Z

In pull request #106 with function memoization we went from this:

Sum: 710082754 Time: 437.936ms

to this:

Sum: 710082754 Time: 60.605ms

By reducing the nested synchronization (eliminating most of it) it is now running at:

Sum: 710082754 Time: 22.429ms

benjchristensen · 2013-01-22T23:50:20Z

Performance is now very close to what it is without Rx:

Sum: 710082754 Time: 22.429ms

vs

runNonCompositionalTestWithDirectLoop
Sum: 710082754 Time: 17.248ms
runNonCompositionalTestWithArrayOfFunctions
Sum: 710082754 Time: 17.311ms

I am considering this overhead acceptable right now and not spending further time on this issue.

ReactiveX#216

…ation Performance optimizations for dynamic function execution. ReactiveX#104

- migrated Func1 to OperatorSubscribeFunction for internal operator implementations - do not wrap with AtomicObserver when it's a trusted operator ReactiveX#104

Performance optimizations for dynamic function execution. ReactiveX/RxJava#104

…tor methods.

ghost assigned benjchristensen Jan 21, 2013

This was referenced Jan 21, 2013

Performance optimizations. #105

Closed

Performance optimizations for dynamic function execution. #106

Merged

benjchristensen mentioned this issue Jan 22, 2013

Performance: Reduce synchronization #107

Merged

benjchristensen closed this as completed Jan 22, 2013

benjchristensen mentioned this issue Apr 1, 2013

Guideline 6.4: Protect calls to user code from within an operator #216

Closed

benjchristensen referenced this issue in benjchristensen/RxJava Apr 1, 2013

Improve error handling of user provided Observers

38e9e3f

ReactiveX#216

rickbw pushed a commit to rickbw/RxJava that referenced this issue Jan 9, 2014

Merge pull request ReactiveX#106 from benjchristensen/function-memoiz…

2fd22df

…ation Performance optimizations for dynamic function execution. ReactiveX#104

benjchristensen added a commit to ReactiveX/RxScala that referenced this issue Aug 19, 2014

Merge pull request #106 from benjchristensen/function-memoization

3a442c6

Performance optimizations for dynamic function execution. ReactiveX/RxJava#104

jihoonson pushed a commit to jihoonson/RxJava that referenced this issue Mar 6, 2020

Issue ReactiveX#104 Get rid of code duplication in @deprecated decora…

b751ff0

…tor methods.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance Optimizations #104

Performance Optimizations #104

benjchristensen commented Jan 21, 2013

benjchristensen commented Jan 21, 2013

benjchristensen commented Jan 21, 2013

benjchristensen commented Jan 21, 2013

benjchristensen commented Jan 21, 2013

benjchristensen commented Jan 22, 2013

benjchristensen commented Jan 22, 2013

benjchristensen commented Jan 22, 2013

benjchristensen commented Jan 22, 2013

Performance Optimizations #104

Performance Optimizations #104

Comments

benjchristensen commented Jan 21, 2013

benjchristensen commented Jan 21, 2013

benjchristensen commented Jan 21, 2013

benjchristensen commented Jan 21, 2013

benjchristensen commented Jan 21, 2013

benjchristensen commented Jan 22, 2013

benjchristensen commented Jan 22, 2013

benjchristensen commented Jan 22, 2013

benjchristensen commented Jan 22, 2013