training.py _slice_arrays() fix crash when arrays are None #7069

ahundt · 2017-06-21T04:56:12Z

atomic component of #6928 and #7046

fchollet · 2017-06-22T01:01:25Z

LGTM (albeit a better solution may have been to avoid calling slice when we have a None instead of an array, which would have made the logic of slice lighter (one less if per array), which is somewhat important given that it is called repeatedly inside of the training loop -- but that's not a blocker because the performance difference will not be noticeable).

ahundt · 2017-06-22T03:09:31Z

My reasoning was if another call is added to _slice_arrays, it is easy to forget None. Performance is something I've spent years on... Python surely isn't C++. :-)

When speed is a priority I think larger benefits could come from the following, in order from most to least practical:

more vectorizing to numpy where possible
- there are several opportunities I noticed like with boolean logic arrays
- every python list loop has many if statements!
Pushing more to the backend sooner and keeping it there, like image preprocessing
Cythonize or use Numba for any really tight loops where the first two aren't good enough (downside: quite an involved dependency)

That'd do more than moving any single if statement! Sadly, that's also much too time consuming for me at the moment. Perhaps the most practical option 1, profiling + vectorizing, is a good one for the projects board! :-)

fchollet · 2017-06-22T19:52:46Z

If you time it, there is a consistent 3% time increase when using the new function. However that's 3% of something small, surrounded by many things that are much more computationally intensive. Overall I don't think it's a big deal.

fchollet · 2017-06-22T19:56:25Z

Sorry, more like 5%.

import numpy as np
import time

arrs = [np.random.random((100000, 10)),
        np.random.random((100000, 10)),
        np.random.random((100000, 10))]


def _slice_arrays_1(arrays, start=None, stop=None):
    """Slice an array or list of arrays.
    This takes an array-like, or a list of
    array-likes, and outputs:
        - arrays[start:stop] if `arrays` is an array-like
        - [x[start:stop] for x in arrays] if `arrays` is a list
    Can also work on list/array of indices: `_slice_arrays(x, indices)`
    # Arguments
        arrays: Single array or list of arrays.
        start: can be an integer index (start index)
            or a list/array of indices
        stop: integer (stop index); should be None if
            `start` was a list.
    # Returns
        A slice of the array(s).
    """
    if isinstance(arrays, list):
        if hasattr(start, '__len__'):
            # hdf5 datasets only support list objects as indices
            if hasattr(start, 'shape'):
                start = start.tolist()
            return [x[start] for x in arrays]
        else:
            return [x[start:stop] for x in arrays]
    else:
        if hasattr(start, '__len__'):
            if hasattr(start, 'shape'):
                start = start.tolist()
            return arrays[start]
        else:
            return arrays[start:stop]


def _slice_arrays_2(arrays, start=None, stop=None):
    """Slice an array or list of arrays.
    This takes an array-like, or a list of
    array-likes, and outputs:
        - arrays[start:stop] if `arrays` is an array-like
        - [x[start:stop] for x in arrays] if `arrays` is a list
    Can also work on list/array of indices: `_slice_arrays(x, indices)`
    # Arguments
        arrays: Single array or list of arrays.
        start: can be an integer index (start index)
            or a list/array of indices
        stop: integer (stop index); should be None if
            `start` was a list.
    # Returns
        A slice of the array(s).
    """
    if arrays is None:
        return [None]
    elif isinstance(arrays, list):
        if hasattr(start, '__len__'):
            # hdf5 datasets only support list objects as indices
            if hasattr(start, 'shape'):
                start = start.tolist()
            return [None if x is None else x[start] for x in arrays]
        else:
            return [None if x is None else x[start:stop] for x in arrays]
    else:
        if hasattr(start, '__len__'):
            if hasattr(start, 'shape'):
                start = start.tolist()
            return arrays[start]
        elif hasattr(start, '__getitem__'):
            return arrays[start:stop]
        else:
            return [None]


f1_times = []
f2_times = []
for _ in range(20):
    t0 = time.time()
    for _ in range(100000):
        _slice_arrays_1(arrs, start=3, stop=13)
    t1 = time.time()
    f1_times.append(t0 - t1)

    t0 = time.time()
    for _ in range(100000):
        _slice_arrays_2(arrs, start=3, stop=13)
    t1 = time.time()
    f2_times.append(t0 - t1)

f1 = np.mean(f1_times)
f2 = np.mean(f2_times)
print(f1)
print(f2)
print(100. * (f2 - f1) / f1)

ahundt · 2017-06-22T22:02:19Z

Oh for that one function call! Yeah that makes sense, I agree. I was thinking about the typical overall main program loop like you described later on.

I do think it is a good idea to consider adding "training performance improvements with validating unit tests and unchanged program behavior" to the help wanted board.

I tried out Numba and it looks very nice, but I think it won't be easy to incorporate until the next release which claims to support list comprehensions. If that could provide a sufficiently substantial boost to the overall performance it might be worth considering. Installation was fine with pip.

training.py _slice_arrays() fix crash when arrays are None

8f555d8

ahundt mentioned this pull request Jun 21, 2017

reorganize training.py, code is clearer and handles more corner cases… #7072

Closed

ahundt added 2 commits June 21, 2017 02:06

training.py test _slice_arrays()

4cd132a

Merge branch 'master' into slice_arrays

ab639ca

fchollet merged commit 1b53999 into keras-team:master Jun 22, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

training.py _slice_arrays() fix crash when arrays are None #7069

training.py _slice_arrays() fix crash when arrays are None #7069

ahundt commented Jun 21, 2017 •

edited

Loading

fchollet commented Jun 22, 2017

ahundt commented Jun 22, 2017 •

edited

Loading

fchollet commented Jun 22, 2017 •

edited

Loading

fchollet commented Jun 22, 2017

ahundt commented Jun 22, 2017

training.py _slice_arrays() fix crash when arrays are None #7069

training.py _slice_arrays() fix crash when arrays are None #7069

Conversation

ahundt commented Jun 21, 2017 • edited Loading

fchollet commented Jun 22, 2017

ahundt commented Jun 22, 2017 • edited Loading

fchollet commented Jun 22, 2017 • edited Loading

fchollet commented Jun 22, 2017

ahundt commented Jun 22, 2017

ahundt commented Jun 21, 2017 •

edited

Loading

ahundt commented Jun 22, 2017 •

edited

Loading

fchollet commented Jun 22, 2017 •

edited

Loading