-
Notifications
You must be signed in to change notification settings - Fork 19.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
training.py _slice_arrays() fix crash when arrays are None #7069
Conversation
LGTM (albeit a better solution may have been to avoid calling |
My reasoning was if another call is added to _slice_arrays, it is easy to forget When speed is a priority I think larger benefits could come from the following, in order from most to least practical:
That'd do more than moving any single if statement! Sadly, that's also much too time consuming for me at the moment. Perhaps the most practical option 1, profiling + vectorizing, is a good one for the projects board! :-) |
If you time it, there is a consistent 3% time increase when using the new function. However that's 3% of something small, surrounded by many things that are much more computationally intensive. Overall I don't think it's a big deal. |
Sorry, more like 5%. import numpy as np
import time
arrs = [np.random.random((100000, 10)),
np.random.random((100000, 10)),
np.random.random((100000, 10))]
def _slice_arrays_1(arrays, start=None, stop=None):
"""Slice an array or list of arrays.
This takes an array-like, or a list of
array-likes, and outputs:
- arrays[start:stop] if `arrays` is an array-like
- [x[start:stop] for x in arrays] if `arrays` is a list
Can also work on list/array of indices: `_slice_arrays(x, indices)`
# Arguments
arrays: Single array or list of arrays.
start: can be an integer index (start index)
or a list/array of indices
stop: integer (stop index); should be None if
`start` was a list.
# Returns
A slice of the array(s).
"""
if isinstance(arrays, list):
if hasattr(start, '__len__'):
# hdf5 datasets only support list objects as indices
if hasattr(start, 'shape'):
start = start.tolist()
return [x[start] for x in arrays]
else:
return [x[start:stop] for x in arrays]
else:
if hasattr(start, '__len__'):
if hasattr(start, 'shape'):
start = start.tolist()
return arrays[start]
else:
return arrays[start:stop]
def _slice_arrays_2(arrays, start=None, stop=None):
"""Slice an array or list of arrays.
This takes an array-like, or a list of
array-likes, and outputs:
- arrays[start:stop] if `arrays` is an array-like
- [x[start:stop] for x in arrays] if `arrays` is a list
Can also work on list/array of indices: `_slice_arrays(x, indices)`
# Arguments
arrays: Single array or list of arrays.
start: can be an integer index (start index)
or a list/array of indices
stop: integer (stop index); should be None if
`start` was a list.
# Returns
A slice of the array(s).
"""
if arrays is None:
return [None]
elif isinstance(arrays, list):
if hasattr(start, '__len__'):
# hdf5 datasets only support list objects as indices
if hasattr(start, 'shape'):
start = start.tolist()
return [None if x is None else x[start] for x in arrays]
else:
return [None if x is None else x[start:stop] for x in arrays]
else:
if hasattr(start, '__len__'):
if hasattr(start, 'shape'):
start = start.tolist()
return arrays[start]
elif hasattr(start, '__getitem__'):
return arrays[start:stop]
else:
return [None]
f1_times = []
f2_times = []
for _ in range(20):
t0 = time.time()
for _ in range(100000):
_slice_arrays_1(arrs, start=3, stop=13)
t1 = time.time()
f1_times.append(t0 - t1)
t0 = time.time()
for _ in range(100000):
_slice_arrays_2(arrs, start=3, stop=13)
t1 = time.time()
f2_times.append(t0 - t1)
f1 = np.mean(f1_times)
f2 = np.mean(f2_times)
print(f1)
print(f2)
print(100. * (f2 - f1) / f1) |
Oh for that one function call! Yeah that makes sense, I agree. I was thinking about the typical overall main program loop like you described later on. I do think it is a good idea to consider adding "training performance improvements with validating unit tests and unchanged program behavior" to the help wanted board. I tried out Numba and it looks very nice, but I think it won't be easy to incorporate until the next release which claims to support list comprehensions. If that could provide a sufficiently substantial boost to the overall performance it might be worth considering. Installation was fine with pip. |
atomic component of #6928 and #7046