If `use_numpy=True` and `columnar=True` are set and the data to insert has more rows than `insert_block_size` (default 1048576), the data is not sent.
After digging into the source code, I found that this is a bug in `clickhouse_driver.numpy.helpers.column_chunks`.
the original code:
def column_chunks(columns, n):
    for column in columns:
        if not isinstance(column, (np.ndarray, pd.DatetimeIndex)):
            raise TypeError(
                'Unsupported column type: {}. '
                'ndarray/DatetimeIndex is expected.'
                .format(type(column))
            )

    # create chunk generator for every column
    chunked = [
        iter(np.array_split(c, range(0, len(c), n)) if len(c) > n else [c])
        for c in columns
    ]

    while True:
        # get next chunk for every column
        item = [next(column, []) for column in chunked]
        if not any(len(x) for x in item):
            break
        yield item
In the `np.array_split(...)` call, `range(0, len(c), n)` is a sequence of split indices that starts at 0. The `np.array_split` source code:
@array_function_dispatch(_array_split_dispatcher)
def array_split(ary, indices_or_sections, axis=0):
    """
    Split an array into multiple sub-arrays.

    Please refer to the ``split`` documentation. The only difference
    between these functions is that ``array_split`` allows
    `indices_or_sections` to be an integer that does *not* equally
    divide the axis. For an array of length l that should be split
    into n sections, it returns l % n sub-arrays of size l//n + 1
    and the rest of size l//n.

    See Also
    --------
    split : Split array into multiple sub-arrays of equal size.

    Examples
    --------
    >>> x = np.arange(8.0)
    >>> np.array_split(x, 3)
    [array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7.])]

    >>> x = np.arange(9)
    >>> np.array_split(x, 4)
    [array([0, 1, 2]), array([3, 4]), array([5, 6]), array([7, 8])]

    """
    try:
        Ntotal = ary.shape[axis]
    except AttributeError:
        Ntotal = len(ary)
    try:
        # handle array case.
        Nsections = len(indices_or_sections) + 1
        div_points = [0] + list(indices_or_sections) + [Ntotal]
    except TypeError:
        # indices_or_sections is a scalar, not an array.
        Nsections = int(indices_or_sections)
        if Nsections <= 0:
            raise ValueError('number sections must be larger than 0.')
        Neach_section, extras = divmod(Ntotal, Nsections)
        section_sizes = ([0] + extras * [Neach_section + 1] +
                         (Nsections - extras) * [Neach_section])
        div_points = _nx.array(section_sizes, dtype=_nx.intp).cumsum()

    sub_arys = []
    sary = _nx.swapaxes(ary, axis, 0)
    for i in range(Nsections):
        st = div_points[i]
        end = div_points[i + 1]
        sub_arys.append(_nx.swapaxes(sary[st:end], axis, 0))

    return sub_arys
At the line `div_points = [0] + list(indices_or_sections) + [Ntotal]`, `array_split` prepends its own 0 to indices that already start at 0,
so the final `div_points` has two zeros at its start, e.g. `[0, 0, 100, 200...]`,
and the first chunk (the slice from 0 to 0) is always empty.
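The empty first chunk can be seen in isolation with a small array. This is a minimal sketch: the 10-element array and the block size of 4 are stand-ins chosen for illustration, not values from the driver.

```python
import numpy as np

c = np.arange(10)  # stand-in column with 10 rows
n = 4              # stand-in for insert_block_size

# Split indices built the same way as in column_chunks: they start at 0.
indices = list(range(0, len(c), n))  # [0, 4, 8]
chunks = np.array_split(c, indices)

# array_split prepends its own 0, so div_points is [0, 0, 4, 8, 10]
# and the first slice is c[0:0], which is empty.
chunk_lengths = [len(chunk) for chunk in chunks]
print(chunk_lengths)  # [0, 4, 4, 2]
```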
then it goes to:
        if not any(len(x) for x in item):
            break
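Because the first chunk of every column is empty, this check is true on the first iteration, so the numpy slicer yields nothing and the insert stops without sending any data.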
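The whole failure can be reproduced without a ClickHouse server. The sketch below copies only the chunking logic from the function quoted above (the type check is omitted, and the name `column_chunks_original` and the toy sizes are made up for illustration):

```python
import numpy as np

def column_chunks_original(columns, n):
    # Chunking logic as in the quoted function, type check omitted.
    chunked = [
        iter(np.array_split(c, range(0, len(c), n)) if len(c) > n else [c])
        for c in columns
    ]
    while True:
        item = [next(column, []) for column in chunked]
        if not any(len(x) for x in item):
            break
        yield item

columns = [np.arange(10), np.arange(10.0)]

# More rows than the block size: the first chunk of every column is
# empty, so the loop breaks before yielding anything.
large = list(column_chunks_original(columns, 4))

# Fewer rows than the block size: array_split is bypassed and the
# columns are yielded as a single block.
small = list(column_chunks_original(columns, 100))

print(len(large), len(small))  # 0 1
```

This matches the reported behavior: inserts that fit in one block work, while larger ones send nothing.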
I suggest changing
to
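One possible form of such a change, offered as an assumption and not necessarily the fix the driver adopted, is to start the split indices at `n` instead of 0, so that `array_split` no longer sees a duplicate zero. The function name `column_chunks_fixed` and the toy sizes are hypothetical:

```python
import numpy as np

def column_chunks_fixed(columns, n):
    # Assumed fix: indices start at n, so div_points becomes
    # [0, n, 2n, ..., len(c)] with no duplicate zero.
    chunked = [
        iter(np.array_split(c, range(n, len(c), n)) if len(c) > n else [c])
        for c in columns
    ]
    while True:
        item = [next(column, []) for column in chunked]
        if not any(len(x) for x in item):
            break
        yield item

columns = [np.arange(10), np.arange(10.0)]
blocks = list(column_chunks_fixed(columns, 4))
block_sizes = [[len(col) for col in block] for block in blocks]
print(block_sizes)  # [[4, 4], [4, 4], [2, 2]]
```

With this variant every block holds at most `n` rows and all rows are yielded exactly once.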