BUG: .numpy.helpers.column_chunks never yields chunks for len > insert_block_size #215

Closed
auderson opened this issue May 13, 2021 · 2 comments

Comments


auderson commented May 13, 2021

If use_numpy=True and columnar=True are set, and the data to insert has more rows than insert_block_size (default 1048576), the data will not be sent. After looking into the source code, I found that the bug is in clickhouse_driver.numpy.helpers.column_chunks.
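
A minimal reproduction sketch (the table name, column, and sizes are illustrative; it assumes a local server and the pre-fix driver):

import numpy as np
from clickhouse_driver import Client

# numpy support is enabled through the client settings
client = Client('localhost', settings={'use_numpy': True})
client.execute(
    'CREATE TABLE IF NOT EXISTS test_chunks (x Int64) ENGINE = Memory'
)

# 2_000_000 rows is larger than the default insert_block_size of 1_048_576
client.execute(
    'INSERT INTO test_chunks (x) VALUES',
    [np.arange(2_000_000, dtype=np.int64)],
    columnar=True,
)

# the insert completes, but no rows were actually sent
print(client.execute('SELECT count() FROM test_chunks'))  # [(0,)] instead of [(2000000,)]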
The original column_chunks code:

import numpy as np
import pandas as pd


def column_chunks(columns, n):
    for column in columns:
        if not isinstance(column, (np.ndarray, pd.DatetimeIndex)):
            raise TypeError(
                'Unsupported column type: {}. '
                'ndarray/DatetimeIndex is expected.'
                .format(type(column))
            )

    # create chunk generator for every column
    chunked = [
        iter(np.array_split(c, range(0, len(c), n)) if len(c) > n else [c])
        for c in columns
    ]

    while True:
        # get next chunk for every column
        item = [next(column, []) for column in chunked]
        if not any(len(x) for x in item):
            break
        yield item

The problem is at the line np.array_split(c, range(0, len(c), n)): range(0, len(c), n) is a sequence of split indices that starts at 0.
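
That leading 0 is what produces an empty first chunk, which is easy to see with a small array (the dtype shown in the empty array's repr may differ by platform):

>>> import numpy as np
>>> list(range(0, 10, 3))
[0, 3, 6, 9]
>>> np.array_split(np.arange(10), range(0, 10, 3))
[array([], dtype=int64), array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([9])]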
For reference, the np.array_split source code:

@array_function_dispatch(_array_split_dispatcher)
def array_split(ary, indices_or_sections, axis=0):
    """
    Split an array into multiple sub-arrays.

    Please refer to the ``split`` documentation.  The only difference
    between these functions is that ``array_split`` allows
    `indices_or_sections` to be an integer that does *not* equally
    divide the axis. For an array of length l that should be split
    into n sections, it returns l % n sub-arrays of size l//n + 1
    and the rest of size l//n.

    See Also
    --------
    split : Split array into multiple sub-arrays of equal size.

    Examples
    --------
    >>> x = np.arange(8.0)
    >>> np.array_split(x, 3)
    [array([0.,  1.,  2.]), array([3.,  4.,  5.]), array([6.,  7.])]

    >>> x = np.arange(9)
    >>> np.array_split(x, 4)
    [array([0, 1, 2]), array([3, 4]), array([5, 6]), array([7, 8])]

    """
    try:
        Ntotal = ary.shape[axis]
    except AttributeError:
        Ntotal = len(ary)
    try:
        # handle array case.
        Nsections = len(indices_or_sections) + 1
        div_points = [0] + list(indices_or_sections) + [Ntotal]
    except TypeError:
        # indices_or_sections is a scalar, not an array.
        Nsections = int(indices_or_sections)
        if Nsections <= 0:
            raise ValueError('number sections must be larger than 0.')
        Neach_section, extras = divmod(Ntotal, Nsections)
        section_sizes = ([0] +
                         extras * [Neach_section+1] +
                         (Nsections-extras) * [Neach_section])
        div_points = _nx.array(section_sizes, dtype=_nx.intp).cumsum()

    sub_arys = []
    sary = _nx.swapaxes(ary, axis, 0)
    for i in range(Nsections):
        st = div_points[i]
        end = div_points[i + 1]
        sub_arys.append(_nx.swapaxes(sary[st:end], axis, 0))

    return sub_arys

At the line div_points = [0] + list(indices_or_sections) + [Ntotal], the resulting div_points starts with two zeros, e.g. [0, 0, 100, 200, ...], so the first sub-array is always empty. Back in column_chunks, execution then reaches:

if not any(len(x) for x in item):
    break

Because the first chunk of every column is empty, this check trips on the very first iteration, so column_chunks yields nothing and the insert sends no data.
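
This can be confirmed directly against the pre-fix helper (the column length and block size below are just illustrative):

>>> import numpy as np
>>> from clickhouse_driver.numpy.helpers import column_chunks
>>> list(column_chunks([np.arange(2_000_000)], 1_048_576))
[]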

I suggest changing

    chunked = [
        iter(np.array_split(c, range(0, len(c), n)) if len(c) > n else [c])
        for c in columns
    ]

to

    chunked = [
        iter(np.array_split(c, len(c) // n) if len(c) > n else [c])
        for c in columns
    ]
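
A quick self-contained check of the proposed version with small numbers (column_chunks_proposed is just a local name for this sketch, the type check is omitted for brevity, and this is the change suggested here, not necessarily the fix that was eventually merged):

import numpy as np


def column_chunks_proposed(columns, n):
    # same helper as above, with the suggested one-line change applied
    chunked = [
        iter(np.array_split(c, len(c) // n) if len(c) > n else [c])
        for c in columns
    ]

    while True:
        # get next chunk for every column
        item = [next(column, []) for column in chunked]
        if not any(len(x) for x in item):
            break
        yield item


col = np.arange(10)
print([chunk[0].tolist() for chunk in column_chunks_proposed([col], 4)])
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]] -- every row is yielded again
# (splitting by section count balances sizes, so a single chunk can exceed n)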
@avlasenkoev

I have the same problem... The topic author described it correctly.

@xzkostyan
Member

Same as issue #243. The fix was merged into master and released in version 0.2.2.
