Parquet writer dictionary encoding refactor #8476

Merged

Commits on Mar 30, 2021

  1. 32dce41
  2. d922868

Commits on Mar 31, 2021

  1. zero->zeroed

    harrism committed Mar 31, 2021
    10c8b38

Commits on Apr 2, 2021

  1. uvector in dictionary

    devavret committed Apr 2, 2021
    404ddc4
  2. uvector in stats

    devavret committed Apr 2, 2021
    dff6e5d
  3. 48618ce

Commits on Apr 5, 2021

  1. 361878e
  2. uvector in read parquet

    devavret committed Apr 5, 2021
    edcd6d7
  3. a838118

Commits on Apr 6, 2021

  1. fragment update in stats

    devavret committed Apr 6, 2021
    7b446ab
  2. spans in frag stats

    devavret committed Apr 6, 2021
    5685584

Commits on Apr 7, 2021

  1. 2dvector for chunks part 1

    devavret committed Apr 7, 2021
    a471bdb
  2. 2d span for chunks

    devavret committed Apr 7, 2021
    aa43df2

Commits on Apr 8, 2021

  1. flat span for chunk part 1

    devavret committed Apr 8, 2021
    f1639c4
  2. chunks flat span part 2

    devavret committed Apr 8, 2021
    34952e0

Commits on Apr 9, 2021

  1. span for pages part 1

    devavret committed Apr 9, 2021
    bf3bdd3
  2. bb169bb
  3. spans for columndesc

    devavret committed Apr 9, 2021
    4a02435

Commits on Apr 12, 2021

  1. chunk span for dictionary

    devavret committed Apr 12, 2021
    89cc637
  2. 0ca82ea

Commits on Apr 13, 2021

  1. Clean up function arguments

    devavret committed Apr 13, 2021
    ab656d8
  2. 08ab80b

Commits on Apr 14, 2021

  1. 421e909

Commits on Apr 17, 2021

  1. 920ba7b

Commits on Apr 19, 2021

  1. 3c050bb
  2. review fix

    devavret committed Apr 19, 2021
    765b166

Commits on Apr 27, 2021

  1. 9a2789f

Commits on Apr 29, 2021

  1. Misc review fixes

    devavret committed Apr 29, 2021
    37e8952
  2. a02c749
  3. b9f49c4
  4. 90d3c7f
  5. b61dc48

Commits on Apr 30, 2021

  1. Remove start_page from last remaining kernel

    Subspanning page stats to allow passing only the stats corresponding to batch pages
    devavret committed Apr 30, 2021
    36ed3ad
  2. a65eef7

Commits on May 3, 2021

  1. 3dc3d44

Commits on May 5, 2021

  1. Fixed the issue with hash and initializer.

    The hash was returning bool, so all values clustered at the beginning of the map.
    The initializer wasn't initializing more than the first 1024 values.
    devavret committed May 5, 2021
    638b554
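
The hash problem described in this commit can be illustrated with a small host-side sketch. This is not the PR's code: `mix32`, `bool_hash`, `wide_hash`, and `distinct_start_slots` are hypothetical names, and the mixer is the MurmurHash3 32-bit finalizer used only as a stand-in hash. The sketch shows that a hash whose return type is `bool` can yield at most two starting slots, so every insertion starts probing from the front of the map.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Stand-in hash: the MurmurHash3 32-bit finalizer (an assumption, not the PR's hash).
inline std::uint32_t mix32(std::uint32_t h)
{
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// Hypothetical broken hasher: the bool return type collapses every result to 0 or 1.
struct bool_hash {
  bool operator()(std::uint32_t key) const { return mix32(key) != 0; }
};

// Hypothetical fixed hasher: the full 32-bit value is returned.
struct wide_hash {
  std::uint32_t operator()(std::uint32_t key) const { return mix32(key); }
};

// Counts how many distinct starting slots the keys 0..num_keys-1 receive
// in a table with `capacity` slots.
template <typename Hash>
std::size_t distinct_start_slots(std::uint32_t num_keys, std::size_t capacity)
{
  std::set<std::size_t> slots;
  Hash hash{};
  for (std::uint32_t key = 0; key < num_keys; ++key) {
    slots.insert(static_cast<std::size_t>(hash(key)) % capacity);
  }
  return slots.size();
}
```

With 1000 keys and 4096 slots, `distinct_start_slots<bool_hash>(1000, 4096)` is 2, while `wide_hash` spreads the same keys over far more slots.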

Commits on May 14, 2021

  1. ac2173e

Commits on May 18, 2021

  1. tested large num uniq

    Enabled early return
    increased size of hash map storage from 1<<17 to chunk.num_values
    devavret committed May 18, 2021
    f8febb3

Commits on May 19, 2021

  1. Pull from cuco

    devavret committed May 19, 2021
    ad64143
  2. b223cc4

Commits on May 20, 2021

  1. Add dictionary compaction

    devavret committed May 20, 2021
    17acb35

Commits on May 21, 2021

  1. 3d3ea90
  2. 8376ae4

Commits on May 24, 2021

  1. 214b756
  2. 6d97224

Commits on Jun 1, 2021

  1. 0152f88

Commits on Jun 2, 2021

  1. Disable dict for bool cols

    devavret committed Jun 2, 2021
    b04fb7b

Commits on Jun 3, 2021

  1. e3093d1
  2. Fix dict_index writing.

    Previously, all blocks wrote into the same rows, 0 to 5000.
    The notable change is in line 334.
    devavret committed Jun 3, 2021
    5bc604e
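
The indexing bug described in this commit can be reproduced in a host-side simulation. This is only a sketch under assumptions: the real code is a CUDA kernel, and `write_indices`, `count_unwritten`, and the parameter names are hypothetical. The figure of 5000 values per block comes from the commit message. Each simulated block either writes at its block-local index (the buggy behavior) or at an index offset by its position in the chunk.

```cpp
#include <cstddef>
#include <vector>

// Simulates `num_blocks` blocks, each handling `frag_size` values and writing one
// entry per value into a chunk-wide output array. The block ID is stored in place
// of a real dictionary index, purely for illustration.
std::vector<int> write_indices(std::size_t num_blocks, std::size_t frag_size, bool use_chunk_offset)
{
  std::vector<int> dict_index(num_blocks * frag_size, -1);  // -1 marks "never written"
  for (std::size_t block = 0; block < num_blocks; ++block) {
    for (std::size_t t = 0; t < frag_size; ++t) {
      // Buggy variant: every block uses the block-local index t.
      // Fixed variant: the index is offset by the block's position in the chunk.
      std::size_t const val_idx = use_chunk_offset ? block * frag_size + t : t;
      dict_index[val_idx] = static_cast<int>(block);
    }
  }
  return dict_index;
}

// Counts entries that no block ever wrote.
std::size_t count_unwritten(std::vector<int> const& dict_index)
{
  std::size_t n = 0;
  for (int v : dict_index) {
    if (v == -1) { ++n; }
  }
  return n;
}
```

With 4 blocks of 5000 values, the block-local variant leaves 15000 entries unwritten because every block overwrites rows 0 to 4999; the offset variant leaves none.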

Commits on Jun 7, 2021

  1. 1939ce5

Commits on Jun 8, 2021

  1. Complete replacing old dict code with new

    - Disabled old dict kernels
    - Removed dict bits calc from page init, returned it from host decision
    - Found and fixed bug in dict_index indexing (val_idx was in page instead of chunk)
    - Replaced all dict related old members usage with new members
    devavret committed Jun 8, 2021
    bda722f
  2. Cleanup dict_data and dict_index

    - Removed chunk.dict_data_idx needed for dict sorting
    - Removed unused chunk.dict_data_size (redundant with num_dict_entries)
    - Removed unnecessary H->D and D->H copies and stream syncs
    devavret committed Jun 8, 2021
    026ed8c

Commits on Jun 9, 2021

  1. Don't launch dict kernels for 0 chunks

    Don't allocate dict_(index/data) where ck.use_dictionary is false
    devavret committed Jun 9, 2021
    23d4346
  2. d8a701f
  3. Misc cleanups

    devavret committed Jun 9, 2021
    7fbd26b
  4. dict code cleanups

    devavret committed Jun 9, 2021
    0d2cb6f
  5. a62f7f3
  6. 2e871ba

Commits on Jun 10, 2021

  1. Documentation

    devavret committed Jun 10, 2021
    cce3b8b
  2. 67997b5
  3. f4afda4
  4. Testing CI for deadlock

    devavret committed Jun 10, 2021
    9e8d666

Commits on Aug 5, 2021

  1. 797ba36
  2. ff8b885

Commits on Aug 6, 2021

  1. bb07ded

Commits on Aug 9, 2021

  1. Add missing syncthreads

    devavret committed Aug 9, 2021
    15d15b4
  2. Fix for rapidsai#8890

    devavret committed Aug 9, 2021
    b401a5f
  3. Review cleanups

    devavret committed Aug 9, 2021
    533ccab

Commits on Aug 10, 2021

  1. cd375a0

Commits on Aug 11, 2021

  1. Cmake review changes

    devavret committed Aug 11, 2021
    c717e72
  2. More cmake review fixes

    devavret committed Aug 11, 2021
    520cb84
  3. More cmake fix

    devavret committed Aug 11, 2021
    8b74b96

Commits on Aug 12, 2021

  1. no more camelCase

    devavret committed Aug 12, 2021
    487ffd3

Commits on Aug 17, 2021

  1. Review fixes

    devavret committed Aug 17, 2021
    09d02b5

Commits on Aug 18, 2021

  1. MAX_DICT_SIZE was off by one

    This caused a problem: an early return could leave num_dict_entries == 65536, and the dict bit calc logic would then consider the dictionary usable because 65536 entries fit in 16 bits.
    devavret committed Aug 18, 2021
    1f22996
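
The ambiguity described in this commit can be sketched as follows. All names here (`index_bits`, `count_entries_capped`, `dictionary_usable`, `max_dict_bits`) are hypothetical, and the sketch does not claim to show how the PR fixed the constant. It only shows why a cap of exactly 65536 cannot signal overflow: a count that stopped early at 65536 still passes a 16-bit check, whereas a cap one past that value does not.

```cpp
#include <cstdint>

// Assumed from the "16 bits" mentioned in the commit message.
constexpr int max_dict_bits = 16;

// Bits needed to store the indices 0..num_entries-1.
constexpr int index_bits(std::uint32_t num_entries)
{
  int bits = 0;
  while ((std::uint64_t{1} << bits) < num_entries) { ++bits; }
  return bits;
}

// Hypothetical dictionary builder: counts distinct values but returns early
// once the count reaches `cap`.
constexpr std::uint32_t count_entries_capped(std::uint32_t true_distinct, std::uint32_t cap)
{
  return true_distinct < cap ? true_distinct : cap;
}

// Hypothetical decision: use the dictionary if its indices fit in max_dict_bits.
constexpr bool dictionary_usable(std::uint32_t num_entries)
{
  return index_bits(num_entries) <= max_dict_bits;
}
```

For a column with 100000 distinct values, `dictionary_usable(count_entries_capped(100000, 65536))` is true even though the dictionary is incomplete, while a cap of 65537 makes the same check false.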
  2. Update cpp/src/io/parquet/chunk_dict.cu

    Co-authored-by: Vukasin Milovanovic <[email protected]>
    devavret and vuule authored Aug 18, 2021
    81be63f