Refactor orc chunked writer (#12949)
The current ORC chunked writer compresses/encodes its input and writes the result to the output data sink without any safeguard. This PR splits the internal `writer::impl::write()` function into multiple pieces:
 * A free function that compresses/encodes the input table into intermediate results. These intermediate results are completely independent of the writer, so the writer is isolated from failures of this free function and the operation can be retried upon failure.
 * A subsequent step that applies the intermediate results to the output data sink, which is where the actual data writing happens.
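
The split described above can be illustrated with a minimal, CPU-only sketch. It is not the cudf implementation: `encoded_chunk`, `data_sink`, `encode_table`, `write_to_sink`, and `write_chunk` are hypothetical names, the "encoding" is a trivial byte serialization, and the failure is injected artificially. It only shows why keeping the encode step free of writer state makes a retry safe.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names come from cudf.
struct encoded_chunk {
  std::vector<std::uint8_t> bytes;  // intermediate (encoded) payload
  std::size_t num_rows = 0;
};

struct data_sink {
  std::vector<std::uint8_t> buffer;  // bytes committed so far
  std::size_t rows_written = 0;
};

// Phase 1: a free function that touches no writer or sink state.
// `failures_to_inject` is an assumption used only to simulate a failure.
encoded_chunk encode_table(std::vector<std::int32_t> const& column, int& failures_to_inject)
{
  if (failures_to_inject > 0) {
    --failures_to_inject;
    throw std::runtime_error("simulated encode failure");
  }
  encoded_chunk out;
  out.num_rows = column.size();
  for (auto const value : column) {
    auto const bits = static_cast<std::uint32_t>(value);
    // Placeholder "encoding": 4 little-endian bytes per value.
    for (int shift = 0; shift < 32; shift += 8) {
      out.bytes.push_back(static_cast<std::uint8_t>((bits >> shift) & 0xFFu));
    }
  }
  return out;
}

// Phase 2: apply already-computed intermediate results to the sink.
void write_to_sink(data_sink& sink, encoded_chunk const& chunk)
{
  sink.buffer.insert(sink.buffer.end(), chunk.bytes.begin(), chunk.bytes.end());
  sink.rows_written += chunk.num_rows;
}

// Retries phase 1 up to `max_attempts` times. The sink is modified only
// after encoding has succeeded, so a failed attempt leaves it untouched.
bool write_chunk(data_sink& sink,
                 std::vector<std::int32_t> const& column,
                 int failures_to_inject,
                 int max_attempts)
{
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    encoded_chunk chunk;
    try {
      chunk = encode_table(column, failures_to_inject);
    } catch (std::runtime_error const&) {
      continue;  // nothing was written; safe to try again
    }
    write_to_sink(sink, chunk);
    return true;
  }
  return false;
}
```

In this sketch, a call such as `write_chunk(sink, column, 1, 3)` fails once during encoding, retries, and only then appends to `sink`; if every attempt fails, `sink` is left exactly as it was.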

Some cleanup is also performed on the existing code. That includes moving some member functions into free functions, which helps reduce potential dependencies between translation units.

No new implementation is added in this work; the existing code is only moved around.

Partially contributes to #12792.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #12949
ttnghia authored Mar 21, 2023
1 parent 6547d96 commit 17a2cdc
Showing 2 changed files with 736 additions and 552 deletions.