I was mainly interested in whether the inner workings of the hash table caused any overhead in this case. I would guess string dictionary writes are a bit faster now with the better hashing path.
@pitrou has recently made some significant improvements to hashing / dictionary encoding machinery in Apache Arrow
eaf8d32
parquet-cpp is using a custom hash table
https://github.com/apache/arrow/blob/master/cpp/src/parquet/encoding-internal.h#L456
It would be nice to utilize common hash table machinery if possible. We should of course make sure that such a change does not cause performance regressions (hashing performance improved with Antoine's patch, so performance may also get better on the Parquet write path).
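To make the idea concrete, here is a minimal sketch of dictionary encoding layered on top of a separate, reusable hash table. It is only an illustration: `MemoTable` and `DictEncoder` are hypothetical names, not the parquet-cpp or Arrow API, and `std::unordered_map` stands in for whatever shared hashing machinery would actually be used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical memo table: maps each distinct value to a dense index in
// first-seen order. std::unordered_map is an assumption standing in for
// a shared hash table implementation.
template <typename T>
class MemoTable {
 public:
  // Returns the index of `value`, inserting it first if it was not seen before.
  int32_t GetOrInsert(const T& value) {
    auto it = index_.find(value);
    if (it != index_.end()) return it->second;
    const int32_t idx = static_cast<int32_t>(values_.size());
    index_.emplace(value, idx);
    values_.push_back(value);
    return idx;
  }

  const std::vector<T>& values() const { return values_; }

 private:
  std::unordered_map<T, int32_t> index_;
  std::vector<T> values_;
};

// Hypothetical dictionary encoder: it contains no hashing logic of its own
// and only records the index the memo table returns for each value.
template <typename T>
class DictEncoder {
 public:
  void Put(const T& value) { indices_.push_back(memo_.GetOrInsert(value)); }

  const std::vector<T>& dictionary() const { return memo_.values(); }
  const std::vector<int32_t>& indices() const { return indices_; }

  // Looks up the original value for the i-th encoded entry.
  const T& Decode(std::size_t i) const {
    assert(i < indices_.size());
    return memo_.values()[static_cast<std::size_t>(indices_[i])];
  }

 private:
  MemoTable<T> memo_;
  std::vector<int32_t> indices_;
};
```

With this split, swapping the hash table implementation touches only `MemoTable`, so the encoder can be benchmarked before and after the change to check for regressions on the write path.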
Reporter: Wes McKinney / @wesm
Assignee: Antoine Pitrou / @pitrou
Note: This issue was originally created as PARQUET-1463. Please see the migration documentation for further details.