Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FPC Codec for floating point data #37553

Merged
merged 9 commits into from
Jun 15, 2022
Merged

FPC Codec for floating point data #37553

merged 9 commits into from
Jun 15, 2022

Conversation

koloshmet
Copy link
Contributor

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Implementation of FPC algorithm for floating point data compression

Requested here:
#25925

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@CLAassistant
Copy link

CLAassistant commented May 26, 2022

CLA assistant check
All committers have signed the CLA.

@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-feature Pull request with new product feature label May 26, 2022
@vdimir vdimir added the can be tested Allows running workflows for external contributors label May 26, 2022
@rschu1ze rschu1ze self-assigned this Jun 3, 2022
@alexey-milovidov
Copy link
Member

@rschu1ze started reviewing and said that the implementation quality is very good.

Copy link
Member

@rschu1ze rschu1ze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. Thanks for this contribution! I will leave a first batch of comments and hope to continue the review tomorrow (especially the encoding/decoding part). It's mostly minor stuff overall and it should be possible to merge this PR soon-ish.

src/Compression/CompressionCodecFPC.cpp Outdated Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Outdated Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Outdated Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Outdated Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Outdated Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Outdated Show resolved Hide resolved
Copy link
Member

@rschu1ze rschu1ze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the rest of the change. Overall it looks really good (thanks). My main question at this point would be if the intermediate copy step can somehow be avoided? (see the detailed comments).

src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved

UInt32 CompressionCodecFPC::getMaxCompressedDataSize(UInt32 uncompressed_size) const
{
auto float_count = (uncompressed_size + float_width - 1) / float_width;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getMaxCompressedDataSize() does a worst-case estimation of the compressed data size, right?

I don't really understand the calculation in l. 74, i.e. why are we adding float_width - 1? --> Add a comment?

Same for l. 77, why are we adding float_count / 2? A comment for clueless readers like me would be nice ^^

src/Compression/CompressionCodecFPC.cpp Show resolved Hide resolved
@rschu1ze
Copy link
Member

@koloshmet: For documentation, it would be cool if you could also add new items about FPC to [0] and [1].

[0] https://clickhouse.com/docs/en/sql-reference/statements/create/table/#specialized-codecs
[1] https://clickhouse.com/docs/en/faq/use-cases/time-series/

@rschu1ze
Copy link
Member

@koloshmet I'll merge this PR now as it is overall of very high quality (thanks again) and the remaining issues are fine to be fixed separately. I can to that as well.

Functional tests look good. Performance tests are more interesting:

  • We are getting exceptions clickhouse_driver.errors.ServerException: Code: 432. when ClickHouse tries to run the new queries in "codecs_float_select" on an older version. As the codec is not available in these versions, this is expected
  • In [0] and [1], decompression performance of the existing Float64 codecs is reported to deteriorate between 17% and 135%. Frankly speaking, I have no explanation for that because the new codec does not interfere with the old codecs and the test queries have only been extended, not modified. I think there is no problem, otherwise we would have seen massive regressions in other performance tests too.

[0] https://s3.amazonaws.com/clickhouse-test-reports/37553/092a00d95aa1317aa83e7f6e489d1ee00f64d8f5/performance_comparison_aarch64_[1/4]/report.html
[1] https://s3.amazonaws.com/clickhouse-test-reports/37553/092a00d95aa1317aa83e7f6e489d1ee00f64d8f5/performance_comparison_[1/4]/report.html

@rschu1ze rschu1ze merged commit 9794098 into ClickHouse:master Jun 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants