FPC Codec for floating point data #37553

koloshmet · 2022-05-26T04:38:20Z

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Implementation of FPC algorithm for floating point data compression

Requested here:
#25925

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CLAassistant · 2022-05-26T04:38:27Z

All committers have signed the CLA.

alexey-milovidov · 2022-06-05T11:56:04Z

@rschu1ze started reviewing and said that the implementation quality is very good.

rschu1ze

Hi. Thanks for this contribution! I will leave a first batch of comments and hope to continue the review tomorrow (especially the encoding/decoding part). It's mostly minor stuff overall and it should be possible to merge this PR soon-ish.

src/Compression/CompressionCodecFPC.cpp

rschu1ze

I reviewed the rest of the change. Overall it looks really good (thanks). My main question at this point would be if the intermediate copy step can somehow be avoided? (see the detailed comments).

src/Compression/CompressionCodecFPC.cpp

rschu1ze · 2022-06-08T11:14:03Z

src/Compression/CompressionCodecFPC.cpp

+
+UInt32 CompressionCodecFPC::getMaxCompressedDataSize(UInt32 uncompressed_size) const
+{
+    auto float_count = (uncompressed_size + float_width - 1) / float_width;


getMaxCompressedDataSize() does a worst-case estimation of the compressed data size, right?

I don't really understand the calculation in l. 74, i.e. why are we adding float_width - 1? --> Add a comment?

Same for l. 77, why are we adding float_count / 2? A comment for clueless readers like me would be nice ^^
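
For readers following the question above: adding `float_width - 1` before an integer division is the usual idiom for rounding up, so a trailing partial value still counts as one float. The sketch below shows that idiom and one plausible reading of the `float_count / 2` term. It assumes, as in the published FPC algorithm, that each value carries a 4-bit header, so two values share one header byte. The names `ceilDiv`, `worstCaseSize`, and `codec_header_size` are hypothetical and not taken from the PR; the actual code at l. 77 is not shown in this thread and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): number of `width`-byte values
// needed to cover `bytes` bytes, i.e. integer division rounded up.
// Assumes bytes + width - 1 does not overflow uint32_t.
constexpr uint32_t ceilDiv(uint32_t bytes, uint32_t width)
{
    return (bytes + width - 1) / width;
}

// Hypothetical worst-case bound (an assumption, not the PR's code):
// every value is stored at full width, plus 4 header bits per value,
// so each pair of values needs one extra header byte.
constexpr uint32_t worstCaseSize(uint32_t uncompressed_size, uint32_t float_width, uint32_t codec_header_size)
{
    const uint32_t float_count = ceilDiv(uncompressed_size, float_width);
    const uint32_t pair_header_bytes = ceilDiv(float_count, 2);
    return codec_header_size + float_count * float_width + pair_header_bytes;
}

// 17 bytes of Float64 data do not fit into 2 values, so the count rounds up to 3.
static_assert(ceilDiv(17, 8) == 3, "partial trailing value counts as one float");
static_assert(ceilDiv(16, 8) == 2, "exact multiples are not rounded further");
```

Plain `uncompressed_size / float_width` would truncate 17 bytes to 2 values and under-estimate the buffer, which is why the rounding matters for a worst-case estimate.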

rschu1ze · 2022-06-14T11:38:19Z

@koloshmet: For documentation, it would be cool if you could also add new items about FPC to [0] and [1].

[0] https://clickhouse.com/docs/en/sql-reference/statements/create/table/#specialized-codecs
[1] https://clickhouse.com/docs/en/faq/use-cases/time-series/

rschu1ze · 2022-06-15T10:03:36Z

@koloshmet I'll merge this PR now as it is overall of very high quality (thanks again) and the remaining issues are fine to be fixed separately. I can do that as well.

Functional tests look good. Performance tests are more interesting:

We are getting exceptions (clickhouse_driver.errors.ServerException: Code: 432) when ClickHouse tries to run the new queries in "codecs_float_select" on an older version. As the codec is not available in these versions, this is expected.
In [0] and [1], decompression performance of the existing Float64 codecs is reported to deteriorate by between 17% and 135%. Frankly speaking, I have no explanation for that, because the new codec does not interfere with the old codecs and the test queries have only been extended, not modified. I think there is no problem; otherwise we would have seen massive regressions in other performance tests too.

[0] https://s3.amazonaws.com/clickhouse-test-reports/37553/092a00d95aa1317aa83e7f6e489d1ee00f64d8f5/performance_comparison_aarch64_[1/4]/report.html
[1] https://s3.amazonaws.com/clickhouse-test-reports/37553/092a00d95aa1317aa83e7f6e489d1ee00f64d8f5/performance_comparison_[1/4]/report.html

koloshmet added 4 commits May 25, 2022 22:04

FPC codec

0697b9f

fixed max size computation

adf8888

fixed decoding parameters

cd3e28e

fixed decoding parameters

1bfeb98

robot-ch-test-poll1 added the pr-feature Pull request with new product feature label May 26, 2022

vdimir added the can be tested Allows running workflows for external contributors label May 26, 2022

koloshmet added 3 commits May 26, 2022 10:58

added test query

4d41121

code style fixes

821100f

added fpc codec to float perftest

7e69779

rschu1ze self-assigned this Jun 3, 2022

rschu1ze reviewed Jun 7, 2022

so many improvements

d5064dd

rschu1ze reviewed Jun 8, 2022

Merge branch 'ClickHouse:master' into fpc_codec

092a00d

rschu1ze merged commit 9794098 into ClickHouse:master Jun 15, 2022

rschu1ze mentioned this pull request Jun 15, 2022

Small follow-up for FPC codec #38089

Merged
