
ClickHouse exporter produces duplicates and poor compression without sorting attributes #33634

Open
JustinMason opened this issue Jun 18, 2024 · 7 comments · May be fixed by #35725

Comments

@JustinMason

Component(s)

exporter/clickhouse

Is your feature request related to a problem? Please describe.

The default table created by the exporter isn't a good pattern for optimizing compression and removing duplicates. ClickHouse does not sort Map entries, so two otherwise identical records can carry the same attributes in a different order. ClickHouse then treats them as distinct rows, both for storage and for MergeTree merges. This also affects ClickHouse's compression, so the same data takes up a lot more disk space.
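To make the ordering problem concrete, here is a small Go sketch (Go being the exporter's language). It is an analogy rather than ClickHouse itself: the attribute names and values are made up, rows are serialized as plain strings, and gzip stands in for ClickHouse's column codecs. It builds 1,000 logically identical rows whose attribute order is shuffled, then compares distinct-row counts and compressed sizes with and without sorting by key.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"math/rand"
	"sort"
	"strings"
)

// kv is one attribute; its position in a row is whatever order the sender used.
type kv struct{ Key, Value string }

// encode serializes attributes in the order given, like a Map value
// stored exactly as inserted.
func encode(attrs []kv) string {
	parts := make([]string, len(attrs))
	for i, a := range attrs {
		parts[i] = a.Key + "=" + a.Value
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// sorted returns a copy of attrs ordered by key (the canonical form).
func sorted(attrs []kv) []kv {
	out := append([]kv(nil), attrs...)
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

// gzipSize returns the gzip-compressed size of s in bytes.
func gzipSize(s string) int {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write([]byte(s))
	w.Close()
	return buf.Len()
}

// experiment builds n logically identical rows with shuffled attribute order
// and reports distinct-row counts and gzip sizes, unsorted versus sorted.
func experiment(n int) (rawDistinct, sortedDistinct, rawSize, sortedSize int) {
	base := []kv{
		{"service.name", "checkout"}, {"host.name", "node-1"},
		{"http.method", "GET"}, {"http.route", "/cart"}, {"region", "eu-west-1"},
	}
	rng := rand.New(rand.NewSource(1)) // fixed seed for repeatable output
	rawSeen, sortedSeen := map[string]bool{}, map[string]bool{}
	var raw, canon strings.Builder
	for i := 0; i < n; i++ {
		row := append([]kv(nil), base...)
		rng.Shuffle(len(row), func(a, b int) { row[a], row[b] = row[b], row[a] })
		r, c := encode(row), encode(sorted(row))
		rawSeen[r], sortedSeen[c] = true, true
		raw.WriteString(r + "\n")
		canon.WriteString(c + "\n")
	}
	return len(rawSeen), len(sortedSeen), gzipSize(raw.String()), gzipSize(canon.String())
}

func main() {
	rd, sd, rs, ss := experiment(1000)
	fmt.Printf("distinct rows: unsorted=%d sorted=%d\n", rd, sd)
	fmt.Printf("gzip bytes:    unsorted=%d sorted=%d\n", rs, ss)
}
```

The unsorted run yields many distinct rows and a larger compressed output, while the sorted run collapses to a single distinct row. Exact byte counts depend on the Go version's compressor.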

Describe the solution you'd like

We identified this issue, and our solution was to use a Null table engine for the primary table the exporter writes to, then use a materialized view to explicitly sort the attributes before they are inserted:
mapSort(Attributes) as Attributes,

After this, the compression ratio for billions of rows was greater than 250, so much less storage was needed. It also eliminated duplicates and helped streamline our increase functions, so we could avoid extra processing.

This makes the initial table creation a bit trickier, but in my experience it is critical.

Describe alternatives you've considered

No response

Additional context

No response

@JustinMason JustinMason added enhancement New feature or request needs triage New item requiring triage labels Jun 18, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@SpencerTorres
Member

Interesting information. We actually have a PR open to update the table (#33611).

Considering the complexity of the materialized view required, it might be best to do this in the exporter code. Go randomizes map iteration order, so we would need to convert the map to a slice and sort it. Any thoughts on this approach?
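A minimal sketch of the exporter-side approach, assuming attributes arrive as a plain map[string]string (the real exporter works with pdata attribute maps, so the input type here is a simplification). The Attr type and sortedAttrs function are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Attr is a hypothetical key/value pair type used for illustration.
type Attr struct{ Key, Value string }

// sortedAttrs converts a Go map (whose iteration order is randomized)
// into a slice ordered by key, so identical maps always serialize the same.
func sortedAttrs(m map[string]string) []Attr {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]Attr, 0, len(keys))
	for _, k := range keys {
		out = append(out, Attr{k, m[k]})
	}
	return out
}

func main() {
	attrs := map[string]string{"region": "eu-west-1", "host.name": "node-1", "service.name": "checkout"}
	fmt.Println(sortedAttrs(attrs))
	// prints [{host.name node-1} {region eu-west-1} {service.name checkout}]
}
```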

@hanjm
Member

hanjm commented Jun 19, 2024

Yes, sorting the map can improve compression. The clickhouse-go SDK supports column.IterableOrderedMap (ClickHouse/clickhouse-go#1152), so the exporter can use this type to write ClickHouse Map columns in a defined order. A PR to try it is welcome.
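A hedged sketch of an ordered-map value for the exporter. The clickhouse-go interfaces are not reproduced in this thread, so the two interfaces below are local stand-ins modeled on the shape we assume column.IterableOrderedMap has (Put, plus an Iterator with Next, Key and Value); verify against the clickhouse-go source before relying on it. The sortedStringMap type is hypothetical: it stores entries in a Go map and yields them in key order.

```go
package main

import (
	"fmt"
	"sort"
)

// mapIterator and iterableOrderedMap are local stand-ins for the assumed
// clickhouse-go interfaces; they are not imported from the library.
type mapIterator interface {
	Next() bool
	Key() any
	Value() any
}

type iterableOrderedMap interface {
	Put(key any, value any)
	Iterator() mapIterator
}

// sortedStringMap (hypothetical) keeps entries in a map and yields them sorted by key.
type sortedStringMap struct{ m map[string]string }

func newSortedStringMap() *sortedStringMap { return &sortedStringMap{m: map[string]string{}} }

func (s *sortedStringMap) Put(key any, value any) { s.m[key.(string)] = value.(string) }

// Iterator sorts the keys once and walks them in order.
func (s *sortedStringMap) Iterator() mapIterator {
	keys := make([]string, 0, len(s.m))
	for k := range s.m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return &sortedIter{keys: keys, m: s.m, i: -1}
}

type sortedIter struct {
	keys []string
	m    map[string]string
	i    int
}

func (it *sortedIter) Next() bool { it.i++; return it.i < len(it.keys) }
func (it *sortedIter) Key() any   { return it.keys[it.i] }
func (it *sortedIter) Value() any { return it.m[it.keys[it.i]] }

// Compile-time check that the type satisfies the stand-in interface.
var _ iterableOrderedMap = (*sortedStringMap)(nil)

func main() {
	om := newSortedStringMap()
	om.Put("service.name", "checkout")
	om.Put("host.name", "node-1")
	for it := om.Iterator(); it.Next(); {
		fmt.Println(it.Key(), "=", it.Value())
	}
	// prints:
	// host.name = node-1
	// service.name = checkout
}
```

Sorting inside Iterator() keeps Put cheap and pays the sort cost once per row, at write time.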

@crobert-1
Member

Removing needs triage based on response from code owners.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Jun 20, 2024
earwin added a commit to rainsouthafrica/opentelemetry-collector-contrib that referenced this issue Oct 9, 2024
earwin added a commit to rainsouthafrica/opentelemetry-collector-contrib that referenced this issue Oct 10, 2024
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 14, 2024
earwin added a commit to rainsouthafrica/opentelemetry-collector-contrib that referenced this issue Oct 16, 2024
earwin added a commit to rainsouthafrica/opentelemetry-collector-contrib that referenced this issue Oct 17, 2024
@hanjm
Member

hanjm commented Oct 21, 2024

still valid

@github-actions github-actions bot removed the Stale label Oct 22, 2024
@SpencerTorres
Member

This can be closed once #35725 is merged

4 participants