ClickHouse - Materialized view contains duplicates on the grouping key #26301
Comments
The duplicate rows are expected: a row is inserted each time a batch of spans is written to ClickHouse, and a TraceId may appear in more than one batch, so we query with an aggregation over those rows. The mv is highly effective when the span count grows large, especially beyond the billion level. #13442 (comment) If querying by TraceId is already very fast on a small dataset without this time-index mv, you can just delete it.
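To illustrate the explanation above, here is a minimal Python sketch (not ClickHouse code) that simulates a materialized view acting as an insert trigger: each inserted batch is aggregated on its own and the result is appended to the destination table. The names `aggregate_batch`, `destination`, and `batches` are hypothetical, and the timestamps are made-up integers.

```python
from collections import defaultdict


def aggregate_batch(batch):
    """Group ONE inserted batch by trace ID, the way the view's SELECT would."""
    bounds = defaultdict(lambda: [None, None])
    for trace_id, ts in batch:
        if trace_id == "":
            continue  # mirrors WHERE TraceId != ''
        lo, hi = bounds[trace_id]
        bounds[trace_id] = [
            ts if lo is None else min(lo, ts),
            ts if hi is None else max(hi, ts),
        ]
    return [(tid, lo, hi) for tid, (lo, hi) in bounds.items()]


# Simulated destination table: rows are only ever appended, never merged.
destination = []

# Two batches that both contain spans of trace-a (assumed example data).
batches = [
    [("trace-a", 10), ("trace-a", 12), ("trace-b", 11)],
    [("trace-a", 15), ("trace-b", 20)],
]
for batch in batches:
    destination.extend(aggregate_batch(batch))

# trace-a now has one row per batch, i.e. a "duplicate" on the grouping key.
rows_for_a = [row for row in destination if row[0] == "trace-a"]
print(rows_for_a)
```

Under these assumptions the grouping only deduplicates within a single batch, which is why the same TraceId can show up several times in the destination table.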
@hanjm thanks for clarifying. That means that, in order to get correct results, I'd have to GROUP again:
SELECT
    TraceId,
    min(Timestamp) as Start,
    max(Timestamp) as End
FROM
    otel_traces_trace_id_ts_mv
WHERE TraceId != ''
GROUP BY TraceId;
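The re-grouping idea can be sketched in Python as follows. This is only an illustration under assumed data: `merge_rows` is a hypothetical helper, and `rows` stands in for per-batch (TraceId, Start, End) rows that already exist in the destination table.

```python
def merge_rows(rows):
    """Collapse per-batch rows into one (start, end) pair per trace ID."""
    merged = {}
    for trace_id, start, end in rows:
        if trace_id in merged:
            old_start, old_end = merged[trace_id]
            # Equivalent to min(Start) / max(End) with GROUP BY TraceId.
            merged[trace_id] = (min(old_start, start), max(old_end, end))
        else:
            merged[trace_id] = (start, end)
    return merged


# Assumed example rows: each trace appears once per inserted batch.
rows = [
    ("trace-a", 10, 12),
    ("trace-b", 11, 11),
    ("trace-a", 15, 15),
    ("trace-b", 20, 20),
]
result = merge_rows(rows)
print(result)
```

Aggregating at query time like this yields a single time range per trace regardless of how many batches contributed rows.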
@StarpTech Yes, as in the README.md example: opentelemetry-collector-contrib/exporter/clickhouseexporter/README.md, lines 150 to 156 in 3ec9d22.
@hanjm I mean when making use of the mv.
@StarpTech No, this mv is just an insert trigger; you can select from the destination table.
@hanjm Thank you, that makes sense.
I think this is not 100% correct, and I now know where my confusion came from. In ClickHouse, you can also use an mv differently. See https://altinity.com/blog/clickhouse-materialized-views-illuminated-part-1: there you query the mv directly, because it creates a hidden table.
The difference is the
Component(s)
exporter/clickhouse
What happened?
Description
Hello, we experience duplicates in the otel_traces_trace_id_ts_mv when ingesting spans from multiple services. This looks like a race condition or a limitation of how the schema was designed. We work around this by querying the otel_traces_trace_id_ts table directly.
Reading the ClickHouse blog post https://clickhouse.com/blog/storing-traces-and-spans-open-telemetry-in-clickhouse, the value of the materialized view seems very limited. I recommend deleting it to avoid this. I also saw that the README examples don't use the materialized view at all.
Steps to Reproduce
Expected Result
No duplicates are in the otel_traces_trace_id_ts_mv table, because the idea is to use this table as a root spans overview.
Actual Result
Duplicates. See image above.
Collector version
0.81.0
Environment information
Environment
OS: Pop!_OS 22.04 LTS
Compiler (if manually compiled): 1.2.0
OpenTelemetry Collector configuration
No response
Log output
No response
Additional context
No response