New component: Cassandra Exporter #17910

Closed
2 tasks
emreyalvac opened this issue Jan 23, 2023 · 11 comments
Labels
Accepted Component New component has been sponsored Stale

Comments

@emreyalvac
Member

emreyalvac commented Jan 23, 2023

The purpose and use-cases of the new component

The purpose of this exporter is to export traces and logs to a Cassandra database.

I already started to develop this component here: https://github.com/emreyalvac/opentelemetry-collector-contrib/tree/cassandra-exporter-implementation.

#18515

Example configuration for the component

exporters:
  cassandra:
    dsn: 127.0.0.1
    keyspace: "otel"
    trace_table: "otel_spans"
    logs_table: "otel_logs"

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      exporters: [ cassandra ]
    logs:
      receivers: [ otlp ]
      exporters: [ cassandra ]

Telemetry data types supported

traces, logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

No response

@emreyalvac emreyalvac added the needs triage New item requiring triage label Jan 23, 2023
@atoulme atoulme added Sponsor Needed New component seeking sponsor and removed needs triage New item requiring triage labels Jan 23, 2023
@atoulme
Contributor

atoulme commented Jan 23, 2023

Can you explain the use case a bit more? Is there a standard data format in which the traces will be stored?

@emreyalvac
Member Author

emreyalvac commented Jan 23, 2023

Hi @atoulme,

My thought is that write speed is very important for OpenTelemetry.

Cassandra is an open-source NoSQL data store with a distributed architecture that provides high availability, scalability, and reliability. It is maintained by the Apache Software Foundation.

Cassandra is very fast for write operations and well suited to analytics data. It also supports storing time series data, so you can calculate metrics such as throughput, response time, and Apdex.

Cassandra’s three data modeling ‘dogmas’:

Disk space is cheap.
Writes are cheap.
Network communication is expensive.

Example span data in a Cassandra database:

[
  {
    "traceid": "104077629213055e8523102a57c659cd",
    "duration": 75957000,
    "events": null,
    "links": null,
    "parentspanid": "",
    "resourceattributes": {
      "service.name": "unknown_service:dotnet"
    },
    "servicename": "unknown_service:dotnet",
    "spanattributes": {
      "http.flavor": "1.1",
      "http.host": "localhost:5000",
      "http.method": "GET",
      "http.scheme": "http",
      "http.status_code": "200",
      "http.target": "/swagger/v1/swagger.json",
      "http.url": "http://localhost:5000/swagger/v1/swagger.json",
      "http.user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
    },
    "spanid": "c123d6dae1744ce3",
    "spankind": "SPAN_KIND_SERVER",
    "spanname": "/swagger/v1/swagger.json",
    "statuscode": "STATUS_CODE_UNSET",
    "statusmessage": "",
    "timestamp": "2023-01-22",
    "tracestate": ""
  }
]

@atoulme
Contributor

atoulme commented Jan 23, 2023

So is it stored as a CQL table? What schema is used?

@atoulme
Contributor

atoulme commented Jan 26, 2023

I found these in your implementation:


const (
	// language=SQL
	createDatabaseSQL = `CREATE KEYSPACE %s with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };`
	// language=SQL
	createEventTypeSql = `CREATE TYPE IF NOT EXISTS %s.Events (Timestamp Date, Name text, Attributes map<text, text>);`
	// language=SQL
	createLinksTypeSql = `CREATE TYPE IF NOT EXISTS %s.Links (TraceId text, SpanId text, TraceState text, Attributes map<text, text>);`
	// language=SQL
	createSpanTableSQL = `CREATE TABLE IF NOT EXISTS %s.%s (TimeStamp DATE,TraceId text, SpanId text, ParentSpanId text, TraceState text, SpanName text, SpanKind text, ServiceName text, ResourceAttributes map<text, text>, SpanAttributes map<text, text>, Duration int,StatusCode text,StatusMessage text, Events frozen<Events>, Links frozen<Links>, PRIMARY KEY (TraceId));`
)

That is intriguing. Have you considered how this would work with a cluster (I see the replication factor is set to 1)? In particular, do you have a partition key strategy for this?

@emreyalvac
Member Author

emreyalvac commented Jan 28, 2023

Hi @atoulme,

Thanks for your time and review. I appreciate it.

Yes, the data is stored in Cassandra tables. I improved the config structure so that replication and compression can be configured. I also changed the PRIMARY KEY to SpanId (the PRIMARY KEY also defines the partition key). Maybe we can create a composite partition key from ServiceName and SpanId.
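To make the partition key options concrete, here is a minimal Go sketch that renders a CQL PRIMARY KEY clause. In CQL, the columns inside the inner parentheses form the partition key, and any columns after it are clustering columns. The helper name primaryKeyClause is hypothetical and not part of the exporter, and the TraceId/SpanId variant is only an assumption shown for comparison.

```go
package main

import (
	"fmt"
	"strings"
)

// primaryKeyClause builds a CQL PRIMARY KEY clause. The columns in
// partitionCols form the (possibly composite) partition key; clusteringCols
// are optional. Hypothetical helper, not taken from the exporter.
func primaryKeyClause(partitionCols, clusteringCols []string) string {
	key := "(" + strings.Join(partitionCols, ", ") + ")"
	if len(clusteringCols) > 0 {
		key += ", " + strings.Join(clusteringCols, ", ")
	}
	return "PRIMARY KEY (" + key + ")"
}

func main() {
	// Composite partition key from ServiceName and SpanId.
	fmt.Println(primaryKeyClause([]string{"ServiceName", "SpanId"}, nil))
	// prints: PRIMARY KEY ((ServiceName, SpanId))

	// Assumption for comparison: partition by TraceId, cluster by SpanId,
	// which would keep all spans of one trace in the same partition.
	fmt.Println(primaryKeyClause([]string{"TraceId"}, []string{"SpanId"}))
	// prints: PRIMARY KEY ((TraceId), SpanId)
}
```

Which layout fits best depends on the queries the table has to serve; the sketch only shows how the two clauses differ.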

Compression Types

https://cassandra.apache.org/doc/latest/cassandra/operating/compression.html

Replication:

CREATE KEYSPACE otel WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};


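In the above example, we create a keyspace called otel using SimpleStrategy with a replication factor of 3, so data inserted into this keyspace is stored on three nodes. SimpleStrategy places replicas on consecutive nodes in the ring and does not take racks or datacenters into account; NetworkTopologyStrategy is the rack-aware option.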
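As a rough illustration of how configurable replication settings could be turned into the keyspace statement, here is a small Go sketch. The function name keyspaceCQL is hypothetical, the inputs are neither validated nor escaped, and the replication_factor option as written applies to SimpleStrategy only (NetworkTopologyStrategy takes per-datacenter factors instead).

```go
package main

import "fmt"

// keyspaceCQL renders a CREATE KEYSPACE statement for the given replication
// class and factor. Hypothetical helper: a real implementation would need to
// validate the identifiers before formatting them into the statement.
func keyspaceCQL(keyspace, class string, replicationFactor int) string {
	return fmt.Sprintf(
		"CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {'class': '%s', 'replication_factor': %d};",
		keyspace, class, replicationFactor,
	)
}

func main() {
	fmt.Println(keyspaceCQL("otel", "SimpleStrategy", 3))
	// prints: CREATE KEYSPACE IF NOT EXISTS otel WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
}
```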

When I run the Cassandra exporter with the following config, the schema looks like this:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  cassandra:
    dsn: 127.0.0.1
    keyspace: "otel"
    trace_table: "otel_spans"
    replication:
      class: "SimpleStrategy"
      replication_factor: 1
    compression:
      algorithm: "ZstdCompressor"

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      exporters: [ cassandra ]

Schema:

otel: schema durable_writes: true replication: {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
    + object-types
        events: object-type
            + object-attributes
                timestamp: date
                name: text
                attributes: map<text, text>
        links: object-type
            + object-attributes
                traceid: text
                spanid: text
                tracestate: text
                attributes: map<text, text>
    + tables
        otel_spans: table compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.ZstdCompressor'}
            + columns
                traceid: text
                duration: int
                events: frozen<events>
                links: frozen<links>
                parentspanid: text
                resourceattributes: map<text, text>
                servicename: text
                spanattributes: map<text, text>
                spanid: text
                spankind: text
                spanname: text
                statuscode: text
                statusmessage: text
                timestamp: date
                tracestate: text
            + keys
                primary key: (spanid)

Default config:

{
	DSN:        "127.0.0.1",
	Keyspace:   "otel",
	TraceTable: "otel_spans",
	Replication: Replication{
		Class:             "SimpleStrategy",
		ReplicationFactor: 1,
	},
	Compression: Compression{
		Algorithm: "LZ4Compressor",
	},
}
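For readers who want to see the shape of the types behind these defaults, here is a minimal sketch. The field names and default values come from the snippet above; the mapstructure tags are inferred from the YAML keys used in this thread, and the defaultConfig and Validate names and the validation rules are assumptions rather than the exporter's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// Replication mirrors the "replication" block of the YAML config.
type Replication struct {
	Class             string `mapstructure:"class"`
	ReplicationFactor int    `mapstructure:"replication_factor"`
}

// Compression mirrors the "compression" block of the YAML config.
type Compression struct {
	Algorithm string `mapstructure:"algorithm"`
}

// Config holds the exporter settings shown in this thread.
type Config struct {
	DSN         string      `mapstructure:"dsn"`
	Keyspace    string      `mapstructure:"keyspace"`
	TraceTable  string      `mapstructure:"trace_table"`
	Replication Replication `mapstructure:"replication"`
	Compression Compression `mapstructure:"compression"`
}

// defaultConfig returns the default values listed above.
func defaultConfig() Config {
	return Config{
		DSN:         "127.0.0.1",
		Keyspace:    "otel",
		TraceTable:  "otel_spans",
		Replication: Replication{Class: "SimpleStrategy", ReplicationFactor: 1},
		Compression: Compression{Algorithm: "LZ4Compressor"},
	}
}

// Validate applies a few illustrative checks (assumed, not from the source).
func (c Config) Validate() error {
	if c.DSN == "" {
		return errors.New("dsn must not be empty")
	}
	if c.Keyspace == "" {
		return errors.New("keyspace must not be empty")
	}
	if c.Replication.ReplicationFactor < 1 {
		return errors.New("replication_factor must be at least 1")
	}
	return nil
}

func main() {
	cfg := defaultConfig()
	fmt.Printf("%+v valid=%v\n", cfg, cfg.Validate() == nil)
}
```

Settings given in the YAML config, such as algorithm: "ZstdCompressor", would replace the corresponding default.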

@atoulme
Contributor

atoulme commented Jan 28, 2023

That’s great! Please look for a sponsor to land this. I cannot sponsor fwiw. Come to a SIG meeting if possible to present your work.

@mx-psi
Member

mx-psi commented Feb 8, 2023

@atoulme now that you can, would you be interested in sponsoring this component?

@emreyalvac
Member Author

Now also supports Logs.

@atoulme
Contributor

atoulme commented Feb 8, 2023

I will sponsor.

@atoulme atoulme added Accepted Component New component has been sponsored and removed Sponsor Needed New component seeking sponsor labels Feb 8, 2023
@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Apr 14, 2023
@emreyalvac
Member Author

Done. #18515
