Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wrong kafka protocol detection #994

Merged
merged 3 commits into from
Jul 4, 2024

Conversation

grcevski
Copy link
Contributor

@grcevski grcevski commented Jul 3, 2024

There were a few bugs here, but the major one was that we didn't limit the return buffer of the TCP event to the return buffer length, so there was a mix of redis and kafka buffer, which somehow still parsed successfully as Kafka.

  1. Fixed the issue with the buffer length
  2. Added additional validation for Kafka events by checking the isolation level for validity and the payload max size.
  3. gRPC events can sometimes look like Kafka events, so we first check gRPC now because it's harder to mess up that protocol parsing.

@grcevski grcevski requested review from mariomac and marctc as code owners July 3, 2024 23:48
@@ -1,5 +1,5 @@
# Build the autoinstrumenter binary
FROM golang:1.22 as builder
FROM golang:1.22 AS builder
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New docker version complains about using different cases for FROM and AS.

@@ -101,7 +101,7 @@ var MisclassifiedEvents = make(chan MisclassifiedEvent)

func ptlog() *slog.Logger { return slog.With("component", "ebpf.ProcessTracer") }

func ReadHTTPRequestTraceAsSpan(record *ringbuf.Record, filter ServiceFilter) (request.Span, bool, error) {
func ReadBPFTraceAsSpan(record *ringbuf.Record, filter ServiceFilter) (request.Span, bool, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed this since it's no longer just HTTP :).

@@ -45,6 +45,7 @@ func (k Operation) String() string {
}

const KafkaMinLength = 14
const KafkaMaxPayload = 20 * 1024 * 1024 // 20 MB max, 1MB is default for most Kafka installations
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for invalid packets that have the payload something ridiculous.

@@ -294,14 +307,21 @@ func getTopicOffsetFromFetchOperation(header *Header) int {
if header.APIVersion >= 3 {
offset += 4 // max_bytes
if header.APIVersion >= 4 {
if origOffset+offset >= len(pkt) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate that the isolation makes sense (0 or 1)

@@ -38,7 +38,7 @@ func isRedisOp(buf []uint8) bool {
return isRedisError(buf[1:])
case ':', '$', '*':
return crlfTerminatedMatch(buf[1:], func(c uint8) bool {
return (c >= '0' && c <= '9')
return (c >= '0' && c <= '9') || c == '-'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Result could be '-1', so we need to allow '-' in the list of characters.

@@ -40,7 +40,7 @@ func ReadTCPRequestIntoSpan(record *ringbuf.Record, filter ServiceFilter) (reque
switch {
case validSQL(op, table):
return TCPToSQLToSpan(&event, op, table, sql), false, nil
case isRedis(b) && isRedis(event.Rbuf[:]):
case isRedis(b) && isRedis(event.Rbuf[:rl]):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This missing bounds was the main bug.

@@ -40,7 +40,8 @@ func (k *Metadata) initServiceIPInformer(informerFactory informers.SharedInforme
return nil, fmt.Errorf("was expecting a Service. Got: %T", i)
}
if svc.Spec.ClusterIP == corev1.ClusterIPNone {
k.log.Warn("Service doesn't have any ClusterIP. Beyla won't decorate their flows",
// this will be normal for headless services
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to debug because the message keeps appearing for headless services.

@codecov-commenter
Copy link

codecov-commenter commented Jul 3, 2024

Codecov Report

Attention: Patch coverage is 64.00000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 80.79%. Comparing base (84af0d1) to head (cfda0f3).
Report is 5 commits behind head on main.

Files Patch % Lines
pkg/internal/ebpf/common/kafka_detect_transform.go 60.00% 3 Missing and 3 partials ⚠️
pkg/internal/ebpf/common/tcp_detect_transform.go 66.66% 0 Missing and 2 partials ⚠️
pkg/internal/kube/informer_ip.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #994      +/-   ##
==========================================
+ Coverage   80.59%   80.79%   +0.19%     
==========================================
  Files         134      134              
  Lines       10688    10734      +46     
==========================================
+ Hits         8614     8672      +58     
+ Misses       1565     1554      -11     
+ Partials      509      508       -1     
Flag Coverage Δ
integration-test 55.90% <20.00%> (-0.06%) ⬇️
k8s-integration-test 59.44% <8.00%> (+0.18%) ⬆️
oats-test 36.34% <48.00%> (-0.01%) ⬇️
unittests 50.78% <44.00%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@marctc marctc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for those comments, makes the PR easy to review!

@grcevski grcevski merged commit 31e98da into grafana:main Jul 4, 2024
6 checks passed
@grcevski grcevski deleted the kafka_wrong_detection branch July 4, 2024 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants