[kafka][checkoutservice][frauddetectionservice] add kafkaQueueProblems featureflag #1528

Merged

Conversation

EislM0203
Contributor

@EislM0203 EislM0203 commented Apr 15, 2024

Changes

This PR adds a new feature flag, kafkaQueueProblems, to the OpenTelemetry demo. When the flag is activated, the producer (checkoutservice) overloads Kafka by sending 100 extra messages to the queue for each actual order. At the same time, the consumer (frauddetectionservice) delays claiming each message by 1 second. Together these cause a sudden spike in consumer lag. This is an interesting real-world observability scenario because it simulates queue problems in Kafka, which, as far as I know, no existing feature flag covers yet. The resulting consumer lag can be viewed in Grafana through metrics such as kafka_consumer_lag_avg.
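To illustrate why the lag spikes, here is a rough back-of-the-envelope sketch in Python. It is not the code from this PR (the real changes live in checkoutservice and frauddetectionservice); it only models the arithmetic. The function name, the order rate of one order per second, and the single-consumer assumption are all hypothetical and chosen purely for illustration.

```python
def simulate_consumer_lag(orders_per_second, extra_messages_per_order,
                          consumer_delay_seconds, duration_seconds):
    """Return the modeled backlog (lag) at the end of each simulated second.

    Hypothetical model, not the PR's implementation: one producer,
    one consumer, and no processing cost other than the injected delay.
    """
    # Each real order produces itself plus the extra flooding messages.
    produced_per_second = orders_per_second * (1 + extra_messages_per_order)

    # A consumer that waits `consumer_delay_seconds` per message can drain
    # at most 1/delay messages per second; with no delay we treat it as
    # unbounded for the purposes of this sketch.
    if consumer_delay_seconds > 0:
        consumed_per_second = 1.0 / consumer_delay_seconds
    else:
        consumed_per_second = float("inf")

    lag = 0.0
    history = []
    for _ in range(duration_seconds):
        # Lag grows by the difference between production and consumption,
        # and can never drop below zero.
        lag = max(0.0, lag + produced_per_second - consumed_per_second)
        history.append(lag)
    return history


if __name__ == "__main__":
    # Flag on: 100 extra messages per order, 1 second delay per message.
    print(simulate_consumer_lag(1, 100, 1.0, 5))
    # Flag off: no extra messages, no injected delay.
    print(simulate_consumer_lag(1, 0, 0.0, 5))
```

Under these assumptions the backlog grows by roughly 100 messages every second while the flag is on, and stays at zero when it is off. Actual numbers in the demo depend on the load generator's order rate and on the Kafka setup.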

I also increased the resource limits of the frauddetectionservice because it kept dying from resource exhaustion. This happened even without my modifications to the service.

Looking forward to your feedback!

Merge Requirements

For new feature contributions, please make sure you have completed the following
essential items:

  • CHANGELOG.md updated to document new feature additions
  • Appropriate documentation updates in the docs
  • Appropriate Helm chart updates in the helm-charts

Maintainers will not merge until the above have been completed. If you're unsure
which docs need to be changed, ping the
@open-telemetry/demo-approvers.

@github-actions github-actions bot added the docs-update-required (Requires documentation update) and helm-update-required (Requires an update to the Helm chart when released) labels Apr 15, 2024
@EislM0203 EislM0203 force-pushed the kafka-queue-problems-featureflag branch 2 times, most recently from 7e33420 to c379147 Compare April 15, 2024 10:26
Overloads Kafka queue while simultaneously introducing a consumer side delay leading to a lag spike

The result of that featureflag can be observed with numerous metrics in grafana (e.g. kafka_consumer_lag_avg)
also adjusted the resource limit for the frauddetection service since it kept dying
@EislM0203 EislM0203 force-pushed the kafka-queue-problems-featureflag branch from 38ec15e to cf5c1dc Compare April 15, 2024 13:18
@EislM0203 EislM0203 changed the title Add kafkaQueueProblems featureflag [kafka][checkoutservice][frauddetectionservice] add kafkaQueueProblems featureflag Apr 15, 2024
@EislM0203 EislM0203 marked this pull request as ready for review April 15, 2024 15:58
@EislM0203 EislM0203 requested a review from a team April 15, 2024 15:58
@puckpuck puckpuck merged commit e0500b2 into open-telemetry:main Apr 30, 2024
32 checks passed
@EislM0203 EislM0203 deleted the kafka-queue-problems-featureflag branch April 30, 2024 06:08
maxhakansson added a commit to maxhakansson/opentelemetry-demo that referenced this pull request May 10, 2024
* main: (138 commits)
  docs: update sig meeting schedule (open-telemetry#1567)
  chore(deps): upgrade otel collector contrib and opensearch (open-telemetry#1566)
  fix(loadgenerator): use add_hooks openfeature method (open-telemetry#1565)
  Revert "remove axoflow link (open-telemetry#1457)" (open-telemetry#1563)
  feat: configure feature flag tracing for Python services (open-telemetry#1553)
  chore(deps): upgrade go dependencies to latest versions (open-telemetry#1561)
  remove deprecated version property (open-telemetry#1557)
  chore(deps): upgrade otel collector contrib, grafana and prometheus (open-telemetry#1559)
  add imageprovider (open-telemetry#1552)
  [flagd] - upgrade to latest version and memory limits (open-telemetry#1554)
  update kubernetes manifest to 1.9.0 (open-telemetry#1555)
  [chore] specify default value for tracetest image version (open-telemetry#1551)
  improve baggage propagation (open-telemetry#1545)
  Bump gradle/wrapper-validation-action from 3.3.1 to 3.3.2 (open-telemetry#1548)
  [kafka][checkoutservice][frauddetectionservice] add kafkaQueueProblems featureflag (open-telemetry#1528)
  fix(productcatalogservice): handle err returned from openfeature.SetProvider func (open-telemetry#1535)
  feat(otelcol): add redisreceiver (open-telemetry#1537)
  chore(deps): upgrade opentelemetry-java-instrumentation for kafka to 2.3.0 (open-telemetry#1533)
  Bump gradle/wrapper-validation-action from 3.3.0 to 3.3.1 (open-telemetry#1539)
  chore(deps): upgrade opentelemetry-java-instrumentation to 2.3.0 (open-telemetry#1532)
  ...

# Conflicts:
#	docker-compose.minimal.yml
#	src/frontend/package-lock.json
AlexPSplunk pushed a commit to splunk/edu-opentelemetry-demo that referenced this pull request Jul 10, 2024
…s featureflag (open-telemetry#1528)

* Add kafkaQueueProblems featureflag

Overloads Kafka queue while simultaneously introducing a consumer side delay leading to a lag spike

The result of that featureflag can be observed with numerous metrics in grafana (e.g. kafka_consumer_lag_avg)

* changed feature flag to int value for more configurability

also adjusted the resource limit for the frauddetection service since it kept dying

* addressed PR comments

* addressed PR comment

---------

Co-authored-by: Austin Parker <[email protected]>
Labels
docs-update-required (Requires documentation update), helm-update-required (Requires an update to the Helm chart when released)
4 participants