Skip to content

Commit

Permalink
Merge branch 'main' into feature/otecol-data-flow-dashboard-PR797
Browse files Browse the repository at this point in the history
  • Loading branch information
devrimdemiroz authored Apr 1, 2023
2 parents 6762bf6 + 7f15014 commit deb7c3a
Show file tree
Hide file tree
Showing 9 changed files with 792 additions and 2 deletions.
287 changes: 287 additions & 0 deletions content/en/blog/2023/end-user-discussions-03.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,287 @@
---
title: OpenTelemetry End-User Discussions Summary for March 2023
linkTitle: End-User Discussions Mar 2023
date: 2023-03-30
author: '[Reese Lee](https://github.com/reese-lee) (New Relic)'
body_class: otel-with-contributions-from
---

With contributions from [Henrik Rexed](https://github.com/henrikrexed)
(Dynatrace), [Michael Hausenblas](https://github.com/mhausenblas) (AWS),
[Rynn Mancuso](https://github.com/musingvirtual) (Honeycomb),
[Adriana Villela](https://github.com/avillela) (Lightstep), and
[Pranay Prateek](https://github.com/pranay01) (SigNoz).

The OpenTelemetry end-user group meet takes place every month for users in the
Americas (AMER), Europe Middle-East & Africa (EMEA), and Asia-Pacific (APAC).

The discussions take place using a
[Lean Coffee format](https://agilecoffee.com/leancoffee/), whereby folks are
invited to post their topics to the
[Agile Coffee board like this one](http://agile.coffee/#3716060f-183a-4966-8da4-60daab2842c4),
and everyone in attendance votes on what they want to talk about.

## What we talked about

Sampling and collector capabilities continue to be topics of interest, along
with questions about instrumentation and adoption.

## Discussion Highlights

Below is the summary of this month's discussions.

### OpenTelemetry Collector

#### 1 - Losing gRPC with Azure App Services

**Q:** When looking at the hosting models in Azure for the OTel Collector, only
HTTP is supported (for running in Azure App Service). What are the risks
associated with losing gRPC capability?

**A:** If HTTP2 is supported in Azure, gRPC might work there, since gRPC is HTTP
under the hood with extra complications built on top of HTTP2. One suggestion is
to follow up with Microsoft about gRPC support, as it may have very long-running
connections.

#### 2 - Uptime monitoring/synthetics

**Q:** Does the OTel Collector have the capability to do uptime monitoring/
synthetics? If not, are there any plans to work towards such a thing?

**A:** The
[health check](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/healthcheckextension/README.md)
might be a helpful reference. Also check out the
[http check receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/healthcheckextension/README.md).

#### 3 - Collector distributions

**Q:** Should I use a vendor distribution versus the community collector
distribution?

**A:** Each vendor distribution will come with customizations, whereas the
community Collector distribution will include everything: receivers and
exporters. If you need the flexibility, then you should use the OTel Collector
distro.

#### 4 - Rate limiting on receivers

**Q:** Are there any plans for enabling rate limiting and circuit breaks on
receivers? Imagine having lots of clients sending telemetry to the same set of
OTel collectors.

**More context**: How do I rate-limit in a situation where I have collectors for
traces, metrics, and logs, and I’m receiving traffic from more than 100
individual apps? If I have even one customer who is generating heavy traffic, it
might impact the overall health of my collectors.

**A:** Use a reverse proxy. Something to note is that once the data is inside
the collector, the data is already being deserialized, and you’ve already
started firehosing the collector, so it’s a bit late to rate limit at that
point. One approach might be to add additional headers when you configure your
SDKs that contain the additional information, which would help with load
balancing.

#### 5 - Connectors

**Q:** What is a connector?

**A:** A connector is a collector component that consumes telemetry signals as
an exporter in one pipeline, and emits it as a receiver in another pipeline.
[Read more here](https://o11y.news/2023-03-13/#opentelemetry-connectors).

#### 6 - Definitions of upstream, downstream, and distro

**Q:** What is upstream? Downstream? Distro?

**A:** The terms "upstream" and "downstream" refer to how services or components
in a system are connected to each other. Check out
[this article](https://reflectoring.io/upstream-downstream/) for more
information as it applies to different situations in software.

The term "distro" is short for distribution. For a list of vendors that provide
distros, see [Vendors](/ecosystem/vendors/).

### Sampling

#### 1 - Tail sampling

**Q:** What are the perceived downsides of tail sampling, for example, on all
HTTP requests that have errors or long latencies, instead of just relying on
head-based sampling? Are there best practices around trace sampling? Tail
sampling can get very expensive.

**A:** Generally, head sampling is not recommended, as you aren’t going to be
able to do 100% of what you want to do with it, but it is true that tail
sampling is expensive. The reason why sampling is such a complicated discussion
is that there really isn’t a universal answer; furthermore, it also depends on
what kind of features are offered by your data analysis tool. For example, do
you have a data ingest or storage cost? If you have ingest cost, you’ll want to
sample before the data gets ingested; if it’s storage cost, you’ll have to
delete a lot of the data, so it depends on the tradeoffs.

One thing to consider is that you can use tail sampling on attributes, such as
if there’s an error on a span, but it does require more memory. Suggested
further exploration:

- [Column data store for OpenTelemetry](https://github.com/open-telemetry/oteps/pull/171)
- [OpAMP](/blog/2022/opamp/)
- Your backend vendor’s tail-based sampling strategies
- [Paper by Uber](https://www.uber.com/en-IN/blog/crisp-critical-path-analysis-for-microservice-architectures/)
- [Tail sampling processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/README.md)

### Adoption, Migration, and Implementation

#### 1 - Common migration challenges

**Q:** What are common challenges faced by developers when migrating to
OpenTelemetry?

**More context:** We have hundreds of microservices that need to be migrated,
including big monolith systems with a lot of custom tracing locked into specific
vendors and their libraries. Setting up agents to facilitate this migration is
like having two different sets of observability systems running at the same
time.

**A:** One user shared their journey: They started by using a backend that
supports OpenTelemetry. The two challenges they faced were: a cultural change in
the engineer’s mindset, and raising awareness of OpenTelemetry, which are bigger
than the technical challenges. The key is to not propose one big change; the
journey of moving from a vendor-based solution to OpenTelemetry should be a
step-by-step process, rather than going into a full transformation.

Additional suggestions:

- Start with dev or testing environments first to build trust in the software
- Choose a stack where OTel is more robust, such as Java and Node.js
- For countering developer resistance, using auto-instrumentation modules to
- start with is a good step

#### 2 - Starting and scaling

**Q:** What is a good place to start from with OpenTelemetry? For example, from
infra to data collection, or starting in the application? And how do you scale
it up?

**More context**: Our use case is end-to-end visibility; currently, we are using
a vendor for monitoring logs, metrics, and traces. We are also using things like
RUM (real user monitoring). Can we do the same with OpenTelemetry, and at scale?

**A:** It depends on if you are starting to use OTel in a new project, or trying
to re-orchestrate an existing or old project. It’s best to start with a
transition plan, make sure the performance impact is not bad, and scale up what
you need. One suggestion is to start experimenting with Java OTel
instrumentation, as the overall performance impact is negligible.

Another suggestion is to try infrastructure monitoring with OpenTelemetry using
the
[host metrics receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/README.md)
in the Collector, as it covers a lot of metrics, and has no dependencies. One
user noticed a 20% reduction in CPU usage when they moved from a vendor-specific
agent to the host metrics receiver for infrastructure monitoring.

#### 3 - Auto-instrumentation

**Q:** Is there a way to automatically create spans without code changes?

**A:** It depends on the use cases:

- [Auto instrumentation](/docs/concepts/instrumenting/#automatic-instrumentation)
options are maturing in OTel; for example, the Java JAR agent takes care of
instrumenting
[most libraries](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/supported-libraries.md#libraries--frameworks)
that are used by applications. Auto-instrumentation is also available for
[Python](/docs/instrumentation/python/automatic/),
[.NET](/docs/instrumentation/net/automatic/), and
[Node.js](/docs/instrumentation/js/libraries/#node-autoinstrumentation-package).
- If you’re using Kubernetes, they can use the
[OTel operator](https://github.com/open-telemetry/opentelemetry-operator),
which takes care of instrumentations for applications deployed on k8s. The
OTel Operator also supports injecting and configuring auto-instrumentation
libraries where available (see point above).
- If you’re using AWS lambda, you should check out the
[OTel Lambda extension](https://github.com/open-telemetry/opentelemetry-lambda).

#### 4 - Leveraging telemetry from OTel

**Q:** Has there been work toward telecommand standards to leverage the
telemetry from OTel?

**A:** Telecommand is a command sent to control a remote system or systems that
are not directly connected to the place from which the telecommand is sent (per
Wikipedia). Check out
[this paper](https://www.gsse.biz/pdfs/papers/DASIA2018-abstract.pdf), and
[OpAMP](/blog/2022/opamp/).

#### 5 - Message brokers

**Q:** What are some use cases for message brokers?

**A:** IoT use cases (car manufacturer). There is also ongoing work for semantic
conventions support for messages.

### Updates and Communications

#### 1 - Unified query standard

**Q:** Is there an update on the upcoming Unified Query Standard working group
for observability data and discussion at O11y Day at KubeCon EU?

**A:** The Observability TAG within CNCF is working to launch a working group
that is going to analyze the various query languages that are out there and come
up with use cases, such as, what are your most common alert and diagnostic
types, and what are some uncommon patterns that you’d like to have available?
Then, we’d like to see if there’s any way we can come up with a recommendation
for a unified standard language across vendors. Maybe SQL-ish?

We’re officially launching the working group at the end of the month; the
charter is open for comments.
[View here](https://docs.google.com/document/d/1JRQ4hoLtvWl6NqBu_RN8T7tFaFY5jkzdzsB9H-V370A).
We are going to start making the conference circuit and gather feedback, the
first place will be at
[Observability Day](https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/co-located-events/observability-day/).
Join the discussion at
[#telemetry-analysis](https://cloud-native.slack.com/archives/C04LXHPDW6M) in
CNCF’s Slack instance.

#### 2 - Documentation and searches

**Q:** Where do you go to find documentation and answers to your questions?

**A:** We have many resources, including official documentation and Github
repos.

To help us improve our resources, it would be helpful to gather feedback from
you as an end user – what is your process for finding OTel information? Do you
search for answers or post questions on Stack Overflow? The community is
researching options that make sense so that questions can be indexed for
searching. One option is Stack Overflow. Please share your answers using one of
the avenues below!

## Meeting Notes & Recordings

For a deeper dive into the above topics, check out the following:

- [AMER](https://docs.google.com/document/d/1p_FoGbLiDC9VPqqLblJqQtHBn3tr-aPxhu2GaIykU6k)
meeting notes
- [EMEA](https://docs.google.com/document/d/1fh4RWyZ-ScWdwrgpRHO9mnfqLSKfxUTf4wZGdUvnnUM)
meeting notes
- [APAC](https://docs.google.com/document/d/1eDYC97LfvE428cpIf3A_hSGirdNzglPurlxgKCmw8o4)
meeting notes

## Join us!

If you have a story to share about how you use OpenTelemetry at your
organization, we’d love to hear from you! Ways to share:

- Join the [#otel-endusers channel](/community/end-user/slack-channel/) on the
[CNCF Community Slack](https://communityinviter.com/apps/cloud-native/cncf)
- Join our monthly
[End-User Discussion Group calls](/community/end-user/discussion-group/)
- Join our [OTel in Practice](/community/end-user/otel-in-practice/) sessions
- Share your stories on the
[OpenTelemetry blog](https://github.com/open-telemetry/opentelemetry.io/blob/954103a7444d691db3967121f0f1cb194af1dccb/README.md#submitting-a-blog-post)

Be sure to follow OpenTelemetry on
[Mastodon](https://fosstodon.org/@opentelemetry) and
[Twitter](https://twitter.com/opentelemetry), and share your stories using the
**#OpenTelemetry** hashtag!
Loading

0 comments on commit deb7c3a

Please sign in to comment.