Getting transport-otlp-http errors #639

Open
miniengineer opened this issue Jul 3, 2024 · 18 comments
Labels
bug Report a bug

Comments

@miniengineer

miniengineer commented Jul 3, 2024

Description

Hi 👋
We're using faro libraries to send logs and traces to our collectors.

We're seeing some errors in the console (check them below) and partialSuccess on network requests.

These are the packages we use:

"@grafana/faro-react": "^1.8.0",
"@grafana/faro-transport-otlp-http": "^1.8.0",
"@grafana/faro-web-sdk": "^1.8.0",
"@grafana/faro-web-tracing": "^1.8.0",

And this is our configuration

initializeFaro({
  app: {
    name: 'our-app/browser',
    version: '2023.test',
  },
  transports: [
    new OtlpHttpTransport({
      logsURL: 'collectorURL/logs',
      tracesURL: 'collectorURL/traces',
    }),
  ],

  instrumentations: [
    // Load the default Web instrumentations
    ...getWebInstrumentations(),

    // Tracing Instrumentation is needed if you want to use the React Profiler
    new TracingInstrumentation({
      instrumentationOptions: {
        propagateTraceHeaderCorsUrls: [/.*/],
      },
    }),

    new ReactIntegration({
      // Only needed if you want to use the React Router instrumentation
      router: createReactRouterV6Options({
        createRoutesFromChildren,
        matchRoutes,
        Routes,
        useLocation,
        useNavigationType,
      }),
    }),
  ],
});

These are the errors we see in the console

(Screenshot of the console errors, taken 2024-07-03 at 17:27:49)

Steps to reproduce

  1. Add previously mentioned faro libraries to React project
  2. Use the configuration pasted above
  3. Open application in Chrome and try to send logs & traces
  4. Tada!

Expected behavior

All traces and metrics should be captured and response should be FullSuccess

Actual behavior

Some logs and metrics are captured but some are lost.
And we keep getting those errors in the console.

Environment

  • all packages are 1.8.0
  • @grafana/faro-react, @grafana/faro-transport-otlp-http, @grafana/faro-web-sdk, @grafana/faro-web-tracing;
  • Chrome

Demo

Context

miniengineer added the bug label Jul 3, 2024
@janantharaj

I'm seeing these issues as well, sporadically, with our service which uses Faro. We're on 1.7.2 and running in Chrome.

@prajon84

cc: @codecapitano : Can you have a look into this issue? 🙏

@codecapitano
Collaborator

Hi @miniengineer in this case the TypeError: Failed to fetch is very likely due to a CORS issue.

Would you mind inspecting one of those failed requests in the browser network panel to get some hints what's going on?
Would be interesting to see the request and response headers etc.

Are you using Faro with Grafana Cloud or with Alloy?
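
A note on why the network panel matters here: a browser reports CORS rejections, DNS failures and dropped connections to script code as the same `TypeError`, so the error alone cannot confirm a CORS problem. The sketch below only separates that opaque case from an explicit abort. The function name and the labels are invented for illustration and are not part of Faro.

```typescript
// Hypothetical helper, not part of Faro: sorts a rejected fetch() into coarse buckets.
type FailureKind = 'aborted' | 'network-or-cors' | 'other';

function classifyFetchFailure(err: unknown): FailureKind {
  // An aborted request rejects with an error whose name is 'AbortError'.
  if (typeof err === 'object' && err !== null && (err as { name?: unknown }).name === 'AbortError') {
    return 'aborted';
  }
  // CORS blocks and plain network failures both surface as a TypeError
  // (Chrome words it "Failed to fetch"); script code cannot tell them apart.
  if (err instanceof TypeError) {
    return 'network-or-cors';
  }
  return 'other';
}
```

Anything that lands in `'network-or-cors'` still needs the request and response headers from the network panel to narrow it down.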

@miniengineer
Author

hi @codecapitano

Thank you so much for the reply.

Would you mind inspecting one of those failed requests in the browser network panel to get some hints what's going on?
Would be interesting to see the request and response headers etc.

It's interesting that for each console error like the one I attached, there is one request stuck in the pending state.
I have waited an hour for it to be resolved to either error or success, but it doesn't change.

Since the status is pending there are only provisional request headers and no response headers...
I followed Chrome's documentation to enable at least request headers, but nothing helped.
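
One way to keep a request from staying pending indefinitely while reproducing this outside Faro is to put a time limit on it with `AbortController`. This is a minimal debugging sketch, not how Faro's transport works; `fetchWithTimeout`, `FetchLike` and the example URL are assumptions made up for illustration.

```typescript
// Hypothetical debugging helper, not part of Faro.
// FetchLike is a minimal stand-in for the parts of fetch() used here.
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<{ status: number }>;

async function fetchWithTimeout(
  fetchImpl: FetchLike,
  url: string,
  timeoutMs: number
): Promise<{ status: number }> {
  const controller = new AbortController();
  // Abort the request if it has not settled within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the request settled.
    clearTimeout(timer);
  }
}

// Possible usage in a browser (the URL is a placeholder):
// fetchWithTimeout((u, init) => fetch(u, init), 'https://collector.example/v1/traces', 10_000);
```

A timeout does not explain why the request hangs; it only turns a silent pending state into a rejection that can be logged and counted.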

the TypeError: Failed to fetch is very likely due to a CORS issue.

Hmm...
I am curious how that can be.
99% of the requests are successful and only a handful result in an error 🤔
Is it possible?

Are you using Faro with Grafana Cloud or with Alloy?

We're using the free community version

@codecapitano
Collaborator

Hey @miniengineer

We're using the free community version

So Faro as web-sdk and an OTel collector.

I am curious how that can be.
99% of the requests are successful and only a handful result in an error 🤔
Is it possible?

I don't have a satisfactory answer yet, but yes, these things can happen, e.g. because of network or server issues.
Can you check the server logs / the collector logs for any hints?

Do you see any related warnings in the browser console?

@miniengineer
Author

@codecapitano

Hi, sorry for the late reply and thank you for the follow-up 🙏

It took me a while, but I did test it locally.
I set it up so that traces from the web application were sent to a locally running collector (I used the one from opentelemetry-demo behind an nginx reverse proxy to avoid CORS issues).

I used otel-tui viewer and was able to verify the traces 👍

Then, as suggested, I checked the collector logs hoping to find some information, but there was none :(
I set the log level to error, but I still didn't see any errors about the collector failing to receive them.

Do you have any other thoughts on what could be wrong? 🤔

@codecapitano
Collaborator

Hi @miniengineer no worries and thanks a lot for your investigation.

The issue you outlined may be related to this issue.

Off the top of my head I have no good idea yet what's causing it, so this needs more investigation.
I hope we're able to find the cause of this behavior quickly.

@codecapitano
Collaborator

@miniengineer unfortunately I can't reproduce the issue, but I found that the otlp-http-transport didn't consume the request body, which can lead to several issues like hanging requests, memory leaks etc.

We'll release the update soon. Since I couldn't reproduce the issue, it would be helpful if you could report whether you observe any positive impact.
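
For readers following along: with `fetch`, the body a client consumes is the response body, and reading it to the end is a common way to let the underlying connection be released. The sketch below illustrates that general pattern only. It is an assumption about what the fix does, not the actual Faro patch, and `drainBody` and `BodyLike` are hypothetical names.

```typescript
// Hypothetical sketch, not the actual Faro change.
// BodyLike is a minimal stand-in for the part of a fetch Response used here.
interface BodyLike {
  text(): Promise<string>;
}

async function drainBody(response: BodyLike): Promise<void> {
  try {
    // Read the body to the end even though the content is not needed.
    await response.text();
  } catch {
    // A failed read should not break the caller; there is nothing to recover.
  }
}

// Possible usage (url and payload are placeholders):
// const res = await fetch(url, { method: 'POST', body: payload });
// await drainBody(res);
```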

@miniengineer
Author

Hi @codecapitano

Thank you 🙏 understood.
I am checking for updates every Friday, once you release I will test the latest version and see if that fixes issues we're facing.

@miniengineer
Author

@codecapitano

Hi 👋
I was wondering when the new release is coming and whether you need any help with the fix?

I am open to contributing to the library 👀

@codecapitano
Collaborator

Hi @miniengineer I'll release the updated otlp-http-transport today.
I initially planned to get more things into the release but got pulled away by conflicting priorities.

In general contributions are always welcome. 🎉
Please don't hesitate to contribute.
We are always here to help in case you have questions.

@miniengineer
Author

@codecapitano

Thank you for letting me know 🙏

I updated packages, but unfortunately this doesn't fix our issue 😢

I was testing locally before; I will check the production collector logs and see whether they point to anything.
If I find a solution, I will definitely contribute.

For now it's an ongoing issue.

@codecapitano
Collaborator

I updated packages, but unfortunately this doesn't fix our issue 😢

Oh no, I had a slight hope that not closing the connection caused the issue.

The otel spec for partial-success states that the server must initialize the partial_success field with some extra information and should populate the error_message field with a human-readable error message.

  • Do you see any related fields in the response?
  • How does the payload of the failed request look? Maybe it is malformed?

Thank you for your support and continuous testing!
I'll also ask our otel team if they have further hints for us to debug the issue.

@miniengineer
Author

@codecapitano

Thank you for following up.

Do you see any related fields in the response?

Response that we get for those requests is the following:

partialSuccess: {}

No error message, just empty object.
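
A possible reading of that empty object, hedged: as far as I understand the OTLP specification, a rejected count of 0 means the request was fully accepted, and an unset count defaults to 0. Under that assumption an empty `partialSuccess: {}` would not by itself indicate dropped data. The sketch below applies this reading to a parsed JSON response. The field names follow the OTLP/JSON encoding, while `classifyExportResponse` and the outcome labels are invented for illustration.

```typescript
// Sketch under the assumption stated above; not part of Faro or the collector.
// In OTLP/JSON, 64-bit integers may arrive as strings, hence number | string.
interface PartialSuccess {
  rejectedSpans?: number | string;
  rejectedLogRecords?: number | string;
  errorMessage?: string;
}

interface ExportResponse {
  partialSuccess?: PartialSuccess;
}

type Outcome = 'fully-accepted' | 'accepted-with-warning' | 'partially-rejected';

function classifyExportResponse(body: ExportResponse): Outcome {
  const partial = body.partialSuccess ?? {};
  // Missing counters are treated as 0, matching protobuf default values.
  const rejected = Number(partial.rejectedSpans ?? 0) + Number(partial.rejectedLogRecords ?? 0);
  if (rejected > 0) {
    return 'partially-rejected';
  }
  // Zero rejections plus a message is a warning on an otherwise accepted request.
  return partial.errorMessage ? 'accepted-with-warning' : 'fully-accepted';
}
```

With this reading, the response shown above classifies as `'fully-accepted'`, which would point back at the pending requests rather than at the collector's answer.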

How does the payload of the failed request look? Maybe it is malformed?

Looks fine, TBH 🤔
I compared the payloads of a pending and a successful request; they look the same.

Since I don't have capacity to focus on this issue right now, I will leave it as is and will come back to it once I have a bit more time.

For now I would like to keep this issue open 🙏

Thank you again.
If I find anything, I will post here 👍

@codecapitano
Collaborator

codecapitano commented Sep 12, 2024

Heads up since you are using the otlp-http-transport, we are going to release a new Faro version with updated otel dependencies.

We aligned semantic attributes with the current otel spec.

  • The stable web-tracing package supports the old and new attribute names.
  • The experimental otlp-http-transport is directly switched to the new attribute names without supporting the old ones.

You can read about the changes in the CHANGELOG

@miniengineer
Author

Awesome, thank you!!
I am updating libraries right now, let's see how it goes

@miniengineer
Author

miniengineer commented Sep 30, 2024

Hi @codecapitano

IGNORE THIS

I probably need to open another issue, but getting errors related to imports after updating faro libraries to 1.10.1 👀

In package.json

    "@grafana/faro-react": "^1.10.1",
    "@grafana/faro-transport-otlp-http": "^1.10.1",
    "@grafana/faro-web-sdk": "^1.10.1",
    "@grafana/faro-web-tracing": "^1.10.1",

The errors are related to imports. It seems that the libraries don't export them ⬇
Did I do something wrong? Maybe I need to update some dependencies?
I also removed all node modules + diff and ran yarn again, but no change 🤔

(Two screenshots of the import errors)

@miniengineer
Author

pls ignore the previous message 👍
purged yarn.lock + all node_modules folders and worked like a charm
