Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POST requests with identify events 502 starting on or around v6.7.8 #193

Closed
joshuaeilers opened this issue Jun 22, 2022 · 9 comments
Closed

Comments

@joshuaeilers
Copy link
Contributor

Is this a support request?
This issue tracker is maintained by LaunchDarkly SDK developers and is intended for feedback on the SDK code. If you're not sure whether the problem you are having is specifically related to the SDK, or to the LaunchDarkly service overall, it may be more appropriate to contact the LaunchDarkly support team; they can help to investigate the problem and will consult the SDK team if necessary. You can submit a support request by going here or by emailing [email protected].

Note that issues filed on this issue tracker are publicly accessible. Do not provide any private account information on your issues. If your problem is specific to your account, you should submit a support request as described above.

Describe the bug
A clear and concise description of what the bug is.

We noticed that on or around v6.7.7 or v6.7.8 the POST requests began failing for us. The v6.7.6 appears to work just fine.

To reproduce
Steps to reproduce the behavior.

Install v6.7.8 and watch the POST requests 502.

Expected behavior
Should not 502. Maybe it's from the eventsource bump in v6.7.7?

Logs
If applicable, add any log output related to your problem.

SDK version
The version of this SDK that you are using.

^2.22.1

Language version, developer tools
For instance, Go 1.11 or Ruby 2.5.3. If you are using a language that requires a separate compiler, such as C, please include the name and version of the compiler too.

OS/platform
For instance, Ubuntu 16.04, Windows 10, or Android 4.0.3. If your code is running in a browser, please also include the browser type and version.

Additional context
Add any other context about the problem here.

@joshuaeilers
Copy link
Contributor Author

This might an environment specific issue for us actually. Still investigating.

@eli-darkly
Copy link
Contributor

@joshuaeilers I can still try to reproduce this in the meantime, but could you clarify about the steps to reproduce: are you talking about POST requests to the Relay Proxy's event endpoints (from SDKs) getting a 502 error, or are you talking about the Relay Proxy making POST requests to deliver the event data to LaunchDarkly and getting a 502 error?

@joshuaeilers
Copy link
Contributor Author

joshuaeilers commented Jun 22, 2022

From sdk to the relay proxy and get a 502 error from our load balancer (at least I suspect the LB but Have not proven this theory yet). Only in prod, our non-prod environments are fine. Get requests on prod also work fine. I don't think we're getting any response back from the LD server itself, but that is just a hunch based on not seeing anything indicating such in the logs.

@eli-darkly
Copy link
Contributor

Hmm, well, it sounds to me like a LB issue is the most likely explanation. I did try to reproduce this using v6.7.8 of the Relay Proxy, and couldn't. I used a minimal config file—

[Main]
logLevel=debug

[Events]
sendEvents=true

[Environment "eli"]
sdkKey=<my test SDK key>

—and then sent some valid event data to it. I got a 202 Accepted status on that request, as expected, and then a second later the Relay Proxy sent the same event data to LD and again got a 202 Accepted.

Also, looking at the code, I don't see any condition in the Relay Proxy logic that would cause it to return a 502. It can return a 503 if event proxying is not turned on, but we don't use 502 for anything. And the code changes in v6.7.6 and v6.7.7 are not related to analytics event posts at all; the name "eventsource" refers to a different thing.

So if there's any possibility that your load balancer configuration may have changed, I think that is a more likely thing to look at. If the load balancer is returning a 502 then it is presumably not forwarding the post to the Relay Proxy at all. One way to verify that, if you can do this conveniently, would be to turn on debug logging in the Relay Proxy (temporarily, since it is very verbose) because when debug logging is enabled, Relay will log something like this every time it receives a post:

2022/06/22 12:42:25.537962 DEBUG: [env: ...0a91] Received 2 events (v3) to be proxied to /bulk
2022/06/22 12:42:25.538235 DEBUG: Request: method=POST url=/bulk auth=*a0a91 status=202 bytes=0

So if requests are happening and getting 502 errors, but Relay doesn't log anything like that, that would indicate that the requests are being blocked before they get to it.

@joshuaeilers
Copy link
Contributor Author

We just re-deployed and the issue resolved itself, any of these versions are working now. Maybe something got into a funk while it was left running for an extended period of time or something? Definitely doesn't reproduce for me now but will circle back if it resurfaces.

@joshuaeilers joshuaeilers closed this as not planned Won't fix, can't repro, duplicate, stale Jun 22, 2022
@joshuaeilers
Copy link
Contributor Author

joshuaeilers commented Jun 22, 2022

This is a graph of memory usage of relay proxy over time when we started noticing this problem. It all started around June 15/16th which you can see the mem begin sawtoothing right there.

image

@eli-darkly
Copy link
Contributor

eli-darkly commented Jun 22, 2022

Well, it's normal to see some transient memory spikes if you have a fairly large amount of flag data, because there is some buffering that happens whenever it receives that data from LaunchDarkly. I'm not sure what you mean by "begin sawtoothing" because I don't see any spikes on that graph after June 15, just a gradual smooth increase; whether that is normal or not would depend on whether the size of your data or the number of connected clients had increased or not.

But, again, I'm not sure how to account for a 502 being returned by the Relay Proxy under any circumstances; the code never explicitly sets such a response status, and it doesn't look to me as if we are using any kind of server framework that would ever set such a status for us.

@joshuaeilers
Copy link
Contributor Author

joshuaeilers commented Jun 22, 2022

I guess what I mean is June 16, the used mem began to climb, same time as when our our LD events dropped off the abyss. Timing is just very suspicious. When we re-rolled the relay proxy we didn't touch the LB, so the only variable here was restarting the proxy.

We should have very little flag data, like 4 flags (we're a new LD customer). Also doing streaming:false so not doing any persistent client connections, at least not intentionally. We'll keep an eye on this and see if it resurfaces again.

image

@eli-darkly
Copy link
Contributor

@joshuaeilers Hmm, yeah, that is odd. I'm still having trouble thinking of any condition that could result in a 502 from this service itself, rather than from some other layer in front of the service, but I'll keep thinking about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants