Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

X-Ray Remote Sampler does not poll for new rules #133

Closed
willarmiros opened this issue Nov 22, 2021 · 3 comments · Fixed by #135
Closed

X-Ray Remote Sampler does not poll for new rules #133

willarmiros opened this issue Nov 22, 2021 · 3 comments · Fixed by #135
Labels
type: bug Something isn't working

Comments

@willarmiros
Copy link
Contributor

Description
Currently, the X-Ray sampler only polls the GetSamplingRules once on startup:

This is problematic because if a customer creates a new sampling rule, it will never be picked up by the application until it is redeployed.

Steps to reproduce

  1. Create a sample app using the X-Ray Remote Sampler
  2. Run the application
  3. Create a new sampling rule on the AWS Console
  4. Observe that the new sampling rule is never picked up by the app

Expectation
Instead, the remote sampler should poll the GetSamplingRules API periodically, every 5 minutes.

cc @anuraaga

@willarmiros willarmiros added the type: bug Something isn't working label Nov 22, 2021
@anuraaga
Copy link
Contributor

The relevant code is here which does reschedule.

https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/aws-xray/src/main/java/io/opentelemetry/contrib/awsxray/AwsXrayRemoteSampler.java#L147

I know @wangzlei saw some behavior that indicated this may not be happening but just wondering if you were able to confirm the repro steps? During initial testing I could confirm dynamic update of the rules and code hasn't changed since so wondering if it's a corner case

@willarmiros
Copy link
Contributor Author

I have not confirmed the repro steps, and I wasn't looking at that code which does look good. Maybe @wangzlei can either provide a repro or enable FINE level logs to see if some weird exception is occurring?

@felixscheinost
Copy link
Contributor

I think this issue is still open. I set some breakpoints. The problem is that scheduleSamplerUpdate throws an exception in the following line

    long delay = pollingIntervalNanos + RANDOM.nextInt(jitterNanos);

The reason is that in my case jitterNanos = -1294967296, so negative, which isn't allowed.

Why is it negative? In my case polling interval nanos is the default of 300 s = 3 * 10^11 ns. 3 * 10^11 / 100 = 3 * 10^9 which is bigger than Integer.MAX_VALUE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants