Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profiler: Fix infinite loop in maxPauseNs #927

Merged
merged 1 commit into from
May 21, 2021
Merged

profiler: Fix infinite loop in maxPauseNs #927

merged 1 commit into from
May 21, 2021

Conversation

felixge
Copy link
Member

@felixge felixge commented May 16, 2021

I noticed this issue when running the TestProfilerInternal test by
itself, which consistently produced the following error:

$ go test ./profiler -run TestProfilerInternal
--- FAIL: TestProfilerInternal (0.20s)
    --- FAIL: TestProfilerInternal/collect (0.20s)
        profiler_test.go:189: missing batch
FAIL
FAIL	gopkg.in/DataDog/dd-trace-go.v1/profiler	0.395s

The reason for this timeout is that the loop inside of maxPauseNs was
not hitting its termination conditions when stats.NumGC is 0. The reason
for this is subtle:

  • i is uint32, so it will always be <= 0
  • periodStart is "1729-02-04" when running the metrics profile for the first
    time. This is caused by another issue which will be addressed in a
    follow-up patch.

As far as I can tell this issue should not occur in the real world
because the periodStart will not be 1729 after profiler.run resets the
metrics collectedAt timestamp.

I noticed this issue when running the TestProfilerInternal test by
itself, which consistently produced the following error:

```
$ go test ./profiler -run TestProfilerInternal
--- FAIL: TestProfilerInternal (0.20s)
    --- FAIL: TestProfilerInternal/collect (0.20s)
        profiler_test.go:189: missing batch
FAIL
FAIL	gopkg.in/DataDog/dd-trace-go.v1/profiler	0.395s
```

The reason for this timeout is that the loop inside of maxPauseNs was
not hitting its termination conditions when stats.NumGC is 0. The reason
for this is subtle:

- i is uint32, so it will always be <= 0
- periodStart is "1729-02-04" when running the metrics profile for the first
  time. This is caused by another issue which will be addressed in a
  follow-up patch.

As far as I can tell this issue should not occur in the real world
because the periodStart will not be 1729 after `profiler.run` resets the
metrics collectedAt timestamp.
@felixge felixge requested a review from pmbauer May 16, 2021 13:23
@felixge
Copy link
Member Author

felixge commented May 16, 2021

@pmbauer I noticed this while trying to play around with the GoLand IDE for non-work related reasons. I don't think this issue is hitting in the real world, but I think we shouldn't have any kind of loops that rely on precarious invariants for termination.

@felixge felixge added this to the 1.31.0 milestone May 16, 2021
Copy link
Member

@pmbauer pmbauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay; as you pointed out, not a real world concern

@pmbauer
Copy link
Member

pmbauer commented May 18, 2021

also, not a bad idea to have a failing test

@felixge
Copy link
Member Author

felixge commented May 18, 2021

also, not a bad idea to have a failing test

You mean adding a new test? Yeah, I guess that's doable. Will try to add one and ping you.

@pmbauer
Copy link
Member

pmbauer commented May 18, 2021 via email

@felixge
Copy link
Member Author

felixge commented May 19, 2021

I meant failing test. The code without this change passes CI okay. The test
failure in the pull description requires specific conditions.

Ok, now I'm confused : p. You want me to have a failing test, but not add a new test? So are you suggesting to change the CI YAML to add go test ./profiler -run TestProfilerInternal?

Sorry if I'm being dense here 🙈

@pmbauer
Copy link
Member

pmbauer commented May 19, 2021 via email

@felixge felixge merged commit cc66f25 into v1 May 21, 2021
@felixge felixge deleted the fix-metrics-issue branch May 21, 2021 09:09
@felixge
Copy link
Member Author

felixge commented May 21, 2021

@pmbauer got it. I'll do this in the follow-up PR that will try to fix the "problem" that periodStart is "1729-02-04" on the first run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants