-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove temporal configuration file after a while #3507
Conversation
internal/ingress/controller/nginx.go
Outdated
|
||
// In case of error the temporal configuration file will | ||
// be available up to five minutes after the error | ||
go func(tmpFile string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you limit the number of max possible goroutines so that we don’t potentially DOS ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ElvinEfendi actually, the goroutines will sleep 99% of the time, there is no gain in doing what you are proposing. Also, that approach will add unnecessary complexity (we need some queue aware of time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my worry:
Given a working ingress-nginx deployment in a cluster, someone makes a change that causes Nginx config test to fail. The controller in that case will log an error and spin up a goroutine to delete the temporary file. At the same time controller will retry generating new Nginx configuration. This will cause another goroutine to be span up. And eventually the controller will get OOMKilled (because we retry faster than 5 minutes the goroutines will accumulate). And this will break all the services.
we need some queue aware of time
why can we not just check creation time of the file? Instead of spinning up a new goroutine here we can collect the filenames in a queue and at the start of controller we can start a "garbage collector goroutine" that periodically pops from the queue and checks if the file was created longer than 5 minutes ago and if so delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can we not just check creation time of the file?
Good point. I will refactor the PR to check the timestamp, starting only one goroutine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ElvinEfendi ready for review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my worry:
Given a working ingress-nginx deployment in a cluster, someone makes a change that causes Nginx config test to fail. The controller in that case will log an error and spin up a goroutine to delete the temporary file. At the same time controller will retry generating new Nginx configuration. This will cause another goroutine to be span up. And eventually the controller will get OOMKilled (because we retry faster than 5 minutes the goroutines will accumulate). And this will break all the services.we need some queue aware of time
why can we not just check creation time of the file? Instead of spinning up a new goroutine here we can collect the filenames in a queue and at the start of controller we can start a "garbage collector goroutine" that periodically pops from the queue and checks if the file was created longer than 5 minutes ago and if so delete it.
@ElvinEfendi @aledbf I think I'm seeing this behavior in nginx-ingress [0].
it can be reproduced like this:
- Create a service that listens to let's say 9500 and expose TCP port 9500
- make the pod go to crashloopback (eg with a command sleep 2s)
- the nginx-ingress-controller will go to status OOMKilled
Shall I create a new issue?
[0]
Release: 0.23.0
Build: git-be1329b22)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With these two manifests you can reproduce it:
nginx-ingress-manifest: https://gist.github.com/strigazi/2fd6a4f139b146f11d50f93811de490c
example-noop-server.yaml: https://gist.github.com/strigazi/d1d11f1efe91ce2045bf5d7517659924
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this a a reproducible issue for me, I'll open an new github issue so that other can find.
b338305
to
c951bb2
Compare
internal/ingress/controller/nginx.go
Outdated
return errors.New(oe) | ||
} | ||
|
||
os.Remove(tmpfile.Name()) | ||
return nil | ||
return os.Remove(tmpfile.Name()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the behaviour when Nginx config test is successful but for some reason os.Remove
returns an error. And in that case controller will re-enqueue the sync event. IMO the old behaviour was better. We can log a warning or something when os.Remove fails to make things better but as long as config test is successful this function should return nil
(success).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change reverted
t.Fatal(err) | ||
} | ||
|
||
tmpfile, err := ioutil.TempFile("", tempNginxPattern) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better coverage I'd also create another temp file with creation time less than 5m and assert that it does not get deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
c951bb2
to
4baafe1
Compare
t.Fatal(err) | ||
} | ||
|
||
tmpfile, err = ioutil.TempFile("", tempNginxPattern) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done before calling cleanTempNginxCfg()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, fixed
4baafe1
to
c830a73
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aledbf, ElvinEfendi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Which issue this PR fixes: fixes #3273