Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #7288: Policy does not get deleted when changing relays #2591

Conversation

VinceMacBuche
Copy link
Member

@VinceMacBuche VinceMacBuche commented Nov 13, 2019

https://issues.rudder.io/issues/7288

We need to update better-files to 3.8.0 because of pathikrit/better-files#279

This was causing fd leak when deleting files

def cleanPromiseFolderOfDeletedNode : Box[Seq[BetterFile]] = {
import BetterFile._
def getNodeFolders(file: BetterFile): List[BetterFile] = {
file.collectChildren(d => d.isDirectory && d.children.exists(_.name == "rules")).flatMap {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really scared by the cost of this search with 10 000 nodes; doing it at each policy generation will slow it down even further

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Myabe it could be a batch that runs, every ... day ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(mean an independant system that runs outside policy generation)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cost should be really small. It's just collecting a biles names/properties. But OK to add a dedicated timer on that.
It's not logical at all to make it run once in a while if the goal is to limit the timespan where non relevant files are present (if it's just space-consumption control, why not).
And if can be put out of the generation itself, asynchronuously triggered so that it does not delay it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(there is no reason file operation like that should be slow when nothing is deleted, and if they are, we need to understand why, because we have a problem)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not use much resource, apart lots of fd in an older version of better-files but fixed by updating it to latest version

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche VinceMacBuche force-pushed the bug_7288/policy_does_not_get_deleted_when_changing_relays branch from 7eeb37e to 115eb8a Compare November 14, 2019 10:27
@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche VinceMacBuche force-pushed the bug_7288/policy_does_not_get_deleted_when_changing_relays branch from 115eb8a to 5cc44f3 Compare November 14, 2019 12:02
Nil
}
nodeFolder :: childrens
}.toList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will result in "Too many open files" when system is large

@VinceMacBuche VinceMacBuche force-pushed the bug_7288/policy_does_not_get_deleted_when_changing_relays branch from 5cc44f3 to 054b9c0 Compare November 14, 2019 14:07
@VinceMacBuche
Copy link
Member Author

Commit modified

val restExtractor = restExtractorService

def process0(version: ApiVersion, path: ApiPath, req: Req, params: DefaultParams, authzToken: AuthzToken): LiftResponse = {
apiv11service.reloadTechniques(req, params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be reloadTechnique

val restExtractor = restExtractorService

def process0(version: ApiVersion, path: ApiPath, req: Req, params: DefaultParams, authzToken: AuthzToken): LiftResponse = {
apiv11service.reloadTechniques(req, params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
apiv11service.reloadTechniques(req, params)
apiv11service.cleanPromises(req, params)

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche VinceMacBuche force-pushed the bug_7288/policy_does_not_get_deleted_when_changing_relays branch from 054b9c0 to b6b4819 Compare November 14, 2019 14:49
@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche VinceMacBuche force-pushed the bug_7288/policy_does_not_get_deleted_when_changing_relays branch from b6b4819 to 9cbc03f Compare November 14, 2019 14:51
def cleanPromiseFolderOfDeletedNode(currentNodes : Map[NodeId,NodeInfo]) : Box[Seq[File]] = {
import File._
def getNodeFolders(file: File): Iterator[File] = {
file.children.filter{d => d.name !="rules" && d.isDirectory && d.children.exists{b => b.name == "rules"}}.flatMap {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java.nio.file.FileSystemException: /var/rudder/share/546f908c-ef75-492d-bc30-5577fa44099a: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at java.nio.file.Files.list(Files.java:3451)
at better.files.File.list(File.scala:764)
at better.files.File.children(File.scala:766)
at com.normation.rudder.batch.CleanPromisesFolder.$anonfun$cleanPromiseFolderOfDeletedNode$1(CleanPromisesFolder.scala:99)
at com.normation.rudder.batch.CleanPromisesFolder.$anonfun$cleanPromiseFolderOfDeletedNode$1$adapted(CleanPromisesFolder.scala:99)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on my test platform I have 4551 folders in /var/rudder/share.
It means file.children has 4551 open files at least, plus for each of them, we look for the children, so again 4551 files opened.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should keep them open thought.
Well, that ticket need perhaps more work to minimize that number.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have the error when there are folder to delete (if none are to be deleted, all is fine)
[2019-11-15 09:36:21] ERROR scheduledJob - Error when deleting unreferenced promise directory <- /var/rudder/share/be28c259-8e34-4447-a6fe-d661d9af975f: Too many open files
[2019-11-15 09:36:21] ERROR scheduledJob - Exception was:
java.nio.file.FileSystemException: /var/rudder/share/be28c259-8e34-4447-a6fe-d661d9af975f: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at java.nio.file.Files.list(Files.java:3451)
at better.files.File.list(File.scala:764)
at better.files.File.children(File.scala:766)
at com.normation.rudder.batch.CleanPromisesFolder.$anonfun$cleanPromiseFolderOfDeletedNode$1(CleanPromisesFolder.scala:100)
at com.normation.rudder.batch.CleanPromisesFolder.$anonfun$cleanPromiseFolderOfDeletedNode$1$adapted(CleanPromisesFolder.scala:100)

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche VinceMacBuche force-pushed the bug_7288/policy_does_not_get_deleted_when_changing_relays branch 2 times, most recently from c4ddea4 to 39d9523 Compare November 15, 2019 11:27
@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche VinceMacBuche force-pushed the bug_7288/policy_does_not_get_deleted_when_changing_relays branch from 39d9523 to 5742956 Compare November 21, 2019 10:07
@VinceMacBuche
Copy link
Member Author

Commit modified

9 similar comments
@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche
Copy link
Member Author

Commit modified

@@ -217,6 +219,7 @@ object RestTestSetUp {
, fakeItemArchiveManager
, fakePersonIndentService
, fakeRepo
, new CleanPoliciesFolder(???, 24.hours)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that may not be what you want to keep :)

@VinceMacBuche
Copy link
Member Author

Commit modified

3 similar comments
@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche
Copy link
Member Author

Commit modified

@VinceMacBuche
Copy link
Member Author

Commit modified

logger.debug(s"***** starting batch that purge unreferenced policies directory, every ${updateInterval.toString()} *****")
scheduler.scheduleWithFixedDelay(1.hour, updateInterval) {


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to complete the scheduler logic. Also, can you put the code that is also used by API elsewhere, in a "CleanPoliciesFolderService", and only keep the batch logic here ?

# This allows you to schedule the purge of unused promised folders (Deleted Nodes, or Nodes that has moved to another relay/policy server)
#
#
# Defaults: check runs every 24 hours.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

garbage collection runs ...


#
# Promise folder cleaning
# This allows you to schedule the purge of unused promised folders (Deleted Nodes, or Nodes that has moved to another relay/policy server)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows you to schedule s/the// purge of unused s/promised/policies folders (s/D/d/eleted s/N/n/odes,
or s/N/n/odes that s/has// moved to another relay or policy server)

#

# Interval in hours
rudder.batch.clean.promises.interval=24
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/promises/policies/

@@ -334,6 +334,16 @@ object RudderConfig extends Loggable {
}
}

val RUDDER_BATCH_CLEAN_PROMISES_INTERVAL = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/PROMISES/POLICIES/

@fanf
Copy link
Member

fanf commented Nov 28, 2019

This one will go in 5.0.16

@VinceMacBuche
Copy link
Member Author

Commit modified

Fixes #7288: Policy does not get deleted when changing relays
…relays

Fixes #7288: Policy does not get deleted when changing relays
…anging relays

Fixes #7288: Policy does not get deleted when changing relays
…when changing relays

Fixes #7288: Policy does not get deleted when changing relays
…eleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
…t get deleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
…does not get deleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
…Policy does not get deleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
…#7288: Policy does not get deleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
… Fixes #7288: Policy does not get deleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
… fixup! Fixes #7288: Policy does not get deleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
… fixup! fixup! Fixes #7288: Policy does not get deleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
… fixup! fixup! fixup! Fixes #7288: Policy does not get deleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
… fixup! fixup! fixup! fixup! Fixes #7288: Policy does not get deleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
@VinceMacBuche VinceMacBuche force-pushed the bug_7288/policy_does_not_get_deleted_when_changing_relays branch from 0fab3f1 to 3f97eff Compare January 14, 2020 23:21
… fixup! fixup! fixup! fixup! fixup! Fixes #7288: Policy does not get deleted when changing relays

Fixes #7288: Policy does not get deleted when changing relays
@VinceMacBuche
Copy link
Member Author

Commit modified

Copy link
Member

@fanf fanf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems ok to me. @ncharles do you want to test it a last time on the load server?

@ncharles
Copy link
Member

this still fails with

{"action":"cleanPolicies","result":"error","errorDetails":"An error while cleaning policies folder through rest API <- Error when deleting unreferenced policies directory <- Too many files open (currently -1, but number may have dropped), maybe you can raise open file descriptor limit (ulimit -n)"}

@fanf
Copy link
Member

fanf commented May 11, 2022

This PR is now several years old. I'm closing it, there's not need to keep it open with no changes for so long time.

@fanf fanf closed this May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants