Create a pre fork()'ed data manager to store scan data information #274
It is started at the beginning, which avoids inheriting unnecessary data.
Otherwise, if it is a scan against a real target, the scan information is stored in the already forked data manager. This scan will be launched later by the scheduler. This avoids forking the new scan process with all the data objects inherited from the parent process (stream, data, etree object, etc.). At the moment, this reduces the memory usage of the new process from 110MB to ~30MB for a full and fast single-host target scan.
The scheduler checks for scans with this new status and launches them.
The main loop in the parent process checks for pending scans in the scan table and forks a new scan process for each task. The scan data was already stored in the data manager in a previous step (while handling the start_scan command). Therefore the fork()'ed child only inherits the base memory of the parent process.
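The flow described in these commit messages can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the status strings and the helpers `add_scan`, `run_scan`, and `start_pending_scans` are hypothetical names, and the scan body is a placeholder.

```python
import multiprocessing

# Hypothetical status names, used only for this illustration.
PENDING = "pending"
RUNNING = "running"
FINISHED = "finished"


def run_scan(scan_info):
    """Body of the scan process; a placeholder for the real scan."""
    scan_info["result"] = "scanned " + scan_info["target"]
    scan_info["status"] = FINISHED


def add_scan(data_manager, scans_table, scan_id, target):
    """Store the scan data in the manager process and mark it pending."""
    scan_info = data_manager.dict()  # lives in the manager, not the parent
    scan_info["target"] = target
    scan_info["status"] = PENDING
    scans_table[scan_id] = scan_info  # the parent keeps only a small proxy


def start_pending_scans(scans_table, scan_processes):
    """Launch one process per pending scan; meant to run in the main loop."""
    for scan_id, scan_info in list(scans_table.items()):
        if scan_info["status"] != PENDING:
            continue
        scan_info["status"] = RUNNING
        process = multiprocessing.Process(target=run_scan, args=(scan_info,))
        process.start()
        scan_processes[scan_id] = process


if __name__ == "__main__":
    # The manager is created early, before any large objects exist.
    manager = multiprocessing.Manager()
    table, processes = {}, {}
    add_scan(manager, table, "scan-1", "192.0.2.1")
    start_pending_scans(table, processes)
    processes["scan-1"].join()
    print(table["scan-1"]["status"])
```

Because the parent holds only proxies to data kept in the manager process, a child created afterwards does not carry a copy of the scan data in its own address space.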
Pending scans are not yet in the scan process table because they have not been started. Therefore, they are skipped.
The port list is no longer used once the scan has started, so it is removed from the data manager, which reduces the memory footprint during a scan.
The vts list is no longer used once the scan has started, so it is removed from the data manager, which reduces the memory usage during a scan.
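The cleanup described in the last two commit messages might look like the sketch below. The key names `"ports"` and `"vts"` and the helper `take_scan_inputs` are assumptions made for illustration; the real field names in the scan table may differ.

```python
def take_scan_inputs(scan_info):
    """Return the port list and VT selection, removing both from the entry.

    `scan_info` can be a plain dict or a manager dict proxy; both
    support pop() with a default value.
    """
    ports = scan_info.pop("ports", None)  # hypothetical key name
    vts = scan_info.pop("vts", None)      # hypothetical key name
    return ports, vts


# Usage: the scan process reads its inputs once, then the shared entry
# no longer holds them for the rest of the scan.
entry = {"target": "192.0.2.1", "ports": "1-1024", "vts": {"oid-1": {}}}
ports, vts = take_scan_inputs(entry)
```

Popping the values returns them to the caller and drops them from the shared entry in one step, so nothing else needs to remember to delete them later.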
9850463 to c58c6bb
Codecov Report
@@            Coverage Diff             @@
##           master     #274      +/-   ##
==========================================
+ Coverage   74.51%   74.56%   +0.04%
==========================================
  Files          21       21
  Lines        2319     2335      +16
==========================================
+ Hits         1728     1741      +13
- Misses        591      594       +3
ospd/ospd.py
Outdated
@@ -1174,15 +1174,32 @@ def run(self) -> None:
        """ Starts the Daemon, handling commands until interrupted.
        """

        self.scan_collection.data_manager = multiprocessing.Manager()
I really don't think this line belongs in the daemon class. I would call the data_manager an internal detail of the scan_collection, which should not be accessed from the outside. Either create the manager when the scan_collection is created (constructor) or add an init method which is called later.
ospd/ospd.py
Outdated
            self.wait_for_children()
        except KeyboardInterrupt:
            logger.info("Received Ctrl-C shutting-down ...")

    def check_pending_scans(self):
        for scan_id in list(self.scan_collection.ids_iterator()):
Why are you converting the iterator to a list?
ospd/ospd.py
Outdated
            self.wait_for_children()
        except KeyboardInterrupt:
            logger.info("Received Ctrl-C shutting-down ...")

    def check_pending_scans(self):
If I read the method correctly, the name should be changed: it doesn't check anything for validity; instead it creates processes for pending scans. Something like start_pending_scans would be better.
Adjust tests as well
Also initialize it before starting the server
ospd/scan.py
Outdated
@@ -63,6 +64,9 @@ def __init__(self) -> None:
        )  # type: Optional[multiprocessing.managers.SyncManager]
        self.scans_table = dict()  # type: Dict

    def init_data_manager(self):
Personally I wouldn't call it init_data_manager, because the data manager should be an internal detail of the class. Whether we are using a data manager or storing the data in a dict, db, redis, memory, json, ... should not matter to the outside. For the calling code it must only be obvious that an init function has to be called after the instance has been created.
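The suggestion in this comment could be sketched like this. It is a simplified, hypothetical stand-in for the class under review, not the code that was merged: the method names `init` and `create_scan` and the private attribute `_data_manager` are assumptions.

```python
import multiprocessing


class ScanCollection:
    """Simplified, hypothetical stand-in for the class under review."""

    def __init__(self):
        self._data_manager = None  # internal detail, created in init()
        self.scans_table = {}

    def init(self):
        """Prepare the storage backend; call once before adding scans."""
        self._data_manager = multiprocessing.Manager()

    def create_scan(self, scan_id, target):
        """Store a new scan entry; callers never touch the manager itself."""
        if self._data_manager is None:
            raise RuntimeError("call init() before creating scans")
        scan_info = self._data_manager.dict()
        scan_info["target"] = target
        self.scans_table[scan_id] = scan_info
        return scan_id
```

With this shape the daemon only calls `init()` before starting the server, and the storage backend could later be swapped without changing the daemon code.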
When a new scan is started, it is not launched directly; instead, the scan data is stored in the scan table in the data manager. The scan is set to PENDING (a new scan status added with this PR).
A new method called from inside the main loop checks for PENDING scans and launches each one in a new fork()'ed process.
This avoids forking the new scan process with all the data objects inherited from the parent process.
This reduces the memory usage of the new process (data manager and task processes) from ~110MB to ~30MB for a full and fast single-host target scan.
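The description does not say how the memory figures were measured. One way to take a comparable reading on a Unix system, offered only as an assumption about a possible approach, is to query the peak resident set size from inside the process of interest. The helper name `peak_rss_mb` is hypothetical.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of the calling process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor
```

Calling this at the start of a scan process, before and after the change, would give two numbers that can be compared directly.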