This repository has been archived by the owner on Nov 29, 2021. It is now read-only.

Create a pre fork()'ed data manager to store scan data information #274

Merged
merged 15 commits into greenbone:master from the prefork branch
May 19, 2020

Conversation

jjnicola
Member

When a new scan is started, it is not launched directly; instead, its data is stored in the scan table held by the data manager and the scan is set to PENDING (a new scan status added with this PR).
A new method, called from inside the main loop, checks for PENDING scans and launches each one in a new fork()'ed process.

This avoids forking the new scan process with all the data objects inherited from the parent process.
It reduces the memory usage of the new processes (data manager and task processes) from ~110MB to ~30MB for a full and fast single-host target scan.
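
A minimal sketch of this flow (the enum values other than PENDING, the table layout and the handler name are illustrative placeholders, not the actual ospd API):

```python
import uuid
from enum import Enum


class ScanStatus(Enum):
    PENDING = 0   # new status added by this PR
    INIT = 1
    RUNNING = 2
    FINISHED = 3


# Normally this table lives inside the pre-fork()'ed data manager.
scans_table = {}


def handle_start_scan(targets, options) -> str:
    """Only record the scan; the main loop forks the scan process later."""
    scan_id = str(uuid.uuid4())
    scans_table[scan_id] = {
        'targets': targets,
        'options': options,
        'status': ScanStatus.PENDING,
    }
    return scan_id
```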

The data manager is started at the beginning, so it does not inherit unnecessary data.
Otherwise, if it is a scan against a real target, the scan information
is stored in the already forked data manager. This scan will be launched
later by the scheduler.
This avoids forking the new scan process with all the data objects inherited
from the parent process (stream, data, etree objects, etc.).
At the moment, this reduces the memory usage of the new process
from ~110MB to ~30MB for a full and fast single-host target scan.
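
Roughly, the pre-fork()'ed data manager amounts to the following (a simplified sketch based on the diff further down, not the exact merged code):

```python
import multiprocessing


class ScanCollection:
    def __init__(self) -> None:
        self.data_manager = None   # multiprocessing.managers.SyncManager
        self.scans_table = dict()

    def init_data_manager(self) -> None:
        # Called once at daemon start-up: the Manager's server process is
        # forked while the parent is still small, so it does not inherit
        # streams, parsed XML trees or other large objects created later.
        self.data_manager = multiprocessing.Manager()
        self.scans_table = self.data_manager.dict()
```
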
The scheduler checks for scans with this new status and launches them.
The main loop in the parent process checks for pending scans in the
scan table and forks a new scan process for each task.
The scan data was already stored in the data manager in a previous
step (during handling of the start_scan command). Therefore the fork()'ed
child only inherits the base memory of the parent process.
Pending scans are not yet in the scan process table because they have not
been started, so they are skipped there.
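
A sketch of that main-loop step, with names taken from the diff where possible and the rest assumed:

```python
import multiprocessing


def check_pending_scans(self):
    """Fork one scan process per scan that is still marked PENDING."""
    # ScanStatus.PENDING is the new status sketched above.
    for scan_id in list(self.scan_collection.ids_iterator()):
        if self.scan_collection.get_status(scan_id) != ScanStatus.PENDING:
            continue
        # The scan data already lives in the data manager, so the fork()'ed
        # child only inherits the parent's base memory.
        scan_process = multiprocessing.Process(
            target=self.start_scan, args=(scan_id,)
        )
        scan_process.start()
        self.scan_processes[scan_id] = scan_process
```
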
The port list is not used any more once the scan has started,
so it is cleaned up from the data manager, which reduces the footprint
during a scan.

The vts list is not used any more once the scan has started,
so it is also cleaned up from the data manager, which reduces the memory
usage during a scan.
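
A sketch of the clean-up described in these two commits (the helper and the key names are assumptions):

```python
def clean_up_launched_scan(scans_table, scan_id):
    """Drop data that is only needed to launch the scan, not to run it."""
    scan_info = scans_table[scan_id]
    scan_info.pop('port_list', None)  # ports were already handed to the scanner
    scan_info.pop('vts', None)        # VT selection was already handed over
    # With a managed dict, the entry has to be written back explicitly so the
    # manager process sees the change.
    scans_table[scan_id] = scan_info
```
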
@jjnicola jjnicola force-pushed the prefork branch 2 times, most recently from 9850463 to c58c6bb on May 18, 2020 15:51
@codecov

codecov bot commented May 18, 2020

Codecov Report

Merging #274 into master will increase coverage by 0.04%.
The diff coverage is 80.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #274      +/-   ##
==========================================
+ Coverage   74.51%   74.56%   +0.04%     
==========================================
  Files          21       21              
  Lines        2319     2335      +16     
==========================================
+ Hits         1728     1741      +13     
- Misses        591      594       +3     
Impacted Files             Coverage           Δ
ospd/command/command.py    85.67% <42.85%>    -0.84% ⬇️
ospd/ospd.py               77.15% <81.25%>    +0.17% ⬆️
ospd/scan.py               92.13% <94.73%>    +1.22% ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7458307...59fdfc4.

@jjnicola jjnicola marked this pull request as ready for review May 18, 2020 15:55
ospd/command/command.py (resolved)
ospd/ospd.py (outdated)
@@ -1174,15 +1174,32 @@ def run(self) -> None:
""" Starts the Daemon, handling commands until interrupted.
"""

self.scan_collection.data_manager = multiprocessing.Manager()
Contributor

I really don't think this line belongs in the daemon class. I would consider the data_manager an internal detail of the scan_collection which should not be accessed from the outside. Either create the manager when the scan_collection is created (constructor) or add some init method which is called later.
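
For illustration, the suggested refactoring might look roughly like this (a sketch of the reviewer's proposal under assumed names, not the merged code):

```python
import multiprocessing


class ScanCollection:
    def __init__(self) -> None:
        # Keep the manager as an internal detail of the collection.
        self._data_manager = None
        self.scans_table = dict()

    def init(self) -> None:
        """Called by the daemon once, before the server starts serving."""
        self._data_manager = multiprocessing.Manager()
        self.scans_table = self._data_manager.dict()


# In the daemon's run(), instead of assigning data_manager directly:
#     self.scan_collection.init()
```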

ospd/ospd.py (outdated)
self.wait_for_children()
except KeyboardInterrupt:
logger.info("Received Ctrl-C shutting-down ...")

def check_pending_scans(self):
for scan_id in list(self.scan_collection.ids_iterator()):
Contributor

Why are you converting the iterator to a list?

ospd/ospd.py (resolved)
ospd/ospd.py (outdated)
self.wait_for_children()
except KeyboardInterrupt:
logger.info("Received Ctrl-C shutting-down ...")

def check_pending_scans(self):
Contributor

If I read the method correctly, the name should be changed: it doesn't check something for validity, it creates processes for pending scans. So something like start_pending_scans would be better.

Also, initialize it before starting the server.
ospd/scan.py (outdated)
@@ -63,6 +64,9 @@ def __init__(self) -> None:
) # type: Optional[multiprocessing.managers.SyncManager]
self.scans_table = dict() # type: Dict

def init_data_manager(self):
Contributor

Personally I wouldn't call it init_data_manager, because the data manager should be an internal detail of the class. Whether we are using a data manager or storing the data in a dict, db, redis, memory, json, ... should not matter to the outside. For the calling code it must only be obvious that an init function has to be called after the instance has been created.

@jjnicola jjnicola requested a review from bjoernricks May 19, 2020 10:41
@jjnicola jjnicola merged commit 54fe4da into greenbone:master May 19, 2020
@jjnicola jjnicola deleted the prefork branch May 19, 2020 11:43