Create a pre fork()'ed data manager to store scan data information #274

jjnicola · 2020-05-18T10:43:28Z

When a new scan is started, it will not be started directly, but the data information will be stored in the scan table in data manager. The scan is set as PENDING (new scan status added with this PR).
A new method call from inside the main loop will check for a PENDING scan and will launch the scan in a new fork()'ed process.

This avoid to fork the new scan process with all the data objects inherited from the parent process.
This reduces the memory usage of the new process (data manager and task processes) from ~110MB to ~30MB, for a full and fast single host target scan.

It is started at the beginning and avoid to inherit unnecessary data.

Otherwise, if it is a scan against a real target, the scan information is stored in the already forked data manager. This scan will be launched later by the scheduler. This avoid to fork the new scan process with all the data objects inherited from the parent process (stream, data, etree object, etc). At this moment, this reduces the memory usage of the new process from 110MB to ~30MB, for a full and fast single host target scan.

The scheduler check for scans with this new status to launch the scan.

The main loop in the parent process check for pending scans in the scan table. It will fork a new scan process for the task. The scan data was already stored in the data manager in a previous step (during handling of start_scan cmd). Therefore the fork()'ed child only inherit the base memory of the parent process.

Pending scans are still not in the scan process table, still not started. Therefore, they are skipped.

The port list is not used anymore once the scan was started. So, it is cleaned up from the data manager and it reduce the footprint during a scan.

The vts list is not used anymore once the scan was started. So, it is cleaned up from the data manager and it reduce the memory usage during a scan.

codecov · 2020-05-18T15:53:33Z

Codecov Report

Merging #274 into master will increase coverage by 0.04%.
The diff coverage is 80.95%.

@@            Coverage Diff             @@
##           master     #274      +/-   ##
==========================================
+ Coverage   74.51%   74.56%   +0.04%     
==========================================
  Files          21       21              
  Lines        2319     2335      +16     
==========================================
+ Hits         1728     1741      +13     
- Misses        591      594       +3

Impacted Files	Coverage Δ
ospd/command/command.py	`85.67% <42.85%> (-0.84%)`	⬇️
ospd/ospd.py	`77.15% <81.25%> (+0.17%)`	⬆️
ospd/scan.py	`92.13% <94.73%> (+1.22%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7458307...59fdfc4. Read the comment docs.

ospd/command/command.py

bjoernricks · 2020-05-19T06:04:38Z

ospd/ospd.py

@@ -1174,15 +1174,32 @@ def run(self) -> None:
        """ Starts the Daemon, handling commands until interrupted.
        """

+        self.scan_collection.data_manager = multiprocessing.Manager()


I really don't think this line belongs into the daemon class. I would call the data_manager an internal thing of the scan_collection which should not be accessed by the outside. Either create the manager when the scan_collection is created (constructor) or add some init method which is called later.

bjoernricks · 2020-05-19T06:05:46Z

ospd/ospd.py

                self.wait_for_children()
        except KeyboardInterrupt:
            logger.info("Received Ctrl-C shutting-down ...")

+    def check_pending_scans(self):
+        for scan_id in list(self.scan_collection.ids_iterator()):


Why are you converting the iterator to a list?

ospd/ospd.py

bjoernricks · 2020-05-19T06:12:25Z

ospd/ospd.py

                self.wait_for_children()
        except KeyboardInterrupt:
            logger.info("Received Ctrl-C shutting-down ...")

+    def check_pending_scans(self):


If I read the method correctly the name should be changed because it doesn't check something for validity instead it created processes for pending scans. So better would be something like start_pending_scans

Adjust tests as well

Also initialize it before starting the server

bjoernricks · 2020-05-19T08:37:11Z

ospd/scan.py

@@ -63,6 +64,9 @@ def __init__(self) -> None:
        )  # type: Optional[multiprocessing.managers.SyncManager]
        self.scans_table = dict()  # type: Dict

+    def init_data_manager(self):


Personally I wont call it init_data_manager because the data manager should be an internal detail of the class. If we are using a data manager or if we are storing the data in a dict, db, redis, memory, json, ... should not matter for the outside. For the outside using code it must only be obvious that a init function has to be called after the instance has been created.

jjnicola added 9 commits May 18, 2020 11:23

Pre-fork() the data manager.

9ddbb7e

It is started at the beginning and avoid to inherit unnecessary data.

Add new scan status PENDING

7586a9d

Set the initial status as PENDING.

f1feae1

The scheduler check for scans with this new status to launch the scan.

Only check scan progress of already started scans

766644b

Pending scans are still not in the scan process table, still not started. Therefore, they are skipped.

Pop the port list from the scan table once the scan was started.

c2c54a1

The port list is not used anymore once the scan was started. So, it is cleaned up from the data manager and it reduce the footprint during a scan.

Pop the vts list from the scan table once the scan was started.

e01529e

The vts list is not used anymore once the scan was started. So, it is cleaned up from the data manager and it reduce the memory usage during a scan.

Don't delete the pre-fork()'ed data manager.

33c0a2c

jjnicola force-pushed the prefork branch 2 times, most recently from 9850463 to c58c6bb Compare May 18, 2020 15:51

jjnicola marked this pull request as ready for review May 18, 2020 15:55

jjnicola requested a review from bjoernricks as a code owner May 18, 2020 15:55

jjnicola force-pushed the prefork branch from c58c6bb to d8e73e9 Compare May 18, 2020 16:18

Update tests

077e3ab

jjnicola force-pushed the prefork branch from d8e73e9 to 077e3ab Compare May 18, 2020 16:23

bjoernricks suggested changes May 19, 2020

View reviewed changes

Don't cast the iterator to list

9622136

jjnicola force-pushed the prefork branch from 64a88fc to 9622136 Compare May 19, 2020 07:38

jjnicola added 2 commits May 19, 2020 09:39

Rename check_pending_scans() to start_pending_scans()

d84e40c

Adjust tests as well

Add method to initialize the data manager.

156dca3

Also initialize it before starting the server

bjoernricks reviewed May 19, 2020

View reviewed changes

jjnicola added 2 commits May 19, 2020 10:43

Fix dry_run_scan()

01b1496

Rename scan collection init method

59fdfc4

jjnicola requested a review from bjoernricks May 19, 2020 10:41

bjoernricks approved these changes May 19, 2020

View reviewed changes

jjnicola merged commit 54fe4da into greenbone:master May 19, 2020

jjnicola deleted the prefork branch May 19, 2020 11:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a pre fork()'ed data manager to store scan data information #274

Create a pre fork()'ed data manager to store scan data information #274

jjnicola commented May 18, 2020

codecov bot commented May 18, 2020 •

edited

Loading

bjoernricks May 19, 2020

bjoernricks May 19, 2020

bjoernricks May 19, 2020

bjoernricks May 19, 2020

Create a pre fork()'ed data manager to store scan data information #274

Create a pre fork()'ed data manager to store scan data information #274

Conversation

jjnicola commented May 18, 2020

codecov bot commented May 18, 2020 • edited Loading

Codecov Report

bjoernricks May 19, 2020

Choose a reason for hiding this comment

bjoernricks May 19, 2020

Choose a reason for hiding this comment

bjoernricks May 19, 2020

Choose a reason for hiding this comment

bjoernricks May 19, 2020

Choose a reason for hiding this comment

codecov bot commented May 18, 2020 •

edited

Loading