High memory usage with new forking parser cache #7503

Closed
2 of 3 tasks
andresriancho opened this issue Jan 13, 2015 · 20 comments

Comments

@andresriancho

Source code

https://github.com/andresriancho/w3af/blob/develop/w3af/core/data/parsers/parser_cache.py#L81
https://github.com/andresriancho/w3af/blob/develop/w3af/core/data/parsers/parser_cache.py#L161

Description

High memory usage with the new forking parser cache. This started appearing after several minutes of running w3af with the collector:

Exception in thread Thread-15:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 326, in _handle_workers
    pool._maintain_pool()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 230, in _maintain_pool
    self._repopulate_pool()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
    w.start()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
/home/ubuntu/run_w3af.sh: line 64: 10302 Killed                  ./w3af_console -s /tmp/test-script.w3af

Fatal error: sudo() received nonzero return code 137 while executing!

Requested: /home/ubuntu/run_w3af.sh
Executed: sudo -S -p 'sudo password:'  /bin/bash -l -c "export VERSION=\"da7afe3\" && /home/ubuntu/run_w3af.sh"

Aborting.
A remote error was found.

Tasks

  • Change the collector to be able to collect information even when the run_w3af.sh command fails
  • Add a profiling plugin to w3af that counts how many sub-processes it has and, if possible, their memory usage (real usage; remember copy-on-write)
  • Identify where the memory is being used
andresriancho self-assigned this Jan 13, 2015
andresriancho added this to the 1.6.1 - Bug fixing after 1.6 milestone Jan 13, 2015
@andresriancho

Maybe add this code to the memory_usage.py file, and have it save the data to a .os_memory file?
https://raw.githubusercontent.com/pixelb/ps_mem/master/ps_mem.py

Before I do something like that, remember tmp/w3af-psutil.data, which might already contain something like it. Also remember that it might be a good idea to integrate this into the collector instead of w3af:

{
    "CPU": {
        "guest": 0.0, 
        "guest_nice": 0.0, 
        "idle": 46.05, 
        "iowait": 43.59, 
        "irq": 4.19, 
        "nice": 4.32, 
        "softirq": 0.0, 
        "steal": 263.34, 
        "system": 121.42, 
        "user": 390.76
    }, 
    "Load average": [
        0.89, 
        0.98, 
        0.6
    ], 
    "Network": {
        "docker0": {
            "bytes_recv": 0, 
            "bytes_sent": 0, 
            "dropin": 0, 
            "dropout": 0, 
            "errin": 0, 
            "errout": 0, 
            "packets_recv": 0, 
            "packets_sent": 0
        }, 
        "eth0": {
            "bytes_recv": 265032109, 
            "bytes_sent": 3876973, 
            "dropin": 0, 
            "dropout": 0, 
            "errin": 0, 
            "errout": 0, 
            "packets_recv": 180723, 
            "packets_sent": 48957
        }, 
        "lo": {
            "bytes_recv": 0, 
            "bytes_sent": 0, 
            "dropin": 0, 
            "dropout": 0, 
            "errin": 0, 
            "errout": 0, 
            "packets_recv": 0, 
            "packets_sent": 0
        }
    }, 
    "Swap memory": {
        "free": 939520000, 
        "percent": 0.0, 
        "sin": 0, 
        "sout": 0, 
        "total": 939520000, 
        "used": 0
    }, 
    "Virtual memory": {
        "active": 533528576, 
        "available": 1616830464, 
        "buffers": 35323904, 
        "cached": 938897408, 
        "free": 642609152, 
        "inactive": 461950976, 
        "percent": 6.6, 
        "total": 1731534848, 
        "used": 1088925696
    }
}

The psutil data might not be enough; one of the problems is that it is only retrieved once.
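
A minimal sketch of what a periodic collector could look like, assuming psutil is available; the output path and sampling interval are arbitrary choices for illustration:

import json
import time

import psutil


def sample_os_memory(output_file='/tmp/w3af-os-memory.data', interval=30):
    """
    Append a snapshot of OS-level memory usage to a JSON-lines file every
    `interval` seconds, instead of retrieving the psutil data only once.
    """
    while True:
        per_process = []
        for proc in psutil.process_iter():
            try:
                per_process.append({'pid': proc.pid,
                                    'name': proc.name(),
                                    'memory_percent': proc.memory_percent()})
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                # The process died or we are not allowed to inspect it
                continue

        snapshot = {'timestamp': time.time(),
                    'virtual_memory': dict(psutil.virtual_memory()._asdict()),
                    'swap_memory': dict(psutil.swap_memory()._asdict()),
                    'per_process': per_process}

        with open(output_file, 'a') as output:
            output.write(json.dumps(snapshot) + '\n')

        time.sleep(interval)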

@andresriancho

Using andresriancho/w3af-performance-analysis@198c0e0 it's possible to get this output:

PSUtils memory usage summary
    0
        10351: 1.68% - SyncManager-1._run_server
        10334: 5.54% - (unknown)

    1
        10351: 1.72% - SyncManager-1._run_server
        12070: 11.85% - SubDaemonPoolWorker-7.worker
        10334: 12.78% - (unknown)

    2
        10351: 1.74% - SyncManager-1._run_server
        10334: 26.40% - (unknown)

    3
        10351: 1.74% - SyncManager-1._run_server
        10334: 35.56% - (unknown)
        18390: 33.96% - (unknown)

    4
        10351: 1.74% - SyncManager-1._run_server
        10334: 44.82% - (unknown)

    5
        10351: 1.74% - SyncManager-1._run_server
        10334: 50.51% - (unknown)

    6
        10351: 1.74% - SyncManager-1._run_server
        10334: 55.65% - (unknown)
        25349: 55.00% - (unknown)

    7
        10351: 1.74% - SyncManager-1._run_server
        10334: 61.72% - (unknown)

    8
        10351: 1.63% - SyncManager-1._run_server
        10334: 68.65% - (unknown)

    9
        30818: 68.41% - (unknown)
        10351: 0.37% - SyncManager-1._run_server
        10334: 68.87% - (unknown)

    10
        10351: 0.35% - SyncManager-1._run_server
        10334: 66.70% - (unknown)
        31164: 66.07% - SubDaemonPoolWorker-85.worker

    11
        10351: 0.27% - SyncManager-1._run_server
        10334: 70.25% - (unknown)
        31164: 41.74% - SubDaemonPoolWorker-85.worker

    12
        10351: 0.24% - SyncManager-1._run_server
        10334: 77.76% - (unknown)
        1001: 76.95% - SubDaemonPoolWorker-86.worker

    13
        10351: 0.24% - SyncManager-1._run_server
        10334: 77.96% - (unknown)

    14
        10351: 0.15% - SyncManager-1._run_server
        10334: 75.20% - (unknown)

    15
        10351: 0.14% - SyncManager-1._run_server
        10334: 77.95% - (unknown)
        2685: 77.10% - SubDaemonPoolWorker-89.worker

    16
        10351: 0.15% - SyncManager-1._run_server
        10334: 78.67% - (unknown)

    17
        3328: 80.97% - SubDaemonPoolWorker-91.worker
        10351: 0.13% - SyncManager-1._run_server
        10334: 82.75% - (unknown)

    18
        10351: 0.14% - SyncManager-1._run_server
        10334: 80.95% - (unknown)

    19
        10334: 82.42% - (unknown)
        10351: 0.16% - SyncManager-1._run_server

    20
        10334: 82.94% - (unknown)
        10351: 0.16% - SyncManager-1._run_server

    21
        10334: 84.02% - (unknown)
        10351: 0.16% - SyncManager-1._run_server

    22
        10334: 86.08% - (unknown)
        10351: 0.14% - SyncManager-1._run_server

Which shows that:

  • 10334: 70.25% - (unknown) is the main w3af process, using a considerable amount of RAM
  • 10351: 0.27% - SyncManager-1._run_server is the self._processes = manager.dict() which doesn't use much RAM (expected but good to confirm)
  • 31164: 41.74% - SubDaemonPoolWorker-85.worker is the parser worker process
  • There are at most three processes running at any time, which is good/expected but was something I was worried about: zombie processes are not present.

I would expect PID 31164 to use less RAM.

In most of the data tuples above we see that the main process uses X% RAM and the SubDaemonPoolWorker uses X-1%, which seems strange and makes me think that either:

  • os.fork() is not working as expected and actually copying all the process memory into the child
  • I'm measuring memory usage in a suboptimal way, which leads to the shared memory pages being counted

@andresriancho

4255a32 will help understand the shared memory usage in subprocesses

@andresriancho

Need to run ./collector examples/w3af/config-lock.yml 4255a32

@andresriancho

This is really bad... not much memory is shared between the parent (10327) and the child (11418):

    1
        10327 - (unknown)
            RSS (total): 208.6 MB
            Shared: 7.6 MB
            Percent OS used: 12.05 %

        11418 - SubDaemonPoolWorker-5.worker
            RSS (total): 169.7 MB
            Shared: 2.6 MB
            Percent OS used: 9.80 %

        10344 - SyncManager-1._run_server
            RSS (total): 29.7 MB
            Shared: 1.7 MB
            Percent OS used: 1.71 %

Is this the right type of shared memory I'm looking for?
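
One way to cross-check whether that "Shared" number is the right thing to look at is to read /proc/<pid>/smaps directly and compare RSS against PSS (proportional set size, which splits each shared page between the processes that map it). A rough sketch, assuming a Linux /proc filesystem:

def memory_from_smaps(pid):
    """
    Return (rss, pss, shared) in kB for `pid`, summed over all the mappings
    listed in /proc/<pid>/smaps.

    If RSS and PSS are almost equal, then very little memory is actually
    shared with other processes: the copy-on-write pages created by
    os.fork() have really been copied.
    """
    rss = pss = shared = 0

    with open('/proc/%s/smaps' % pid) as smaps:
        for line in smaps:
            fields = line.split()

            if line.startswith('Rss:'):
                rss += int(fields[1])
            elif line.startswith('Pss:'):
                pss += int(fields[1])
            elif line.startswith(('Shared_Clean:', 'Shared_Dirty:')):
                shared += int(fields[1])

    return rss, pss, shared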

@andresriancho

TODO for debugging: add the profiling tools to this function so we can better track what happens inside the sub-process:

def init_worker(log_queue):

@andresriancho

Looks like this might be useful:
http://pytracemalloc.readthedocs.org/install.html
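
A rough sketch of how pytracemalloc could be hooked into the worker initializer above; it assumes the patched Python build that pytracemalloc requires on 2.7, and the wrapper / dump-file names are made up for illustration:

import os

# init_worker() is the initializer from parser_cache.py mentioned above
from w3af.core.data.parsers.parser_cache import init_worker

try:
    import tracemalloc          # provided by pytracemalloc on Python 2.7
    HAS_TRACEMALLOC = True
except ImportError:
    HAS_TRACEMALLOC = False


def init_worker_with_profiling(log_queue):
    """Hypothetical init_worker() replacement that also starts tracemalloc."""
    init_worker(log_queue)

    if HAS_TRACEMALLOC:
        tracemalloc.start()


def dump_worker_memory_top(limit=15):
    """Write the top memory-allocating source lines of this worker to disk."""
    if not HAS_TRACEMALLOC:
        return

    snapshot = tracemalloc.take_snapshot()

    output_file = '/tmp/w3af-worker-%s.tracemalloc' % os.getpid()
    with open(output_file, 'w') as output:
        for stat in snapshot.statistics('lineno')[:limit]:
            output.write('%s\n' % stat)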

@andresriancho

Does this mean that the worker is sending HTTP traffic? What's the traceback for these calls to GET, _send_404, etc.? One way to get those tracebacks is sketched after the listing.

./wpa ~/performance_info/012d55a/i-6dd0c896/tmp/ 13959 -3

Performance analysis
====================

CPU usage by function
    Measurement #0 (w3af-13870-2015-27-01-03_00.cpu)
        1: extended_urllib.py:305 (ExtendedUrllib.GET)
        2: extended_urllib.py:491 (ExtendedUrllib._send)
        3: urllib2.py:386 (OpenerDirector.open)
        4: decorators.py:80 (f_retry)
        5: fingerprint_404.py:175 (fingerprint_404._send_404)
        6: threading.py:308 (_Condition.wait)
        7: pool.py:226 (Pool._maintain_pool)
        8: pool.py:195 (Pool._join_exited_workers)
        9: Queue.py:150 (Queue.get)
        10: HTTPResponse.py:129 (from_httplib_resp)
        11: urllib2.py:374 (OpenerDirector._call_chain)
        12: urllib2.py:414 (OpenerDirector._open)
        13: strategy.py:293 (w3af_core_strategy.verify_target_server)
        14: __init__.py:77 (DaemonProcess.exitcode)
        15: keepalive.py:782 (HTTPHandler.http_open)

    Measurement #1 (w3af-13959-2015-27-01-03_01.cpu)
        1: parser_cache.py:279 (init_worker)
        2: extended_urllib.py:305 (ExtendedUrllib.GET)
        3: extended_urllib.py:491 (ExtendedUrllib._send)
        4: urllib2.py:386 (OpenerDirector.open)
        5: decorators.py:22 (return_exceptions_wrapper)
        6: decorators.py:4 (apply_with_return_error)
        7: parser_cache.py:267 (ProcessDocumentParser.__init__)
        8: document_parser.py:39 (ProcessDocumentParser.__init__)
        9: html.py:45 (HTMLParser.__init__)
        10: sgml.py:65 (HTMLParser.__init__)
        11: sgml.py:140 (HTMLParser._parse)
        12: decorators.py:80 (f_retry)
        13: fingerprint_404.py:175 (fingerprint_404._send_404)
        14: __init__.py:122 (dump)
        15: url.py:175 (URL.__init__)
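
To answer the traceback question above, one option is to install a signal handler in the worker that dumps the stack of every thread on demand. This only uses the standard library; the choice of SIGUSR1 is arbitrary:

import signal
import sys
import traceback


def dump_thread_stacks(signum, frame):
    """Write the current stack of every thread in this process to stderr."""
    for thread_id, thread_frame in sys._current_frames().items():
        sys.stderr.write('\n--- Thread %s ---\n' % thread_id)
        sys.stderr.write(''.join(traceback.format_stack(thread_frame)))


def install_stack_dumper():
    """
    Call this from the worker initializer; afterwards `kill -USR1 <pid>`
    shows exactly where the worker is when it appears to be sending HTTP
    traffic on its own.
    """
    signal.signal(signal.SIGUSR1, dump_thread_stacks)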

@andresriancho

This issue of the sub-process running "all w3af" looks like http://stackoverflow.com/a/7994465/1347554, but that can't be...


@andresriancho

https://docs.python.org/3.4/library/multiprocessing.html#contexts-and-start-methods
"Note that safely forking a multithreaded process is problematic."

@andresriancho

The latest billiard, the one from the repo (not PyPI), is a backport of the 3.4 multiprocessing module to Python 2.7:
https://github.com/celery/billiard

I want to test that with the spawn start method: https://docs.python.org/3.4/library/multiprocessing.html#multiprocessing.set_start_method
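
A minimal sketch of what a spawn-based pool could look like; that the repo version of billiard exposes the same get_context() API as Python 3.4 multiprocessing is an assumption to verify:

# billiard from the git repo backports the Python 3.4 multiprocessing API,
# including start methods, to Python 2.7 (to be confirmed)
import billiard

if __name__ == '__main__':
    # 'spawn' starts a fresh interpreter instead of fork()ing the large,
    # multi-threaded main w3af process, so the child does not inherit the
    # parent's memory
    ctx = billiard.get_context('spawn')

    pool = ctx.Pool(processes=2)
    try:
        print(pool.map(abs, [-1, -2, -3]))
    finally:
        pool.close()
        pool.join()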

@andresriancho

Something I wanted to play with for a while... but not sure if it's the best thing in this scenario: http://zerorpc.dotcloud.com/

With timeouts:

In [1]: import zerorpc
In [2]: c = zerorpc.Client()
In [3]: c.connect("tcp://127.0.0.1:4242")
Out[3]: [None]
In [6]: c._timeout
Out[6]: 30
In [9]: c.add_42(1)
Out[9]: 43
In [14]: r = c.add_42(1, async=True)
In [15]: r
Out[15]: <gevent.event.AsyncResult at 0x7f1910006450>
In [16]: r.get()
Out[16]: 43
In [17]: r.get()
Out[17]: 43

#
# Sleep and timeout
#
In [20]: r = c.sleep(15, async=True)
In [21]: r.get(timeout=1)
---------------------------------------------------------------------------
Timeout                                   Traceback (most recent call last)
...
Timeout: 1 second


#
# When there is a timeout the task keeps running, and at some point it finishes
#
In [22]: r.get(timeout=10)
Out[22]: 'slept 15'


#
# Timeout without async
#
In [23]: c.sleep(15, timeout=1)
---------------------------------------------------------------------------
TimeoutExpired                            Traceback (most recent call last)
...
TimeoutExpired: timeout after 1s, when calling remote method sleep

And the server:

import time

class Cooler(object):
    """ Various convenience methods to make things cooler. """

    def add_man(self, sentence):
        """ End a sentence with ", man!" to make it sound cooler, and
        return the result. """
        return sentence + ", man!"

    def add_42(self, n):
        """ Add 42 to an integer argument to make it cooler, and return the
        result. """
        return n + 42

    def boat(self, sentence):
        """ Replace a sentence with "I'm on a boat!", and return that,
        because it's cooler. """
        return "I'm on a boat!"

    def sleep(self, num):
        time.sleep(num)
        return 'slept %s' % num

import zerorpc

s = zerorpc.Server(Cooler())
s.bind("tcp://0.0.0.0:4242")
s.run()

I could easily start the zerorpc server in a different process using Popen and make it listen on a unix socket (--bind ipc:///tmp/randomly-generated-id.w3af, readable and writable only by the current user). Then I would call the document parser through zerorpc. Some questions remain open:

  • I can timeout the "get()", but that doesn't kill the running remote function. Is there a way to do that without killing the whole zerorpc server?
  • What happens when multiple threads want to consume the remote service? Are the requests queued somewhere? In memory? Does execution block?
  • Does the remote service run multiple threads?

The document parser zerorpc server could have a signal handler for Ctrl+C which would stop the server, kill any running tasks, clean up, etc., and end the process. From the main process we could send that signal when we need to kill the sub-process.

Instead of having only one document parser server, I could use zeroworkers to have N workers (like I have now with the pool) and one task generator (the main process).
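
A rough sketch of the Popen plus unix-socket idea; parser_server.py is a hypothetical script that does the zerorpc.Server(...).bind()/run() dance shown above for the document parser:

import subprocess
import sys
import uuid

import zerorpc


def start_parser_server():
    """
    Start the document parser zerorpc server in a separate process,
    listening on a per-run unix socket, and return a connected client plus
    the Popen handle so the main process can terminate it later.
    """
    endpoint = 'ipc:///tmp/%s.w3af' % uuid.uuid4()

    # Hypothetical script that creates zerorpc.Server(DocumentParserService())
    # and binds it to the endpoint passed on the command line
    server = subprocess.Popen([sys.executable, 'parser_server.py',
                               '--bind', endpoint])

    client = zerorpc.Client()
    client.connect(endpoint)

    return client, server


# Usage sketch: call the (hypothetical) remote method asynchronously and
# time out the get(), as in the IPython session above
#
#     client, server = start_parser_server()
#     try:
#         result = client.get_document_parser(http_response, async=True)
#         parser = result.get(timeout=10)
#     finally:
#         server.terminate()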

@andresriancho

https://pypi.python.org/pypi/Pyro4 looks good:

  • Better than zerorpc (which seems to be abandoned)
  • It's pure-python and has only one (also pure python) dependency
  • MIT licensed, which is GPL compatible
  • Has support for remote tracebacks
  • Has support for futures with timeouts (wait method of futures); see the sketch below
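
A short sketch of how the futures-with-timeout part could look; the async proxy API (Pyro4.async and FutureResult.wait) is recalled from the Pyro4 docs and should be double-checked, and the object name / socket path are made up:

import Pyro4


def parse_with_timeout(http_response, timeout=10):
    """Call a (hypothetical) remote document parser, give up after `timeout`."""
    # Proxy to a parser service exposed by a sub-process over a unix socket
    parser = Pyro4.Proxy('PYRO:document.parser@./u:/tmp/w3af-parser.sock')

    # NOTE: Pyro4.async / FutureResult.wait are assumed from the Pyro4 docs;
    # the wrapped proxy returns futures instead of blocking
    async_parser = Pyro4.async(parser)
    future = async_parser.get_document_parser(http_response)

    # FutureResult.wait() takes a timeout and reports whether the call finished
    if future.wait(timeout=timeout):
        return future.value

    # Timed out: same open question as with zerorpc, the remote call is
    # probably still running in the sub-process
    return None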

@andresriancho

http://rpyc.readthedocs.org/ also looks good:

  • Better than zerorpc (which seems to be abandoned)
  • Active mailing list
  • Active development
  • Async/timed/wait
  • Has support for remote tracebacks
  • It's pure-python and has only one (also pure python) dependency

@andresriancho

Maybe the sub-processes could be started/stopped using something similar to https://pypi.python.org/pypi/python-daemon? (source: https://www.python.org/dev/peps/pep-3143/)
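
For reference, python-daemon's entry point is the PEP 3143 DaemonContext; whether full daemonization (detaching from the parent) is actually what the parser sub-process needs is still an open question. A minimal sketch, with a hypothetical server entry point:

import daemon  # https://pypi.python.org/pypi/python-daemon


def run_parser_daemon():
    """Run the (hypothetical) document parser server as a detached daemon."""
    with daemon.DaemonContext():
        # Inside the context the process has been forked away from its
        # parent, with stdio detached and umask/working directory reset
        run_document_parser_server()  # hypothetical entry point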

@andresriancho

I also need to create a new AMI for launching the collector since with the latest tools it's really slow to boot. andresriancho/collector#10

@andresriancho

https://circleci.com/gh/andresriancho/w3af/1428 yields the message:

Warning: The build VMs have a memory limit of 4G. Your build hit this limit
on one or more containers, and your build results are likely invalid.

This was not related to the parsers/sub-processes; it was caused by a memory leak in lxml.

@andresriancho

Solved in the develop branch.
