High memory usage #9626

Closed
andresriancho opened this issue Apr 25, 2015 · 21 comments

Comments

@andresriancho
Owner

User story

As a user, I'm scanning a site and, after some time, w3af uses so much memory that the whole system becomes unusable.

Affected version

Master @ 1.6.51
de56135
https://github.com/andresriancho/w3af/releases/tag/1.6.51

How to reproduce

  • Site to scan: ecommerce.shopify.com
  • Running w3af with: /opt/w3af/w3af_console -s /tmp/script.w3af
  • Amazon EC2 small instance (2 GB RAM, 1 CPU) with Ubuntu
  • w3af script
http-settings
set timeout 30
back

#Configure scanner global behaviors
misc-settings
set max_discovery_time 5
set fuzz_cookies True
set fuzz_form_files True
back

plugins
#Configure entry point (CRAWLING) scanner
crawl robots_txt
crawl web_spider
crawl config web_spider
set only_forward True
set ignore_regex (?i)(logout|disconnect|signout|exit)+
back

#Configure vulnerability scanners
##Specify the list of AUDIT plugins to use
audit blind_sqli, buffer_overflow, cors_origin, csrf, eval, file_upload, ldapi, lfi, os_commanding, phishing_vector, redos, response_splitting, sqli, xpath, xss, xst
##Customize behavior of each audit plugin when needed
audit config file_upload
set extensions jsp,php,php2,php3,php4,php5,asp,aspx,pl,cfm,rb,py,sh,ksh,csh,bat,ps,exe
back

##Specify the list of GREP plugins to use (a grep plugin is a type of plugin that can also find vulnerabilities or information)
grep analyze_cookies, click_jacking, code_disclosure, cross_domain_js, csp, directory_indexing, dom_xss, error_500, error_pages,html_comments, objects, path_disclosure, private_ip, strange_headers, strange_http_codes, strange_parameters, strange_reason, url_session, xss_protection_header

##Specify the list of INFRASTRUCTURE plugins to use (an infrastructure plugin is a type of plugin that can find information disclosure)
infrastructure server_header, server_status, domain_dot, dot_net_errors

#Configure target authentication
#back
#Configure reporting in order to generate an HTML report
output console, html_file, csv_file
output config html_file
set output_file /output/W3afReport.html
set verbose False
back
output config csv_file
set output_file /output/W3afReport.csv
back
output config console
set verbose False
back
back
#Set target information, do a cleanup and run the scan

target
set target ecommerce.shopify.com
#set target_os windows
#set target_framework php
back

cleanup
start
exit

Docker container to reproduce the issue

Docker container at https://registry.hub.docker.com/u/89berner/w3af/; reproduce the issue using:

docker pull 89berner/w3af:v1
sudo docker run -t -i 89berner/w3af:v1 /bin/bash
/opt/start.sh ecommerce.shopify.com

Dockerfile used to create 89berner/w3af:v1

FROM ubuntu:12.04
MAINTAINER Juan Berner <[email protected]>

# Initial setup
# Squash errors about "Falling back to ..." during package installation

ENV TERM linux
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

# Update before installing any package
RUN apt-get update -y
RUN apt-get upgrade -y
RUN apt-get dist-upgrade -y

# Install basic and GUI requirements, python-lxml because it doesn't compile correctly from pip
RUN apt-get install -y python-pip build-essential libxslt1-dev libxml2-dev libsqlite3-dev libyaml-dev openssh-server python-dev git python-lxml wget libssl-dev xdot ubuntu-artwork dmz-cursor-theme ca-certificates
RUN pip install --upgrade pip
RUN apt-get install -y libffi-dev curl

RUN pip install clamd==1.0.1 PyGithub==1.21.0 GitPython==0.3.2.RC1 pybloomfiltermmap==0.3.11 \
        esmre==0.3.1 phply==0.9.1 stopit==1.1.0 nltk==2.0.5 chardet==2.1.1 pdfminer==20140328 \
        futures==2.1.5 pyOpenSSL==0.13.1 scapy-real==2.2.0-dev guess-language==0.2 cluster==1.1.1b3 \
        msgpack-python==0.4.4 python-ntlm==1.0.1 halberd==0.2.4 darts.util.lru==0.5 \
        tblib==0.2.0 ndg-httpsclient==0.3.3 pyasn1==0.1.7

RUN pip install nltk==3.0.1 pyasn1==0.1.3 Jinja2==2.7.3 vulndb==0.0.17 markdown==2.6.1

EXPOSE 22

RUN cd /opt/ && git clone https://github.com/andresriancho/w3af.git && cd /opt/w3af/
#RUN echo "Y" | /opt/w3af/w3af_console

ADD ./start.sh /opt/start.sh 
RUN chmod 777 /opt/start.sh && mkdir -p /var/run/sshd && chmod 0755 /var/run/sshd
CMD ["/usr/sbin/sshd", "-D"]

Reporter

@89berner reported this issue via email

Related issues

@andresriancho
Owner Author

Some comments while testing:

  • I'm seeing a constant/slow increase in memory usage using htop
  • I've reduced the test script to:
http-settings
set timeout 30
back

misc-settings
set max_discovery_time 5
set fuzz_cookies True
set fuzz_form_files True
back

plugins
crawl robots_txt
crawl web_spider
crawl config web_spider
set only_forward True
set ignore_regex (?i)(logout|disconnect|signout|exit)+
back

back
target
set target ecommerce.shopify.com
back

start

And the memory usage still increases.

  • With all the plugins enabled the memory usage increases faster (confirmed by letting the scan run for the same amount of time with htop monitoring; a scripted way to sample this is sketched below)
  • Memory usage reaches 200 MB really quickly with all plugins enabled
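
For reference, one way to script these measurements instead of watching htop (a minimal sketch, not how the numbers in this thread were collected; it assumes a reasonably recent psutil is installed and that the process was started as w3af_console, neither of which is part of the original report):

# Sample the combined resident memory (RSS) of all w3af_console processes once
# per minute and print a timestamped value, so the per-minute figures below can
# be reproduced without babysitting htop.
import time
import psutil

def w3af_rss_mb(needle='w3af_console'):
    total = 0
    for proc in psutil.process_iter():
        try:
            if needle in ' '.join(proc.cmdline()):
                total += proc.memory_info().rss
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass
    return total / (1024.0 * 1024.0)

while True:
    print('%s  %.0f MB' % (time.strftime('%H:%M:%S'), w3af_rss_mb()))
    time.sleep(60)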

@89berner

How much time since the start of the scan do you see an increase in memory?

Around 10 to 15 minutes depending on the site.

Is the increase sudden, or memory usage increases constantly over time?

The memory usage increases linearly until it reaches the maximum available memory.

Have you tried to reduce the number of enabled plugins to try to identify which set reproduces the issue?

Yes, the original configuration had grep all, infrastructure all, audit all and crawl with "find_dvcs, ghdb, google_spider, phpinfo, sitemap_xml, url_fuzzer, pykto"

@andresriancho
Owner Author

Thanks for your comments; this is aligned with what I'm seeing.

@andresriancho
Owner Author

TODO

  • Replace PIPDependency('lxml', 'lxml', '2.3.2') with the latest lxml, just to test if that changes things (a quick way to verify which lxml version actually gets imported is sketched below)
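
For reference, a minimal check (not w3af code) to confirm which lxml version the scanner will actually import once the dependency is changed:

# Print the lxml binding version (e.g. 2.3.2 vs 3.4.4) and the underlying
# libxml2 version it is running against.
import lxml.etree

print('lxml version:    %s' % lxml.etree.__version__)
print('libxml2 version: %d.%d.%d' % lxml.etree.LIBXML_VERSION)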

@andresriancho
Owner Author

Complete scan with different lxml versions (as sent by Juan):

Old lxml 2.3.2:
    3 min        244 MB used
    5 min        309 MB used
    7 min        363 MB used

Latest lxml 3.4.4:
    3 min        151 MB used
    5 min        165 MB used
    7 min        194 MB used

@andresriancho
Owner Author

At minute 5 we have roughly half the memory usage, which is very significant. I'm changing the version in requirements.py to the latest 3.4.4, but the current version of Kali still uses 2.3.2 and won't benefit from the change (http://pkg.kali.org/pkg/lxml). The next release they make will have an upgraded python-lxml (3.4.0-1), which should be ok.

andresriancho added a commit that referenced this issue Apr 26, 2015
@andresriancho
Owner Author

The lxml upgrade, as seen above, considerably reduces the rate at which the memory usage increases. This indicates that there was a memory leak in lxml (since I didn't change any w3af code).

The fix is applied in the develop branch for now. It should make it to master pretty soon. At the moment develop is a little bit broken by other half-done features (#9496), but I should merge everything to master soon and people will definitely benefit from the fix.

But I'm still worried, since the memory usage keeps increasing over time (at a slower pace now, but it does increase).

@andresriancho
Owner Author

15 minutes of running with the latest lxml gives us 564 MB memory usage.

@andresriancho
Owner Author

I got lucky by changing the lxml library to the latest version. I want to see if I can do the same again with:

  • pybloomfiltermmap
  • esmre

If upgrading them doesn't change anything, replace them with their slower pure-Python implementations (just for testing); a rough sketch of such a replacement is below.
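
A rough pure-Python Bloom filter along those lines (a hypothetical sketch: the class name and sizing are made up here, it is much slower than pybloomfiltermmap's mmap-backed C implementation, and it is not a drop-in replacement for w3af's bloom filter wrapper):

import hashlib

class PurePythonBloomFilter(object):
    """Tiny Bloom filter, only useful to rule the C extension in or out."""

    def __init__(self, capacity, num_hashes=7):
        # Over-size the bit array instead of deriving it from a target error
        # rate; good enough for leak-hunting, not for production use.
        self.size = capacity * 20
        self.num_hashes = num_hashes
        self.bits = bytearray(self.size // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted MD5 digests of the key
        for i in range(self.num_hashes):
            digest = hashlib.md5(('%d:%s' % (i, key)).encode('utf-8')).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

bf = PurePythonBloomFilter(capacity=100000)
bf.add('http://ecommerce.shopify.com/')
print('http://ecommerce.shopify.com/' in bf)   # True
print('http://example.com/never-added' in bf)  # almost certainly False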

@andresriancho
Owner Author

Updated pybloomfiltermmap to 0.3.14 and ran a full scan as specified by Juan:

  • 3 minute memory usage: 154 MB
  • 5 minute memory usage: 173 MB
  • 7 minute memory usage: 178 MB
  • 15 minute memory usage: 216 MB

At the 15 minute mark, memory usage is down by more than half! Before:

15 minutes of running with the latest lxml gives us 564 MB memory usage.

Now with the latest pybloomfiltermmap: 216 MB!

Going to test again to verify.

@andresriancho
Owner Author

Once again with the new pybloomfiltermmap: 223 MB.

Scanned again with the old pybloomfiltermmap, also for 15 minutes, and got 228 MB.

So it seems that my previous excitement, where I was comparing against 564 MB, was overrated, but so was my previous belief that we still had a big problem with memory leaks. I'll leave the scan running to see what happens.

25 min with old pybloomfiltermmap: 257 MB
35 min: 338 MB
45 min: 385 MB

@andresriancho
Owner Author

@89berner please run some tests with the latest develop (I fixed the issues it had yesterday) and let me know if your scans still reach 2 GB memory usage.

@andresriancho
Owner Author

New pybloomfiltermmap has some false positives?
https://circleci.com/gh/andresriancho/w3af/1799
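
A quick way to sanity-check that (a sketch only; it assumes pybloomfiltermmap's documented pybloomfilter.BloomFilter(capacity, error_rate, filename) constructor, and the file path here is arbitrary):

# Add 100k known keys, then probe 100k keys that were never added and count
# how many the filter wrongly reports as present.
import pybloomfilter

bf = pybloomfilter.BloomFilter(100000, 0.001, '/tmp/fp-test.bloom')

for i in range(100000):
    bf.add('added-%d' % i)

false_positives = sum(1 for i in range(100000) if ('missing-%d' % i) in bf)
print('observed false positive rate: %.5f (target error_rate was 0.001)'
      % (false_positives / 100000.0))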

@89berner

Just tested the develop branch and had the same issue after 20 minutes. Is there any additional information that I can provide after testing? Before the system becomes unusable, memory is at 99% usage and CPU goes down from 100% to 2%.

Thanks!

@andresriancho
Owner Author

Strange! Did you install the latest lxml and bloom filter mmap libs? Are you completely sure you're running a7cfc19?

@89berner

I'm cloning develop (git clone -b develop https://github.com/andresriancho/w3af.git)

My Dockerfile is:

FROM ubuntu:12.04
MAINTAINER Juan Berner <[email protected]>

# Initial setup
# Squash errors about "Falling back to ..." during package installation

ENV TERM linux
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

# Update before installing any package
RUN apt-get update -y
RUN apt-get upgrade -y
RUN apt-get dist-upgrade -y

# Install basic and GUI requirements, python-lxml because it doesn't compile correctly from pip
RUN apt-get install -y python-pip build-essential libxslt1-dev libxml2-dev libsqlite3-dev libyaml-dev openssh-server python-dev git python-lxml wget libssl-dev xdot ubuntu-artwork dmz-cursor-theme ca-certificates
RUN pip install --upgrade pip
RUN apt-get install -y libffi-dev curl

RUN pip install clamd==1.0.1 PyGithub==1.21.0 GitPython==0.3.2.RC1 pybloomfiltermmap==0.3.14 \
        esmre==0.3.1 phply==0.9.1 stopit==1.1.0 nltk==2.0.5 chardet==2.1.1 pdfminer==20140328 \
        futures==2.1.5 pyOpenSSL==0.13.1 scapy-real==2.2.0-dev guess-language==0.2 cluster==1.1.1b3 \
        msgpack-python==0.4.4 python-ntlm==1.0.1 halberd==0.2.4 darts.util.lru==0.5 \
        tblib==0.2.0 ndg-httpsclient==0.3.3 pyasn1==0.1.7 lxml==3.4.4

RUN pip install nltk==3.0.1 pyasn1==0.1.3 Jinja2==2.7.3 vulndb==0.0.17 markdown==2.6.1

EXPOSE 22

RUN cd /opt/ && git clone -b develop https://github.com/andresriancho/w3af.git && cd /opt/w3af/
#RUN echo "Y" | /opt/w3af/w3af_console

ADD ./start.sh /opt/start.sh 
RUN chmod 777 /opt/start.sh && mkdir -p /var/run/sshd && chmod 0755 /var/run/sshd
CMD ["/usr/sbin/sshd", "-D"]

@andresriancho
Owner Author

Well, I'll have to investigate further then. I'm running my tests from my home workstation, which has a slow connection (compared to an EC2 server). What might happen is that by running this on EC2 you get more HTTP requests/responses in the same timeframe, and thus you're able to reproduce the issue much faster (set max_discovery_time 5 might be affecting the tests).

@andresriancho
Owner Author

Sadly I won't be able to help much during this week since I started a new engagement, so you'll have to wait (or fix it yourself 👍 )

@andresriancho
Owner Author

Related work is being done at https://github.com/andresriancho/collector/tree/master/examples/w3af; this will allow me to quickly test w3af's performance.

@andresriancho
Owner Author

Experiment with __slots__ (even if it's just for fun) for URL objects; see the sketch below.
https://docs.python.org/2/reference/datamodel.html#slots
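
A minimal sketch of the idea (hypothetical class, not w3af's actual URL implementation): declaring __slots__ removes the per-instance __dict__, which adds up when hundreds of thousands of URL objects stay alive during a crawl.

class SlottedURL(object):
    # Only these attributes can exist on an instance; no __dict__ is allocated,
    # so each instance is noticeably smaller than a regular object.
    __slots__ = ('scheme', 'host', 'port', 'path', 'querystring')

    def __init__(self, scheme, host, port, path, querystring=''):
        self.scheme = scheme
        self.host = host
        self.port = port
        self.path = path
        self.querystring = querystring

url = SlottedURL('http', 'ecommerce.shopify.com', 80, '/collections')
# url.fragment = 'top'  # AttributeError: there is no __dict__ to grow into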

This was referenced May 13, 2015
@andresriancho
Owner Author

Solved in the develop branch.
