external logger outages can cause AWX to OOM #6746

Closed
ryanpetrello opened this issue Apr 17, 2020 · 2 comments

Comments

ryanpetrello commented Apr 17, 2020

ISSUE TYPE
  • Bug Report
SUMMARY
  1. Turn on logging and point it at localhost:8000
  2. Don't run any TCP services on localhost:8000.
  3. Open awx-manage shell_plus and run this (letting it run for a while):
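import logging
# 64 KiB payload per log message
x = 'x' * (1024 * 64)
l = logging.getLogger('awx')
# Emit error-level messages indefinitely to flood the external logger
while True:
    l.error(x)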
EXPECTED RESULTS

AWX continues to function without rsyslogd using gobs of memory, and when the service on localhost:8000 is brought up later (minutes or hours afterwards), logs start being recorded properly.

ACTUAL RESULTS

https://www.youtube.com/watch?v=QqMVjsaGEZo

@ryanpetrello

@elyezer @rooftopcellist could you all check my work here?

elyezer commented Apr 20, 2020

To verify this, I enabled logging and pointed it at an offline server. While monitoring both memory (free -mh and top -o %MEM) and disk usage (df -h), I ran the following snippet in an awx-manage shell_plus session:

import logging
x = 'x' * (1024 * 64)
l = logging.getLogger('awx')
while True:
    l.error(x)

Disk usage grew by 1G, as expected, and no significant change was observed in memory usage.
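
The same observation could also be sampled from Python instead of watching df and free by hand. This is only a sketch: the helper names (disk_used_bytes, mem_available_kib, watch) are hypothetical, the path "/" is an assumption about where the log queue lives, and the memory reading relies on the Linux-only /proc/meminfo file.

```python
import shutil
import time

def disk_used_bytes(path="/"):
    """Return bytes currently used on the filesystem containing `path`."""
    return shutil.disk_usage(path).used

def mem_available_kib(meminfo_path="/proc/meminfo"):
    """Return MemAvailable in KiB on Linux, or None if it cannot be read."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1])
    except OSError:
        return None
    return None

def sample(path="/"):
    """Take one combined disk and memory reading."""
    return {
        "disk_used_bytes": disk_used_bytes(path),
        "mem_available_kib": mem_available_kib(),
    }

def watch(samples=3, interval=5.0, path="/"):
    """Print a few readings, `interval` seconds apart."""
    for _ in range(samples):
        print(sample(path))
        time.sleep(interval)
```

Calling watch() in a second terminal while the logging loop runs should show disk usage climbing while available memory stays roughly flat, if the behavior matches what is described above.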

After that, to ensure that the right number of messages was delivered, I ran a for loop to enqueue 10000 messages:

for i in range(10000):
    l.error(x)
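
For a rough sense of scale, the raw payload of that loop can be worked out directly. This is a back-of-the-envelope sketch: it counts only the 64 KiB string itself and ignores whatever envelope (timestamps, JSON fields, and so on) the logging pipeline adds around each message, so the real on-disk size would be somewhat larger.

```python
# Size of one message payload, matching x = 'x' * (1024 * 64)
MESSAGE_BYTES = 1024 * 64
# Number of messages enqueued by the for loop
MESSAGE_COUNT = 10000

total_bytes = MESSAGE_BYTES * MESSAGE_COUNT
total_mib = total_bytes / (1024 * 1024)

print(f"{total_bytes} bytes = {total_mib} MiB of raw payload")
```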

And, while observing the disk usage, the following Flask app was run to receive the stored logs:

import json
from datetime import datetime
from flask import Flask, request

app = Flask(__name__)

# Number of log events received since the app started
events_received = 0
started = datetime.now()

@app.route('/', methods=['POST'])
def log():
    global events_received
    # Parse the body so a malformed JSON payload raises instead of being counted
    json.loads(request.data)
    events_received += 1
    return ''

@app.route('/stats')
def stats():
    # Report how many events arrived and how long the app has been running
    time_elapsed = (datetime.now() - started).total_seconds()
    return f'Received {events_received} in {time_elapsed} seconds\n'

At the end, the disk space used was returned to the system, and the /stats endpoint reported the right number of events received.
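
The final count check could be scripted along these lines. This is a sketch, not what was actually run: parse_stats and fetch_stats are hypothetical helpers, the regular expression simply mirrors the f-string returned by the /stats route above, and the URL in the usage note is an assumption about where the Flask app is listening.

```python
import re
from urllib.request import urlopen

# Mirrors the response format of the /stats route:
# "Received <count> in <seconds> seconds"
STATS_PATTERN = re.compile(r"Received (\d+) in ([\d.]+) seconds")

def parse_stats(text):
    """Extract (events_received, seconds_elapsed) from a /stats response."""
    match = STATS_PATTERN.search(text)
    if match is None:
        raise ValueError(f"unexpected stats response: {text!r}")
    return int(match.group(1)), float(match.group(2))

def fetch_stats(url):
    """Fetch the /stats endpoint at `url` and parse its response."""
    with urlopen(url) as response:
        return parse_stats(response.read().decode("utf-8"))
```

For example, fetch_stats("http://localhost:8000/stats") would return the count and elapsed time as a tuple, assuming the receiver is reachable at that address.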

With all that, we can consider this issue verified.
