Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu blocked for ~10s before first request on complex Flask app with gevent on 8.10.1 release #918

Closed
ddorian opened this issue Sep 2, 2023 · 14 comments · Fixed by #970
Closed
Assignees

Comments

@ddorian
Copy link

ddorian commented Sep 2, 2023

Hi,

I have a pretty complex flask app locally that hangs for about ~10s ONLY before the first request when using 8.10.1 release (doesn't happen on 8.10.0). Still happens on 9.0.0 though. The CPU usage is 100% for those seconds. All the other requests work normally after.

How do you want me to profile it? Using ubuntu 22.04 & Python 3.11 and many libraries.

@ddorian
Copy link
Author

ddorian commented Sep 2, 2023

The first request takes 0.6s on 8.10.0, while it takes ~8.8s on 8.10.1. (makes PostgreSQL connections, compiles jinja2 etc)

The subsequent requests take 10ms.

I tried with Pycharm profiler but it only showed "gevent-wait" time, and couldn't find it.

mytest.pstat.zip

Tried with flask-debugtoolbar but it only profiles inside the request (it's ~100ms for both versions).

So, even on 8.10.0, the first request takes 100ms on flask-side + 500ms on new-relic side.

Which should also be unacceptable IMHO. (see screenshot)
image

I'm running newrelic with manually like this:

import gevent.monkey

# had problems monkey-patching inside uwsgi so doing it here
gevent.monkey.patch_all()

import os
import sys
from pathlib import Path

# start newrelic manually, so we get no errors from gevent
# https://discuss.newrelic.com/t/found-problems-running-newrelic-admin-with-python-3-6-gevent-and-gunicorn/70583/2

newrelic_path = Path("deploy/files/newrelic.ini")
environment = os.environ.get("FLASK_ENV")
if newrelic_path.exists():
    import newrelic.agent

    print("starting newrelic!")
    newrelic.agent.initialize(newrelic_path.as_posix(), environment=environment)

@mthipparthi
Copy link

mthipparthi commented Sep 4, 2023

We have the same issue. It has killed all our applications and performance. High time it is resolved. BTW we are customer.

@TimPansino
Copy link
Contributor

Can we get some more info on this issue? A pip freeze would be very helpful if you're comfortable providing it. Or at the very least gevent and greenlet versions used.

This may be related to setting startup_timeout. We do not recommend setting startup_timeout for web applications, as this blocks completion of requests to the application for the initial request(s) until registration with the backend can complete (up to the time set, in seconds). Not setting it or setting to 0 means that initial requests will not be monitored while the agent starts on a background thread. Once it completes its connect cycle, it will begin monitoring requests.

I was unable to reproduce the issue on a machine running the following app. I saw no real difference in startup time. After examining the changes between 8.10.0 and 8.10.1 the only interesting change I see is a change to prefer importlib.metadata calls over builtin version attributes on packages for environment reporting. This could have potential performance implications, but I don't see any in my initial testing. I'd like to ensure my environment is identical before I do any larger scale performance profiling sessions.

newrelic (8.10.0 vs 8.10.0)
Flask==2.3.3
gevent==23.9.1
greenlet==3.0.0rc3
# Patch with gevent
from gevent import monkey
monkey.patch_all()

# Initialize agent
import newrelic.agent
newrelic.agent.initialize("newrelic.ini")

# Create application
from gevent.pywsgi import WSGIServer
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return "Hello from Flask!"

if __name__ == '__main__':
    # Start server
    print("Starting gevent server...")
    server = WSGIServer(("0.0.0.0", 8000), app)
    server.serve_forever()

@ddorian
Copy link
Author

ddorian commented Sep 18, 2023

@TimPansino another alternative would be to tell me how you want me to profile it (what tool etc) and I'll do it. Might be faster than doing a lot of tests locally. Example, I can't replicate it with the minimal code above.

I saw no real difference in startup time.

The delay is ONLY before the FIRST flask request.

Attached is the pip freeze.

freeze-requirements.txt

@mthipparthi
Copy link

we run using - newrelic-admin run-program gunicorn --workers 3 --timeout 65 --graceful-timeout 65 --max-requests 2000 --max-requests-jitter 200 --bind 0.0.0.0:8088 app.wsgi. same issue. Not only for gevent, even for sync workers.

@natemoser
Copy link

We're seeing a similar jump in service time, but chiefly manifesting in our interactions with Kafka. We first noticed it when going from newrelic==8.8.0 to 9.0.0, but we see the same behavior when going from 8.10.0 to 8.10.1 (8.8.0 and 8.10.0 are at parity). The service time difference can be seen in New Relic, but is confirmed by our application logs.

  • Heroku stack-22
  • python-3.10.13
  • Django==3.2.20
  • gunicorn==20.1.0
  • kafka-helper==0.2
  • kafka-python==2.0.2

This shows the drop in Kafka message message consuming transaction time when rolling back from newrelic==8.10.1 to newrelic=8.8.0

transaction_time_drop

Ironically but very helpfully, it was New Relic profiling that showed the increase in importlib.metadata usage, which pointed us toward the newrelic lib update:

v8.10.0
newrelic-8 10 0

v8.10.1
newrelic-8 10 1

@ddorian
Copy link
Author

ddorian commented Oct 18, 2023

cc @umaannamalai

@hmstepanek hmstepanek self-assigned this Nov 8, 2023
@hmstepanek
Copy link
Contributor

Hey all we have a fix coming for this to be released on Monday Nov 13th.

@hmstepanek
Copy link
Contributor

This fix was released in v9.1.2.

@ddorian
Copy link
Author

ddorian commented Nov 14, 2023

@hmstepanek this is not fixed in v9.1.2 (I just tested locally).

Can you re-open the issue? Same as in the issue body (8s+ to load the first page with 100% CPU)

@hmstepanek hmstepanek reopened this Nov 14, 2023
@hmstepanek
Copy link
Contributor

@ddorian it sounds like there may be multiple issues here. We fixed the importlib metadata issue but sounds like that wasn't the root of the problem in your case. Could you try out this branch and let us know if it reduces the cpu spike: https://github.com/newrelic/newrelic-python-agent/tree/performance-cpu-increase-possible-fix. If it doesn't reduce the CPU usage, could you send us a cprofile dump of the first transaction hitting your application? Something like the following should be sufficient for profiling: newrelic-admin run-program python -m cProfile -s tottime -o profile.dat main.py, hit the app with the first web request, kill it, and send us the profile.dat.

@ddorian
Copy link
Author

ddorian commented Nov 15, 2023

@hmstepanek just tested it, and it works! Thank you!

@hmstepanek
Copy link
Contributor

We just released a new configuration option that you can add to you newrelic.ini config file package_reporting.enabled = false or disable via an env var setting NEW_RELIC_PACKAGE_REPORTING_ENABLED=false that allows you to disable package and version capturing information on startup of the agent (this was what was causing the CPU spike). This is available in 9.2.0. Docs will be updated later tonight with the new config option. This effectively does the same thing as that branch @ddorian

@ddorian
Copy link
Author

ddorian commented Nov 17, 2023

Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants