Add support for Python profiling of Server requests #3502
Conversation
lib/pbench/profiling.conf.py
Outdated
worker object (ain't Python grand!?), and we use the worker's log.)
"""
worker.profile = cProfile.Profile()
worker.log.info(f"PROFILING {worker.pid}: {req.uri}")
I'm generally bothered by the "single API" profiling, since it relies on "stopping the ride" at just the right time. A more useful alternative might be to provide a way to specify a full or base URI to profile, e.g., /api/v1/upload/.*, and only profile on a match.
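A sketch of what such a URI filter could look like (the PB_PROFILE_URI variable name and the should_profile helper are hypothetical illustrations, not part of this PR):

```python
import os
import re

# Hypothetical environment variable naming a full or base URI pattern,
# e.g. "/api/v1/upload/.*"; unset or empty disables profiling entirely.
PROFILE_PATTERN = os.environ.get("PB_PROFILE_URI", "")


def should_profile(uri: str, pattern: str = PROFILE_PATTERN) -> bool:
    """Return True when a pattern is configured and the request URI matches it."""
    return bool(pattern) and re.fullmatch(pattern, uri) is not None
```

The pre-request hook would then start the profiler only when should_profile(req.uri) is true for the incoming request.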
I'll grant that it's not a fully general solution (but it was sufficiently useful to my needs). Using an environment variable to control it means restarting the server to enable/disable the profiling, which, in my mind, relegates it to use in a controlled/test environment...where "stopping the ride" is really not a problem.
We can imagine lots of ways to improve the controls over it, but I would defer them until we have an actual need.
lib/pbench/profiling.conf.py
Outdated
s = StringIO()
ps = pstats.Stats(worker.profile, stream=s).sort_stats("cumulative")
ps.print_stats(30)  # Limit the number of lines output
worker.log.info(f"[{worker.pid}] [INFO] {s.getvalue()}")
Despite the limitations in format, this gives the advantage of profiling data for every call in an accessible place. The [INFO], by the way, seems like unnecessary bulk. Hopefully pstats.Stats identifies the URI?
I concur that there is value in being able to put it in the journal-log, which is why I've retained that as a choice. I just don't see the value of having it both there and in the external file...so, as the code stands, it's one or the other. Also, the output in the log covers only the top 30 functions in terms of the cumulative time attributed to them, including their subroutine calls; unfortunately, the "tail" is pretty long (between 980 and 1646 functions), so 30 functions don't capture much of it...but posting more than 30 would make a mess of the log...which makes the external file much more attractive.
I've revised the logging to remove much of the unnecessary bulk. It was there in the code that I was cribbing from, so I had kept it. (That output format seems to be consistent with other Gunicorn messages, but I concur that it doesn't serve this purpose well, so I've removed it.)
pstats.Stats does not identify the URI, but the log message in one of the previous lines does, so the context is available in the log. (I considered adding it to this message, but I don't think it actually accomplishes anything over having it in an adjacent message.)
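As an aside, the size of that "tail" (the total number of distinct functions captured by a profile) can be checked from the Stats object's stats mapping; a small self-contained sketch, with a trivial workload standing in for an API request handler:

```python
import cProfile
import pstats

# Profile a stand-in workload (an API request handler in the real case).
profile = cProfile.Profile()
profile.enable()
sorted(str(i) for i in range(100))
profile.disable()

# Stats.stats maps (file, line, function) -> timing data; its length is
# the number of distinct functions profiled -- the "tail" being discussed.
stats = pstats.Stats(profile)
print(len(stats.stats))
```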
This is now "ready for review".
The duplication of log output is a concern. (Are you really generating two dump files, also, and just not noticing?) But since this is something that needs to be explicitly enabled with a server restart, and since, despite its significant shortcomings, the current approach provides at worst a viable template for capturing valuable information going forward, I'm OK with just pushing it through and polishing the floors later when we get around to it... 😁
At this point, I would call it a "mystery" rather than a "concern" (and, I suspect that it is actually a feature of the Python logger which we just weren't expecting to see show up like that). The two entries in the log have different prefixes and different formatting (i.e., one of them has all the text (including newlines) in a single log entry while the other logs the same text with each line in a separate log message). This seems to me like they both come from the same call in the source code but go through different handlers in the logger stack whose output ends up redirected (unexpectedly, perhaps) to the same place.
That is an excellent question...I don't know how we could tell.... 😆
This PR adds support for performing execution profiling of Pbench Server API requests.

The profiling is performed by the Python Standard Library cProfile package. Profiling is controlled via Gunicorn pre- and post-request hook functions, which are installed via a Gunicorn configuration file (which is actually Python code...pretty cool).

How to use it: if the PB_PROFILE_DUMP_FILE environment variable is defined when shell.py executes, it adds the configuration file to the Gunicorn command line, and everything else is handled automagically. Profiling data is collected only during a single API request, starting when that request is invoked and ending when the request is complete. If the environment variable is defined to a file path, then the profile data is dumped to that file when the request completes. The data from each request overwrites the data from the previous request.
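The shape of such a configuration file is roughly as follows; this is a sketch reconstructed from the review snippets above and Gunicorn's pre_request/post_request server hooks, and the PR's actual profiling.conf.py may differ in its details:

```python
import cProfile
import os
import pstats
from io import StringIO

DUMP_FILE = os.environ.get("PB_PROFILE_DUMP_FILE")


def pre_request(worker, req):
    """Gunicorn hook: start a fresh profile as each request begins."""
    worker.profile = cProfile.Profile()
    worker.log.info(f"PROFILING {worker.pid}: {req.uri}")
    worker.profile.enable()


def post_request(worker, req, environ, resp):
    """Gunicorn hook: stop profiling, then dump to a file or log the top entries."""
    worker.profile.disable()
    if DUMP_FILE:
        # A file path means dump raw data there; each request overwrites it.
        worker.profile.dump_stats(DUMP_FILE)
    else:
        # An empty string means log the top functions by cumulative time.
        s = StringIO()
        ps = pstats.Stats(worker.profile, stream=s).sort_stats("cumulative")
        ps.print_stats(30)  # limit the number of lines output
        worker.log.info(s.getvalue())
```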
I recommend defining PB_PROFILE_DUMP_FILE to /srv/pbench/public_html/server_profile.prof: this results in the profile data being placed where it can easily be fetched by pointing a browser at https://<server_address>/server_profile.prof. Once downloaded, the profile data can be nicely visualized using SnakeViz.

Note that if PB_PROFILE_DUMP_FILE is defined instead as an empty string, the top time-consuming functions are dumped to the log instead. Unfortunately, the list appears twice in the log -- I don't know why -- but, since it's so easy to use the file dump and so much more effective to use SnakeViz's "icicle graph", I'm not overly concerned about it.

There is no turn-key method at present for setting PB_PROFILE_DUMP_FILE in the canned server (the easiest way seems to be editing the server/lib/systemd/pbench-server.service file before the container is built or editing /etc/systemd/system/pbench-server.service inside the container while it's running). But this support was quite valuable in tracking down the issues with upload timeouts, so I figured I would just capture it in its current state, which is quite usable once it is enabled.
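The dump-file workflow described above boils down to cProfile's dump_stats on the writing side plus pstats (or SnakeViz) on the reading side. A minimal local sketch, with a throwaway file name standing in for the PB_PROFILE_DUMP_FILE path:

```python
import cProfile
import pstats

# Profile a stand-in workload and dump raw stats to a file, as the server
# does per request when PB_PROFILE_DUMP_FILE names a file path.
profile = cProfile.Profile()
profile.enable()
sum(i * i for i in range(1000))
profile.disable()
profile.dump_stats("server_profile.prof")

# Inspect the dump with pstats; SnakeViz reads the same file format
# (e.g. `snakeviz server_profile.prof` after downloading it).
stats = pstats.Stats("server_profile.prof")
stats.sort_stats("cumulative").print_stats(10)
```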