
Update the resource-utilization calc: average over the last N ms #127

Merged

Conversation

@jfinken (Contributor) commented Oct 17, 2023

Overview

Previously, resource utilization (RU) was calculated as an average since program start. This PR changes the calculation to average over the last `period` milliseconds, where `period` is the sampling flag, e.g. --sampling 1000. (A minimal sketch of the per-period calculation follows the notes below.)

This change brings RU more in line with programs like htop, and follows an idea and request from @mauropasse.

  • Note the question below: should this behavior be exposed as a flag by irobot_benchmark, i.e. should users be able to choose between the two calculations?
  • Also note the change to the default resource-sampling period; it isn't strictly required by this PR.
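
For reference, here is a minimal sketch of the per-period calculation. This is an illustrative stand-alone loop, not the actual ResourceUsageLogger code: the helper name log_cpu_every_period and the loop structure are assumptions, the t1_user / t1_real names mirror the members visible in the review context further below, and the real logger's scaling of the final percentage may differ.

#include <chrono>
#include <ctime>
#include <thread>

// Sketch only: report CPU utilization over the *last* period by resetting the
// reference timestamps after every sample, instead of keeping the program-start ones.
void log_cpu_every_period(std::chrono::milliseconds period)
{
  auto t1_real = std::chrono::steady_clock::now();  // wall-clock reference
  std::clock_t t1_user = std::clock();              // process CPU-time reference
  auto next_wakeup = t1_real + period;

  while (true) {
    std::this_thread::sleep_until(next_wakeup);

    const auto t2_real = std::chrono::steady_clock::now();
    const std::clock_t t2_user = std::clock();

    const double elapsed_real_ms =
      std::chrono::duration<double, std::milli>(t2_real - t1_real).count();
    const double elapsed_user_ms =
      1000.0 * static_cast<double>(t2_user - t1_user) / CLOCKS_PER_SEC;

    // CPU% over the last window only: process CPU time / wall-clock time.
    const double cpu_pct = 100.0 * elapsed_user_ms / elapsed_real_ms;
    (void)cpu_pct;  // the real logger writes this row to resources.txt

    // Key change: reset the references so the next denominator covers only `period`.
    t1_real = t2_real;
    t1_user = t2_user;
    next_wakeup += period;
  }
}

With this structure the first row still covers program start-up (hence the initial spike), while every later row reflects only the most recent window, which is why the cpu[%] column in the Tests section below falls off quickly once the system settles.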

Tests

Tests were conducted inside a ROS 2 Rolling docker container:

  • white-mountain topology on x86
  • 60sec test duration
  • sampling period of 1000ms
  • note, for example, the differences in CPU utilization (cpu[%]):

this branch:

time[ms]       cpu[%]    arena[KB]      in_use[KB]     mmap[KB]       rss[KB]        vsz[KB]
0              0         0              0              0              0              0      
1000           2.60      45296          39335          129552         120092         3588216
2000           0.40      45428          39345          129552         120092         3588216
3000           0.25      45428          39349          129552         120092         3588216
4000           0.19      45428          39336          129552         120092         3588164
5000           0.16      45428          39352          129552         120092         3588216
6000           0.13      45428          39355          129552         120092         3588216
7000           0.12      45428          39339          129552         120092         3588164
8000           0.27      45428          39339          129552         120092         3588164
9000           0.23      45428          39357          129552         120092         3588216
10000          0.23      45428          39341          129552         120092         3588164
...
59000          0.013     45428          39342          129552         120092         3588164
60000          0.035     45428          39342          129552         120092         3588164
61000          0.0092    45560          39218          129552         121960         3522564

the target master branch:

time[ms]       cpu[%]    arena[KB]      in_use[KB]     mmap[KB]       rss[KB]        vsz[KB]
0              0         0              0              0              0              0
1000           2.5       45236          39333          129552         119924         3588148
2000           1.7       45368          39343          129552         119924         3588148
3000           1.6       45368          39348          129552         119924         3588148
4000           1.4       45368          39334          129552         119924         3588096
5000           1.3       45368          39352          129552         119924         3588148
6000           1.2       45368          39353          129552         119924         3588148
7000           1.3       45368          39337          129552         119924         3588096
8000           1.3       45368          39339          129552         119924         3588096
9000           1.2       45368          39354          129552         119924         3588148
10000          1.2       45368          39338          129552         119924         3588096
...
59000          1.2       45368          39340          129552         119924         3588096
60000          1.2       45368          39338          129552         119924         3588096
61000          1.2       45368          39214          129552         122084         3456960
  • on this branch the average cpu[%] falls off much more quickly
  • the final numbers are no longer skewed by the initial spike (object construction, initialization, etc.)
  • although with this change one may want to look at the full set of numbers in resources.txt rather than simply the last line or two

Update the resource-utilization calc: average over the last N milliseconds.

Previously, RU was calculated since program start. This commit changes the calculation to average over the last N milliseconds.
@@ -49,7 +49,9 @@ void ResourceUsageLogger::start(std::chrono::milliseconds period)
std::cout << "[ResourceUsageLogger]: Logging to " << m_filename << std::endl;

m_t1_real_start = std::chrono::steady_clock::now();
m_t1_user = std::clock();
// Question: make this a flag? That is, calc RU since program start or over
Collaborator:

If we have a situation in which we need the previous behavior, we could add a flag. I don't foresee a use case for it, though. As for backward compatibility, we have always looked at the last CPU values right before the program finishes, and with this change we can get those more quickly (after maybe just 5 seconds instead of 60).

Contributor Author:

Sounds good, I'll remove the question/comment. Thanks.

@mauropasse (Collaborator):

I'm going to give it a try, but looks good!

@mauropasse (Collaborator):

I just tried it, and it didn't seem to work well.
[screenshot: top output showing ~23% CPU]
top reported ~23% CPU, while resources.txt shows:

root@DDB-379b20:/home/logger/logs/ros2-application-dart-2023_10_18/irobot_benchmark
# cat white_mountain_log/resources.txt 
time[ms]       cpu[%]    arena[KB]      in_use[KB]     mmap[KB]       rss[KB]        vsz[KB]        
0              0         0              0              0              0              0              
500            25        6176           5860           768            24120          124012         
1000           14        16116          15718          20700          33896          153752         
1500           9.9       23268          22295          20700          39984          492312         
2000           1.3       23268          22306          20700          39992          492312         
2500           1.2       23268          22313          20700          39992          492312         
3000           1.1       23268          22317          20700          39992          492312         
3500           0.88      23268          22317          20700          39992          492312         
4000           0.82      23268          22318          20700          39992          492312    

Only the first CPU line looks correct, but then the values go very low.

@jfinken (Contributor, Author) commented Oct 18, 2023

> I just tried it, and it didn't seem to work well. top reported ~23% CPU, while resources.txt shows:
>
> Only the first CPU line looks correct, but then the values go very low.

Yep it isn't correct. Looking again, that denominator continues to increase, which is why the result gets small very quickly. Working on it...
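
To illustrate the effect with made-up numbers (not from this run): suppose the process uses roughly 5 ms of CPU in every 500 ms window, but the elapsed-real-time denominator keeps measuring time since program start. The reported values then decay as 5/500 = 1%, 5/1000 = 0.5%, 5/1500 ≈ 0.33%, and so on, even though the per-window usage is constant, which is the same shrinking pattern as the cpu[%] column above.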

// resource utilization only over the last `period` milliseconds, *not*
// since program start.
m_t1_user = std::clock();
m_t1_real = std::chrono::steady_clock::now();
@jfinken (Contributor, Author) commented Oct 18, 2023

@mauropasse

OK, note here how both of these members are updated right before sleep_until. Now the denominator, time_elapsed_real_ms, should reflect only the period duration.

Note that I don't think you can directly compare the value from top to what you see in the resources.txt file. As I understand it, top expresses %CPU as a percentage of total CPU time, whereas here we express cpu% as the approximate CPU time used by our process divided by monotonic wall-clock time, e.g.

// cpu time of our process   /  monotonic wall clock time, using these
std::clock()                 / std::chrono::steady_clock::now();

I think top and our calculation are somewhat different, but I could certainly be wrong.

@mauropasse (Collaborator) commented Oct 19, 2023

Yes, you're right, top and our calculation are a bit different. If you multiply the CPU% we get our way by the number of cores of your system, the values are quite similar.
I was mostly referring to the top approach of averaging every 1 or 2 seconds, so it doesn't reflect what happened before that window.
For that, the CPU load is what helps; I don't think we care about it, so our approach is fine!
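
As a rough sanity check (the core count here is a hypothetical figure, not stated in the thread): on a machine with roughly 18-20 cores, the ~1.2-1.3% values in resources.txt multiplied by the core count land around 22-26%, i.e. in the ballpark of the ~23% that top reported earlier.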

@mauropasse (Collaborator) commented Oct 19, 2023

Nice, with your latest commit we see the correct values. The CPU now converges right away to the steady CPU usage.
Here's a plot (left) of the difference between the new and previous CPU computation approaches (sampling every 500 ms):
[plot: new vs. previous CPU computation, 500 ms sampling (left) and 10 ms sampling (right)]
After 30 seconds, the original CPU computation still hadn't reached the steady value.
The image on the right uses 10 ms sampling. We can now see in fine detail what is going on with the CPU usage when the program starts, which looks cool!

@mauropasse (Collaborator):

Still playing with the sampling rate: using 1 ms we can get nice memory plots, which let us see how memory scales over time as the system is created, and the memory steps when shared libs are loaded, etc.
In this plot, the RSS of the white_mountain topology:
[plot: RSS of the white_mountain topology over time, 1 ms sampling]
This could be useful to compare the memory used by different ROS 2 distros when the system is the same.

@jfinken (Contributor, Author) commented Oct 23, 2023

@mauropasse mauropasse merged commit f1e7eba into irobot-ros:master Oct 30, 2023
1 check passed