-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Add warning interval option for resource monitor plugin #9826
Conversation
plugins/resource_monitor_plugin/include/eosio/resource_monitor_plugin/file_space_handler.hpp
Show resolved
Hide resolved
plugins/resource_monitor_plugin/include/eosio/resource_monitor_plugin/file_space_handler.hpp
Show resolved
Hide resolved
boost::asio::io_context ctx; | ||
static constexpr uint32_t def_monitor_warning_interval = 30; // After this number of monitor intervals, warning is output if the threshold is hit | ||
static constexpr uint32_t warning_interval_low = 1; | ||
static constexpr uint32_t warning_interval_high = 450; // e.g. if the monitor interval is 2 sec, the warning interval is at most 15 minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider using min and max in names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I changed all _low
and high
in the limits names to _min
and _max
.
matches = re.compile("\s+([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3})\s").search(msg) | ||
return datetime.strptime(matches.group(0).strip(), '%Y-%m-%dT%H:%M:%S.%f') | ||
|
||
intervalTolerance = 0.1 # 10% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is 0.1 just random number? if yes could you run this test in a loop and dump to log curInterval - timedelta(seconds=expectedInterval)
to see how volatile it could be?
This is to avoid another flakey test where we need restart tests 2-3 times to get it passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.1 (i.e. 10%) is based on observed warning interval deviations on buildkite and my local dockers. On buildkite, the max deviation is observed on macOS 10.15, where for the first test case (10-seconds interval, from 2 seconds monitor interval and warning interval 5) the deviation is about 0.6 sec, i.e. 6%. The second test case (30 seconds) on buildkite and all my local dockers have smaller deviation percentage.
To be safer, I changed this tolerance to be 0.15 (15%) in the new commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code looks good to me but please double check with @iamveritas if we need any documentation changes related to this as we are adding new command line option.
Thank you! Let me contact @iamveritas for the documents. |
I have created a ticket to track this issue from documentation point of view. |
Change Description
Option
resource-monitor-warning-interval
is added to the resource monitor plugin to fix issue #9649. This option setsthe number of monitor intervals
after which the warning message is output if the resource threshold is hit (currently only used for filesystem space).Before this option is added, the warning is output after every monitoring (e.g. by default every 2 seconds). Now this option can be set from 1 to 450 and has a default value of 30 (e.g. if the monitor interval is 2 seconds, and this warning interval is (by default) 30, then the warning will be output every 60 seconds if the threshold is hit).
Note: the naming of this option is discussed in the issue and needs to be documented to avoid confusion.
Change Type
Select ONE:
Testing Changes
Select ANY that apply:
Consensus Changes
API Changes
Documentation Additions