-
-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-38692] Add extra logging to help diagnosing Thread spike. #116
Conversation
Add extra logging statements to try and get to the root cause of the IOHub thread spike.
pending running through the stress test benchmark to check for performance regressions. |
Stress test results |
@@ -292,6 +292,10 @@ public final void removeInterestConnect(SelectionKey key) { | |||
* @param key the key. | |||
*/ | |||
public final void addInterestRead(SelectionKey key) { | |||
if (LOGGER.isLoggable(Level.FINEST)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you were going to pull this out to a class level constant initialized on classloading so that hotspot can reduce to no-op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't have added the if here. It's adding verbosity for no reason in that case. I suppose you added that because of the TODO below...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenc so the performance without inlining didn't seem to bad - and if we leave it as is then you can use the standard Jenkins UI (or JMX for agents) to change the logging level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you added that because of the TODO below...
yes - easier to do it now and know the performance hit - also prevents someone forgetting to do it later.
Single measurement results
CPU load before: 1.53 Avg Memory before: 195592±43263 No significant difference observed... I'll need to collect a few more runs to get some measure on CPU load effects |
I think this PR is fine. More diagnostics rarely hurts, and additional condition check should not cause a significant performance degradation. So 👍 form me. BTW, it's going into remoting 3.1+ IMHO |
👍 |
Retriggering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐝
@@ -292,6 +292,10 @@ public final void removeInterestConnect(SelectionKey key) { | |||
* @param key the key. | |||
*/ | |||
public final void addInterestRead(SelectionKey key) { | |||
if (LOGGER.isLoggable(Level.FINEST)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't have added the if here. It's adding verbosity for no reason in that case. I suppose you added that because of the TODO below...
@@ -480,7 +496,11 @@ public final void run() { | |||
* Process the scheduled tasks list. | |||
*/ | |||
private void processScheduledTasks() { | |||
if (scheduledTasks.size() > 4) { | |||
final int tasksWaiting = scheduledTasks.size(); | |||
if (LOGGER.isLoggable(Level.FINEST)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test saves auto-boxing and GC churn
@batmat those |
@stephenc my comment wasn't blindly targeting all ifs. I'm aware of what you say. The ones I commented on weren't constructing objects. Most here actually currently only pass references and don't construct anything. |
Still having to set up the stack frame to pass through... the testing I did on a few of these showed a negative impact where the if was removed... there was negligible difference vs where I moved the |
@reviewbybees done - @oleg-nenashev over to you. |
is it still WiP? I doubt |
The change introduces one serious bugfix (JENKINS-39596) and a bunch of various diagnostics improvements. Bugfixes: * [JENKINS-39596](https://issues.jenkins-ci.org/browse/JENKINS-39596) - Jenkins URL in `hudson.remoting.Engine` was always `null` since `3.0`. It was causing connection failures of Jenkins JNLP agents when using Java Web Start. ([PR jenkinsci#131](jenkinsci/remoting#131)) * [JENKINS-39617](https://issues.jenkins-ci.org/browse/JENKINS-39617) - `hudson.remoting.Engine` was failing to establish connection if one of the URLs parameter in parameters was malformed. ([PR jenkinsci#131](jenkinsci/remoting#131)) Improvements: * [JENKINS-39150](https://issues.jenkins-ci.org/browse/JENKINS-39150) - Add logic for dumping diagnostics across all the channels. ([PR jenkinsci#122](jenkinsci/remoting#122), [PR jenkinsci#125](jenkinsci/remoting#125)) * [JENKINS-39543](https://issues.jenkins-ci.org/browse/JENKINS-39543) - Improve the caller/callee correlation diagnostics in thread dumps. ([PR jenkinsci#119](jenkinsci/remoting#119)) * [JENKINS-39290](https://issues.jenkins-ci.org/browse/JENKINS-39290) - Add the `org.jenkinsci.remoting.nio.NioChannelHub.disabled` flag for disabling NIO (mostly for debugging purposes). ([PR jenkinsci#123](jenkinsci/remoting#123)) * [JENKINS-38692](https://issues.jenkins-ci.org/browse/JENKINS-38692) - Add extra logging to help diagnosing `IOHub` Thread spikes. ([PR #116](jenkinsci/remoting#116)) * [JENKINS-39289](https://issues.jenkins-ci.org/browse/JENKINS-39289) - When a proxy fails, report what caused the channel to go down. ([PR jenkinsci#128](jenkinsci/remoting#128))
The change introduces one serious bugfix (JENKINS-39596) and a bunch of various diagnostics improvements. Bugfixes: * [JENKINS-39596](https://issues.jenkins-ci.org/browse/JENKINS-39596) - Jenkins URL in `hudson.remoting.Engine` was always `null` since `3.0`. It was causing connection failures of Jenkins JNLP agents when using Java Web Start. ([PR #131](jenkinsci/remoting#131)) * [JENKINS-39617](https://issues.jenkins-ci.org/browse/JENKINS-39617) - `hudson.remoting.Engine` was failing to establish connection if one of the URLs parameter in parameters was malformed. ([PR #131](jenkinsci/remoting#131)) Improvements: * [JENKINS-39150](https://issues.jenkins-ci.org/browse/JENKINS-39150) - Add logic for dumping diagnostics across all the channels. ([PR #122](jenkinsci/remoting#122), [PR #125](jenkinsci/remoting#125)) * [JENKINS-39543](https://issues.jenkins-ci.org/browse/JENKINS-39543) - Improve the caller/callee correlation diagnostics in thread dumps. ([PR #119](jenkinsci/remoting#119)) * [JENKINS-39290](https://issues.jenkins-ci.org/browse/JENKINS-39290) - Add the `org.jenkinsci.remoting.nio.NioChannelHub.disabled` flag for disabling NIO (mostly for debugging purposes). ([PR #123](jenkinsci/remoting#123)) * [JENKINS-38692](https://issues.jenkins-ci.org/browse/JENKINS-38692) - Add extra logging to help diagnosing `IOHub` Thread spikes. ([PR #116](jenkinsci/remoting#116)) * [JENKINS-39289](https://issues.jenkins-ci.org/browse/JENKINS-39289) - When a proxy fails, report what caused the channel to go down. ([PR #128](jenkinsci/remoting#128))
JENKINS-38692 Add extra logging statements to try and get to the root cause
of the IOHub thread spike.
@reviewbybees esp @stephenc