Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-39543] Improve the caller/callee correlation diagnostics #119

Merged
merged 1 commit into from
Oct 18, 2016

Conversation

kohsuke
Copy link
Member

@kohsuke kohsuke commented Oct 18, 2016

I was trying to diagnose a remoting call that bridges two VMs that is hanging. This can be made easier if thread dump allows us to correlate the caller/callee.

This change adds the request ID to thread name so that thread dump will show this clearly

Also fixes one incorrect use of toString() in diagnosis, which should be getName() --- see all the other uses of thread names.

@reviewbybees

I was trying to diagnose a remoting call that bridges two VMs that is
hanging. This can be made easier if thread dump allows us to correlate
the caller/callee.

This change adds the request ID to thread name so that thread dump will
show this clearly

Also fixes one incorrect use of toString() in diagnosis, which should
be getName() --- see all the other uses of thread names.
@recena
Copy link

recena commented Oct 18, 2016

LGTM, 🐝

@@ -233,6 +233,8 @@ public boolean isDone() {

public RSP get() throws InterruptedException, ExecutionException {
synchronized(Request.this) {
String oldThreadName = Thread.currentThread().getName();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe oldThreadName is a bit confuse when you are asking for currentThread()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same variable name used everywhere. I think it's old (thread name) not (old thread) name

@stephenc
Copy link
Member

🐝

@ghost
Copy link

ghost commented Oct 18, 2016

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

@kohsuke kohsuke merged commit 0c7df25 into stable-2.x Oct 18, 2016
@kohsuke kohsuke deleted the call-correlation branch October 18, 2016 18:46
@oleg-nenashev
Copy link
Member

Post-:bee:

@oleg-nenashev
Copy link
Member

I will add the post-factum JIRA issue since this stabilization branch is a subject for careful reviews

@oleg-nenashev
Copy link
Member

@oleg-nenashev oleg-nenashev changed the title Improve the caller/callee correlation diagnostics [JENKINS-39543] Improve the caller/callee correlation diagnostics Nov 6, 2016
oleg-nenashev added a commit that referenced this pull request Nov 6, 2016
oleg-nenashev added a commit to oleg-nenashev/jenkins that referenced this pull request Nov 9, 2016
The change introduces one serious bugfix (JENKINS-39596) and a bunch of various diagnostics improvements.

Bugfixes:

* [JENKINS-39596](https://issues.jenkins-ci.org/browse/JENKINS-39596) -
Jenkins URL in `hudson.remoting.Engine` was always `null` since `3.0`.
It was causing connection failures of Jenkins JNLP agents when using Java Web Start.
([PR jenkinsci#131](jenkinsci/remoting#131))
* [JENKINS-39617](https://issues.jenkins-ci.org/browse/JENKINS-39617) -
`hudson.remoting.Engine` was failing to establish connection if one of the URLs parameter in parameters was malformed.
([PR jenkinsci#131](jenkinsci/remoting#131))

Improvements:

* [JENKINS-39150](https://issues.jenkins-ci.org/browse/JENKINS-39150) -
Add logic for dumping diagnostics across all the channels.
([PR jenkinsci#122](jenkinsci/remoting#122), [PR jenkinsci#125](jenkinsci/remoting#125))
* [JENKINS-39543](https://issues.jenkins-ci.org/browse/JENKINS-39543) -
Improve the caller/callee correlation diagnostics in thread dumps.
([PR jenkinsci#119](jenkinsci/remoting#119))
* [JENKINS-39290](https://issues.jenkins-ci.org/browse/JENKINS-39290) -
Add the `org.jenkinsci.remoting.nio.NioChannelHub.disabled` flag for disabling NIO (mostly for debugging purposes).
([PR jenkinsci#123](jenkinsci/remoting#123))
* [JENKINS-38692](https://issues.jenkins-ci.org/browse/JENKINS-38692) -
Add extra logging to help diagnosing `IOHub` Thread spikes.
([PR #116](jenkinsci/remoting#116))
* [JENKINS-39289](https://issues.jenkins-ci.org/browse/JENKINS-39289) -
 When a proxy fails, report what caused the channel to go down.
([PR jenkinsci#128](jenkinsci/remoting#128))
oleg-nenashev added a commit to jenkinsci/jenkins that referenced this pull request Nov 10, 2016
The change introduces one serious bugfix (JENKINS-39596) and a bunch of various diagnostics improvements.

Bugfixes:

* [JENKINS-39596](https://issues.jenkins-ci.org/browse/JENKINS-39596) -
Jenkins URL in `hudson.remoting.Engine` was always `null` since `3.0`.
It was causing connection failures of Jenkins JNLP agents when using Java Web Start.
([PR #131](jenkinsci/remoting#131))
* [JENKINS-39617](https://issues.jenkins-ci.org/browse/JENKINS-39617) -
`hudson.remoting.Engine` was failing to establish connection if one of the URLs parameter in parameters was malformed.
([PR #131](jenkinsci/remoting#131))

Improvements:

* [JENKINS-39150](https://issues.jenkins-ci.org/browse/JENKINS-39150) -
Add logic for dumping diagnostics across all the channels.
([PR #122](jenkinsci/remoting#122), [PR #125](jenkinsci/remoting#125))
* [JENKINS-39543](https://issues.jenkins-ci.org/browse/JENKINS-39543) -
Improve the caller/callee correlation diagnostics in thread dumps.
([PR #119](jenkinsci/remoting#119))
* [JENKINS-39290](https://issues.jenkins-ci.org/browse/JENKINS-39290) -
Add the `org.jenkinsci.remoting.nio.NioChannelHub.disabled` flag for disabling NIO (mostly for debugging purposes).
([PR #123](jenkinsci/remoting#123))
* [JENKINS-38692](https://issues.jenkins-ci.org/browse/JENKINS-38692) -
Add extra logging to help diagnosing `IOHub` Thread spikes.
([PR #116](jenkinsci/remoting#116))
* [JENKINS-39289](https://issues.jenkins-ci.org/browse/JENKINS-39289) -
 When a proxy fails, report what caused the channel to go down.
([PR #128](jenkinsci/remoting#128))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants