Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-14816] Add thread dump feature for taskmanager #11887

Closed
wants to merge 5 commits into from

Conversation

tillrohrmann
Copy link
Contributor

Extension of #10228.

@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 04664a7 (Thu Apr 23 16:26:27 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Apr 23, 2020

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@lamberken
Copy link
Member

lamberken commented Apr 23, 2020

hi @tillrohrmann, thanks for doing this, 💯 , but RestAPIStabilityTest run failed.

Worked fine.
image

@tillrohrmann
Copy link
Contributor Author

Thanks for the pointer @lamber-ken. I will update the PR to fix the RestAPIStabilityTest.

tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request Apr 24, 2020
Instead of serving plain text, this commit changes the TaskManagerThreadDumpHandler
to return a JSON response. This allows to further extend this handler to not
only generate a thread dump of all threads but also for a sub set. Morever,
it allows to return a more structured return value in the future.

This closes apache#11887.
@tillrohrmann tillrohrmann marked this pull request as ready for review April 24, 2020 08:19
@tillrohrmann tillrohrmann changed the title [DRAFT][FLINK-14816] Add thread dump feature for taskmanager [FLINK-14816] Add thread dump feature for taskmanager Apr 24, 2020
tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request Apr 24, 2020
Instead of serving plain text, this commit changes the TaskManagerThreadDumpHandler
to return a JSON response. This allows to further extend this handler to not
only generate a thread dump of all threads but also for a sub set. Morever,
it allows to return a more structured return value in the future.

This closes apache#11887.
tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request Apr 24, 2020
Instead of serving plain text, this commit changes the TaskManagerThreadDumpHandler
to return a JSON response. This allows to further extend this handler to not
only generate a thread dump of all threads but also for a sub set. Morever,
it allows to return a more structured return value in the future.

This closes apache#11887.
@tillrohrmann
Copy link
Contributor Author

cc @zentol: The TaskManagerThreadDumpHandler now returns a ThreadDumpInfo which contains an array of ThreadDumpInfo.ThreadInfo. ThreadDumpInfo.ThreadInfo contains the name of the thread and the stringified thread info. This is kind of the compromise I found wrt our offline discussion.

tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request Apr 24, 2020
Instead of serving plain text, this commit changes the TaskManagerThreadDumpHandler
to return a JSON response. This allows to further extend this handler to not
only generate a thread dump of all threads but also for a sub set. Morever,
it allows to return a more structured return value in the future.

This closes apache#11887.
Comment on lines 1034 to 1046
private CompletableFuture<TransientBlobKey> putTransientBlobStream(InputStream inputStream, String fileTag) {
final TransientBlobCache transientBlobService = blobCacheService.getTransientBlobService();
final TransientBlobKey transientBlobKey;

try {
transientBlobKey = transientBlobService.putTransient(inputStream);
} catch (IOException e) {
log.debug("Could not upload file {}.", fileTag, e);
return FutureUtils.completedExceptionally(new FlinkException("Could not upload file " + fileTag + '.', e));
}
return CompletableFuture.completedFuture(transientBlobKey);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @tillrohrmann, since doesn't make thread dump as stream in your design right now, we can revert it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is still good since it breaks up a longer method into smaller units.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They still belong into a separate commit though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is still good since it breaks up a longer method into smaller units.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically yes but I think it is ok to not twiddle with this commit because the benefit would be marginal.

@zentol zentol self-assigned this Apr 27, 2020
@@ -22,6 +22,13 @@
* Different file types to request from the {@link TaskExecutor}.
*/
public enum FileType {
/**
* the log file type for taskmanager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems unrelated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat. But I believe that the extra effort splitting this change into a separate commit won't pay off. Hence, I would like to keep it as is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the code-style guide:

Pull Requests must put cleanup, refactoring, and core changes into separate commits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right Chesnay. I will refactor the changes into separate commits in order to follow the code style guide.

final Collection<ThreadInfo> threadDump = JvmUtils.createThreadDump();

final Collection<ThreadDumpInfo.ThreadInfo> threadInfos = threadDump.stream()
.map(threadInfo -> ThreadDumpInfo.ThreadInfo.create(threadInfo.getThreadName(), threadInfo.toString()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether this conversion should really be done on the TaskExecutor.
I'd prefer if conversions to REST responses are done by the handler just to separate things a bit.

Copy link
Contributor Author

@tillrohrmann tillrohrmann Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as the Rest response and the internal cluster message differ I will separate the classes. As long as they are identical I think it is fine to use the same class here.

public static final String FIELD_NAME_THREAD_INFOS = "threadInfos";

@JsonProperty(FIELD_NAME_THREAD_INFOS)
private final Collection<ThreadInfo> threadInfos;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a map thread_name -> thread_info? I think this would be more generally usable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason was the duplicated thread name information as a key and as part of the ThreadInfo class. But I guess this is premature optimization. I will change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think whether it is stored in a map or not is an implementation detail. Moreover, we don't offer a method getThreadInfo(String threadName) yet. Hence, there is actually no reason to store them internally as a map. Once the requirement changes, I'm happy to adapt the class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't talking about changing the internal storage, but to expose it as a map instead (i.e., changing the JOSN representation)

Assuming a user is interested in a particular thread, then with a map one can do threadDump['myThreadName'], instead of having to iterate over the array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will take another look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After having looked at it again, I still believe that the response should be a collection of ThreadInfos. If the user needs to do a named lookup, then she can still create a map from the collection. It should not be the responsibility response class to present the elements in a certain order. Moreover, we don't have this requirement at the moment. Hence, I won't do the proposed change.

Comment on lines 1034 to 1046
private CompletableFuture<TransientBlobKey> putTransientBlobStream(InputStream inputStream, String fileTag) {
final TransientBlobCache transientBlobService = blobCacheService.getTransientBlobService();
final TransientBlobKey transientBlobKey;

try {
transientBlobKey = transientBlobService.putTransient(inputStream);
} catch (IOException e) {
log.debug("Could not upload file {}.", fileTag, e);
return FutureUtils.completedExceptionally(new FlinkException("Could not upload file " + fileTag + '.', e));
}
return CompletableFuture.completedFuture(transientBlobKey);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They still belong into a separate commit though.

@tillrohrmann
Copy link
Contributor Author

Thanks for the review @zentol. I answered all your comments. Please take another look.

tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request Apr 27, 2020
Instead of serving plain text, this commit changes the TaskManagerThreadDumpHandler
to return a JSON response. This allows to further extend this handler to not
only generate a thread dump of all threads but also for a sub set. Morever,
it allows to return a more structured return value in the future.

This closes apache#11887.
lamberken and others added 5 commits April 28, 2020 12:10
Instead of serving plain text, this commit changes the TaskManagerThreadDumpHandler
to return a JSON response. This allows to further extend this handler to not
only generate a thread dump of all threads but also for a sub set. Morever,
it allows to return a more structured return value in the future.

This closes apache#11887.
@tillrohrmann
Copy link
Contributor Author

Thanks for the review @lamber-ken.

@zentol I've split the unrelated changes into separate commits. I won't change the JSON format though. I will merge this PR once CI gives green light.

@tillrohrmann tillrohrmann deleted the FLINK-14816 branch May 1, 2020 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants