
Timeline visualization tool #1022

Merged · 6 commits merged into mmtk:master on Jan 9, 2024

Conversation

@wks (Collaborator) commented Nov 13, 2023

This PR adds eBPF-based scripts for recording the start and end times of work packets and for formatting the resulting log for visualization with Perfetto UI.

Co-authored-by: Claire Huang [email protected]
Co-authored-by: Zixian Cai [email protected]
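For a rough picture of the output side (an illustrative sketch only; the actual scripts in this PR may format the log differently): Perfetto UI can load Chrome trace-event JSON, so each recorded work-packet interval can become a begin/end event pair on its worker thread's track, for example:

```rust
use serde_json::{json, Value};

// Illustrative only: turn one recorded work-packet interval into a pair of
// Chrome/Perfetto trace events ("B" = begin, "E" = end). The trace-event
// format expects timestamps in microseconds.
fn packet_to_trace_events(name: &str, pid: u64, tid: u64,
                          start_ns: u64, end_ns: u64) -> Vec<Value> {
    vec![
        json!({ "name": name, "ph": "B", "pid": pid, "tid": tid,
                "ts": start_ns as f64 / 1000.0 }),
        json!({ "name": name, "ph": "E", "pid": pid, "tid": tid,
                "ts": end_ns as f64 / 1000.0 }),
    ]
}
```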

@wks wks marked this pull request as ready for review January 8, 2024 08:52
@wks wks requested review from caizixian and qinsoon January 8, 2024 08:52
@wks (Collaborator, Author) commented Jan 8, 2024

There is a known issue: the scripts sometimes fail to capture the work packet names when running the OpenJDK binding. The bpftrace script reads all-zero bytes from the memory location that holds the type names. However, if the packet names are first read by the Rust program (not the bpftrace script), the bpftrace script is able to see those packet name strings. From the Rust documentation, the current implementation of `std::any::type_name::<T>()` reads the type name from debug information. I think it may be related to the exact timing when such type info is mmap-ed into the address space of the process. If it is mmap-ed too late, bpftrace will not be able to see it.
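For context, this is roughly how a work packet's type name reaches the bpftrace script (an illustrative sketch with hypothetical probe and function names, not the exact mmtk-core code): only a pointer and a length are passed through the USDT probe, so bpftrace has to read the string bytes out of the process's address space itself, which is where the all-zero reads show up.

```rust
use probe::probe; // USDT probes via the `probe` crate

// Illustrative sketch; probe and function names are hypothetical.
fn run_packet<W: 'static>(_packet: &mut W) {
    // `type_name` yields a &'static str; the probe only carries its
    // pointer and length, so the bpftrace script must dereference the
    // pointer in the target process to obtain the actual name.
    let name = std::any::type_name::<W>();
    probe!(mmtk, work, name.as_ptr(), name.len());
    // ... execute the packet, then fire the matching end probe ...
    probe!(mmtk, work_end);
}
```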

@caizixian (Member) commented:

@clairexhuang might also be interested

@caizixian (Member) commented Jan 8, 2024

> There is a known issue: the scripts sometimes fail to capture the work packet names when running the OpenJDK binding. The bpftrace script reads all-zero bytes from the memory location that holds the type names. However, if the packet names are first read by the Rust program (not the bpftrace script), the bpftrace script is able to see those packet name strings. From the Rust documentation, the current implementation of `std::any::type_name::<T>()` reads the type name from debug information. I think it may be related to the exact timing when such type info is mmap-ed into the address space of the process. If it is mmap-ed too late, bpftrace will not be able to see it.

> when such type info is mmap-ed into the address space of the process

Yes, it could be a race condition of on-demand paging vs the bpf program reading that address.

I remember you had a workaround where, in `probe!`, you read the string to force the mapping. Does the workaround not work? And is this why you have `@gc_count >= 2`, to make sure that the type names are mapped (which wouldn't be a reliable workaround, because, for example, a nursery GC might have different kinds of work packets than a full GC)?

Two more potential workarounds:

  • Find out the section where the static strings live, and pretouch the section during boot time.
  • In the bpftrace code, check whether the `str` is empty; if so, don't set `@decoded`, and we can try again the next time the packet is executed. This is on the slow path, so it shouldn't add much overhead.

I also suggest that when we merge, we add Co-authored-by: since this originally came from Claire's MPLR paper.

@caizixian (Member) commented:

Also, should we consider adding a mode where we visualize the relationships between work packets as well (which packet spawns what)? If collecting the work packet ID has too much overhead, we can always make this optionally compiled.

I remember @wks spent a lot of time getting that to work, and figuring out the correct JSON output for the arrows. It would be a shame to lose it.

@caizixian (Member) commented:

Overall, the cleanup of the scripts looks good.

@wks (Collaborator, Author) commented Jan 9, 2024

> There is a known issue: the scripts sometimes fail to capture the work packet names when running the OpenJDK binding. The bpftrace script reads all-zero bytes from the memory location that holds the type names. However, if the packet names are first read by the Rust program (not the bpftrace script), the bpftrace script is able to see those packet name strings. From the Rust documentation, the current implementation of `std::any::type_name::<T>()` reads the type name from debug information. I think it may be related to the exact timing when such type info is mmap-ed into the address space of the process. If it is mmap-ed too late, bpftrace will not be able to see it.

> when such type info is mmap-ed into the address space of the process

> Yes, it could be a race condition of on-demand mapping vs the bpf program reading that address.

> I remember you had a workaround where, in `probe!`, you read the string to force the mapping. Does the workaround not work?

It works, but it adds unnecessary overhead to mmtk-core. I have added this workaround, guarded behind a Cargo feature, just in case somebody needs it.
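For reference, the forced read can be as simple as touching the bytes of the static string on the Rust side (a minimal sketch; the feature name below is made up, see the PR for the actual flag):

```rust
// Minimal sketch; the feature name is hypothetical.
#[cfg(feature = "force_read_type_name")]
fn force_map_type_name(name: &'static str) {
    // Touch every byte of the static string so the pages backing it are
    // resident before bpftrace tries to read them from outside the process.
    for b in name.as_bytes() {
        unsafe { std::ptr::read_volatile(b as *const u8); }
    }
}
```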

> And is this why you have `@gc_count >= 2`, to make sure that the type names are mapped (which wouldn't be a reliable workaround, because, for example, a nursery GC might have different kinds of work packets than a full GC)?

Yes, that was my intention, but it didn't work, and I forgot to clean it up. I'll remove the expression `@gc_count >= 2`.

> Two more potential workarounds:
>
> * Find out the section where the static strings live, and pretouch the section during boot time.

Here is a log:

@type_name[140272533677491]: mmtk::scheduler::gc_work::PrepareCollector
@type_name[140272533679341]: mmtk::scheduler::gc_work::VMProcessWeakRefs<mmtk::scheduler::gc
@type_name[140272533680046]: mmtk::scheduler::gc_work::ScheduleCollection
@type_name[140272533680090]: mmtk::scheduler::gc_work::ScanVMSpecificRoots<mmtk::plan::immix
@type_name[140272533682789]: mmtk::scheduler::gc_work::Prepare<mmtk::plan::immix::gc_work::I
@type_name[140272533683189]: mmtk::scheduler::gc_work::StopMutators<mmtk::plan::immix::gc_wo
@type_name[140272533689536]: mmtk::scheduler::gc_work::VMPostForwarding<mmtk_openjdk::OpenJD
@type_name[140272533691961]: mmtk::scheduler::gc_work::ReleaseMutator<mmtk_openjdk::OpenJDK<
@type_name[140272533693082]: mmtk::scheduler::gc_work::ReleaseCollector
@type_name[140272533693471]: mmtk::scheduler::gc_work::ScanMutatorRoots<mmtk::plan::immix::g
@type_name[140272533693893]: mmtk::scheduler::gc_work::VMProcessWeakRefs<mmtk::scheduler::gc
@type_name[140272533695337]: mmtk::scheduler::gc_work::Release<mmtk::plan::immix::gc_work::I
@type_name[140272533697112]: mmtk::scheduler::gc_work::Release<mmtk::plan::immix::gc_work::I
@type_name[140272533697621]: mmtk::scheduler::gc_work::ScanMutatorRoots<mmtk::plan::immix::g
@type_name[140272533699641]: mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJD
@type_name[140272533700441]: mmtk::scheduler::gc_work::StopMutators<mmtk::plan::immix::gc_wo
@type_name[140272533701952]: mmtk::scheduler::gc_work::ScanVMSpecificRoots<mmtk::plan::immix
@type_name[140272533704295]: mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJD
@type_name[140272533704985]: mmtk::scheduler::gc_work::Prepare<mmtk::plan::immix::gc_work::I
@type_name[140272533725749]: mmtk::policy::immix::immixspace::PrepareBlockState<mmtk_openjdk
@type_name[140272533725901]: mmtk::policy::immix::immixspace::SweepChunk<mmtk_openjdk::OpenJ
@type_name[140272533728339]: mmtk_openjdk::gc_work::ScanClassLoaderDataGraphRoots<mmtk_openj
@type_name[140272533743174]: mmtk_openjdk::gc_work::ScanStringTableRoots<mmtk_openjdk::OpenJ
@type_name[140272533751448]: mmtk_openjdk::gc_work::ScanStringTableRoots<mmtk_openjdk::OpenJ
@type_name[140272533766535]: 
@type_name[140272533766983]: 
@type_name[140272533767830]: 
@type_name[140272533769535]: 
@type_name[140272533769973]: 
@type_name[140272533776336]: 
@type_name[140272533794799]: 
@type_name[140272533819627]: 
@type_name[140272533836434]: mmtk_openjdk::gc_work::ScanVMThreadRoots<mmtk_openjdk::OpenJDK<
@type_name[140272533845875]: mmtk_openjdk::gc_work::ScanVMThreadRoots<mmtk_openjdk::OpenJDK<
@type_name[140272533847104]: mmtk_openjdk::gc_work::ScanUniverseRoots<mmtk_openjdk::OpenJDK<
@type_name[140272533851788]: mmtk_openjdk::gc_work::ScanCodeCacheRoots<true, mmtk::scheduler
@type_name[140272533852204]: mmtk_openjdk::gc_work::ScanObjectSynchronizerRoots<mmtk_openjdk
@type_name[140272533863388]: mmtk_openjdk::gc_work::ScanCodeCacheRoots<true, mmtk::scheduler
@type_name[140272533865153]: mmtk_openjdk::gc_work::ScanSystemDictionaryRoots<mmtk_openjdk::
@type_name[140272533867703]: mmtk_openjdk::gc_work::ScanJNIHandlesRoots<mmtk_openjdk::OpenJD
@type_name[140272533869470]: mmtk_openjdk::gc_work::ScanWeakProcessorRoots<mmtk_openjdk::Ope
@type_name[140272533873490]: mmtk_openjdk::gc_work::ScanWeakProcessorRoots<mmtk_openjdk::Ope
@type_name[140272533875252]: mmtk_openjdk::gc_work::ScanAOTLoaderRoots<mmtk_openjdk::OpenJDK
@type_name[140272533878628]: mmtk_openjdk::gc_work::ScanJvmtiExportRoots<mmtk_openjdk::OpenJ
@type_name[140272533879069]: mmtk_openjdk::gc_work::ScanJNIHandlesRoots<mmtk_openjdk::OpenJD
@type_name[140272533885083]: mmtk::util::finalizable_processor::Finalization<mmtk::scheduler
@type_name[140272533887515]: mmtk::util::finalizable_processor::Finalization<mmtk::scheduler

The missing names are all OpenJDK-specific packets (I compared with a log from a worked-around mmtk-core), and their addresses are part of the same contiguous region that contains the other strings for mmtk_openjdk::... packets. They span an almost 64 KB region, so I don't think the lazy mapping happens at section granularity.

> * In the bpftrace code, check whether the `str` is empty; if so, don't set `@decoded`, and we can try again the next time the packet is executed. This is on the slow path, so it shouldn't add much overhead.

This workaround doesn't work. In one benchmark, some work packet names were read 382-398 times and still returned "\0", at which point the benchmark ended.

> I also suggest that when we merge, we add Co-authored-by: since this originally came from Claire's MPLR paper.

Yes. I'll do it.

@wks (Collaborator, Author) commented Jan 9, 2024

> Also, should we consider adding a mode where we visualize the relationships between work packets as well (which packet spawns what)? If collecting the work packet ID has too much overhead, we can always make this optionally compiled.
>
> I remember @wks spent a lot of time getting that to work, and figuring out the correct JSON output for the arrows. It would be a shame to lose it.

Yes, I implemented that. However, to achieve it, I made non-trivial changes to the scheduler and work buckets so that every work packet is assigned a unique ID when it is added to a bucket. The overhead will be significant, but I think it is still useful, even if we have to guard it behind a Cargo feature. I'll tidy up that code and make another PR for it.
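For reference, in the Chrome/Perfetto trace-event format such arrows can be expressed as flow events that share an id (a minimal sketch, not necessarily the JSON this tool will emit):

```rust
use serde_json::{json, Value};

// Minimal sketch: draw an arrow from the point where a parent packet spawns
// a child (ph "s" = flow start) to the start of the child's slice
// (ph "f" = flow end, bound to the enclosing slice via "bp": "e").
// Both events must share the same "id", "cat", and "name".
fn spawn_arrow(child_id: u64,
               parent_pid: u64, parent_tid: u64, spawn_ts_us: f64,
               child_pid: u64, child_tid: u64, child_start_ts_us: f64) -> Vec<Value> {
    vec![
        json!({ "ph": "s", "id": child_id, "cat": "spawn", "name": "spawn",
                "pid": parent_pid, "tid": parent_tid, "ts": spawn_ts_us }),
        json!({ "ph": "f", "id": child_id, "cat": "spawn", "name": "spawn",
                "pid": child_pid, "tid": child_tid, "ts": child_start_ts_us, "bp": "e" }),
    ]
}
```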

@caizixian (Member) commented:

> This workaround doesn't work. In one benchmark, some work packet names were read 382-398 times and still returned "\0", at which point the benchmark ended.

Ah, OK. In that case, I agree with the current workaround as of 7e9b1c0.

> But I think it is still useful, even if we have to guard it behind a Cargo feature. I'll tidy up that code and make another PR for it.

Sounds good.

@wks wks added this pull request to the merge queue Jan 9, 2024
Merged via the queue into mmtk:master with commit b66fa35 Jan 9, 2024
19 checks passed
@wks wks deleted the feature/timeline-tool branch January 9, 2024 11:44