-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VTA][TSIM] Introduce Virtual Memory for TSIM Driver #3686
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
vta/src/tsim/virtual_memory.h
Outdated
kVirtualMemCopyToHost = 1 | ||
}; | ||
|
||
#define VTA_VMEM_PAGEFILE "/tmp/vta_tsim_vmem_pagefile.sys" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tqchen Is there anyway to fetch corresponding file path for Windows?
@vegaluisjose @tmoreau89 I think this enables the missing |
vta/include/vta/dpi/module.h
Outdated
@@ -26,6 +26,10 @@ | |||
#include <condition_variable> | |||
#include <string> | |||
|
|||
#ifndef VTA_TSIM_USE_VIRTUAL_MEMORY | |||
#define VTA_TSIM_USE_VIRTUAL_MEMORY (1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we going to leave this parameter turned on by default? what are the pros/cons to leaving this macro to be user-defined (with the idea that we're trying to minimize ifdefs in our code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Advantage
- Enable tsim to test VTA subsystems that are configured to use with 32-bit memory address (checkout @vegaluisjose 's illustration at [VTA][Chisel] Add the missing TestDefaultPynqConfig #3380)
Disadvantage
- Current virtual memory system uses virtual memory page file system to write and read address mappings, these may not be compatible to Windows.
Otherwise, I think it's safe to enable virtual memory for tsim_driver
by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clear explanation. I'd opt towards leaving it on by default, at the detriment to Windows users, but others can chime in @tqchen @vegaluisjose
Hey @liangfu , Cool work, do we have to update tsim_driver and runtime? @tmoreau89 one thing I am thinking is that we should maybe have for now a
The challenge is that these addressing affects also the runtime, so I believe we could go back to 32-bit for the moment and later figure out what would be the best way for scaling it. These are still needed though because the current chisel version takes I don't know if all this make sense? |
@vegaluisjose good points, addressing the multiplicity of data and address widths can be done in a separate PR since it will require runtime changes. |
Also I would feel more comfortable if we would run the TSIM tests in the CI to ensure that things remain stable. @liangfu is that something you want to take a stab at? I'd suggest building the TSIM and sim sources together if "tsim" is enabled, and overwriting that in the If you'd rather have us do it, we can set up that infra over the next few days; I'm thinking Scala lint test would also be good as contributions to the Chisel infra increase. |
One more question with virtual memory support, does this mean that now when invoking |
Hi @vegaluisjose Regarding to your comments, I would agree with @tmoreau89 on addressing the multiplicity of data and address widths in a separate PR.
There is no need to update the runtime for switch to virtual memory for now. I just replaced @tmoreau89 Regarding to your comments,
I think these would be excellent changes to make VTA continuously grow.
I think I can handle this in a separate PR.
This is the idea I didn't think of earlier. I think this is possible at the cost of translating 32-bit virtual address back to physical address. But I don't think it worth the cost for the FPGA devices to use the virtual address system. |
Hey @liangfu and @tmoreau89, Sure, I was not expecting to solve all of that here, it was more about bringing it up so you guys are aware of these details. Keep the good work! |
Thanks Liangfu, no need to do this in a separate PR as it is currently being worked on in #3704 and a follow-up PR! |
@tqchen let us know if there are more changes to be made to this PR |
Sorry i did not take a close look at the implementation of this PR before. I think we need to find a better alternative to implement the current PR. Specifically, the current way uses a file for exchanging information between two threads, which is neither safe(file system consistency issues) or efficient(have to do address translation each time we access one element in the memory, and will slowdown the simulation greatly). Given that TSIM executes both the device simulation and the host on the same process(different thread). There is no need to exchange the information via a file. Instead, we should just get these information in memory. There are a few alternative approaches. For example, we can grab a shared_ptr to the current virtual memory table(which is global) in the TSIM API by having the DPI module depends on the TSIM. Also please note again that the logics in https://github.com/dmlc/tvm/blob/master/vta/src/sim/sim_driver.cc#L124 can possibly be re-used(it is a virtual memory table). So likely we could isolate that implementation into a header file and reuse it in a few places. |
Good observations Tianqi, let's not merge this just yet until the synchronization issue is resolved. Indeed basing on the |
Hi @tqchen @tmoreau89 , thanks for the suggestions. I admit current implement is neither safe or efficient, and I agree it's important to reuse the code. However, I think there are might be some misunderstanding regarding to this specific case for
First, the reason I use a file for exchanging information is that Second, as I have shown in #3713 , the vta runtime doesn't actually use a thread for executing the instructions. Therefore, I think it's not 'two threads', and currently it's safe to remove
IMHO, the
OK, I came to aware of this, let's reuse the code when possible. |
@tmoreau89 I have performed a refactoring to the code, and reused the DRAM class in sim_driver for the implement of VirtualMemoryManager, therefore, sim_driver and tsim_driver are now using same virtual memory implementation. Please take another review and feel free to leave any comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the refactor @liangfu, the changes look good!
@tqchen - please review the latest changes! |
@liangfu please rebase and resolve Makefile conflict |
@liangfu it looks like the error you are getting in the CI is known, and not related to your code changes. Please re-trigger the CI by rebasing, or making a small edit. |
@tmoreau89 The conflict has been resolved, and the update passed all CI checks. Please take another look. |
Waiting on @tqchen to unblock merging |
vta/src/vmem/virtual_memory.cc
Outdated
} // namespace vta | ||
|
||
|
||
void * vmalloc(uint64_t size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could remove the vmalloc and other functions, but directly use calls into VirtualMemoryManager::Global() 's memory function to implement these APIs
vta/src/vmem/virtual_memory.h
Outdated
/*! | ||
* \brief virtual memory based memory allocation | ||
*/ | ||
void * vmalloc(uint64_t size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider remove these functions as we can implement them using the members of VirtualMemoryManager
Most changes LGTM, the only improvement we could make here is to remove the indirection of vmalloc APIs, and directly use APIs in VirtualMemoryManager(unless we have the need to cross DLL boundaries as C ABI) |
* initial virtual memory; * initial integration; * include the header file in cmake; * implement allocation with virtual to logical address mapping; * virtual memory for tsim_driver; * implement the missing memory release function; * readability improvement; * readability improvement; * address review comments; * improved robustness in virtual memory allocation; * remove VTA_TSIM_USE_VIRTUAL_MEMORY macro and use virtual memory for tsim by default; * link tvm against vta library; * merge with master * build virtual memory system without linking tvm against vta; * minor change; * reuse VTA_PAGE_BYTES; * using DRAM class from sim_driver as VirtualMemoryManager; * satisfy linter; * add comments in code; * undo changes to Makefile * undo changes to Makefile * retrigger ci; * retrigger ci; * directly call into VirtualMemoryManager::Global()
* initial virtual memory; * initial integration; * include the header file in cmake; * implement allocation with virtual to logical address mapping; * virtual memory for tsim_driver; * implement the missing memory release function; * readability improvement; * readability improvement; * address review comments; * improved robustness in virtual memory allocation; * remove VTA_TSIM_USE_VIRTUAL_MEMORY macro and use virtual memory for tsim by default; * link tvm against vta library; * merge with master * build virtual memory system without linking tvm against vta; * minor change; * reuse VTA_PAGE_BYTES; * using DRAM class from sim_driver as VirtualMemoryManager; * satisfy linter; * add comments in code; * undo changes to Makefile * undo changes to Makefile * retrigger ci; * retrigger ci; * directly call into VirtualMemoryManager::Global()
* initial virtual memory; * initial integration; * include the header file in cmake; * implement allocation with virtual to logical address mapping; * virtual memory for tsim_driver; * implement the missing memory release function; * readability improvement; * readability improvement; * address review comments; * improved robustness in virtual memory allocation; * remove VTA_TSIM_USE_VIRTUAL_MEMORY macro and use virtual memory for tsim by default; * link tvm against vta library; * merge with master * build virtual memory system without linking tvm against vta; * minor change; * reuse VTA_PAGE_BYTES; * using DRAM class from sim_driver as VirtualMemoryManager; * satisfy linter; * add comments in code; * undo changes to Makefile * undo changes to Makefile * retrigger ci; * retrigger ci; * directly call into VirtualMemoryManager::Global()
* initial virtual memory; * initial integration; * include the header file in cmake; * implement allocation with virtual to logical address mapping; * virtual memory for tsim_driver; * implement the missing memory release function; * readability improvement; * readability improvement; * address review comments; * improved robustness in virtual memory allocation; * remove VTA_TSIM_USE_VIRTUAL_MEMORY macro and use virtual memory for tsim by default; * link tvm against vta library; * merge with master * build virtual memory system without linking tvm against vta; * minor change; * reuse VTA_PAGE_BYTES; * using DRAM class from sim_driver as VirtualMemoryManager; * satisfy linter; * add comments in code; * undo changes to Makefile * undo changes to Makefile * retrigger ci; * retrigger ci; * directly call into VirtualMemoryManager::Global()
This PR introduces a virtual memory system for
tsim_driver
, and brings the possibility to perform TSIM test for PYNQ and DE10-Nano, which only have 32-bit address space on the host side.