
[VTA][TSIM] Introduce Virtual Memory for TSIM Driver #3686

Merged 29 commits on Aug 26, 2019

Conversation

liangfu
Member

@liangfu liangfu commented Aug 1, 2019

This PR introduces a virtual memory system for tsim_driver, making it possible to run TSIM tests for PYNQ and DE10-Nano, which have only a 32-bit address space on the host side.
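The idea can be sketched roughly as follows (a minimal illustration, not the PR's actual code; the class and method names `VirtAlloc`, `Alloc` and `Translate` are invented here): simulated DRAM is backed by host-side pages, and the simulator only ever sees compact 32-bit virtual addresses that are translated to host pointers on access.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: back a flat simulated-DRAM address space with
// host-side pages, so the addresses handed to the simulator stay 32-bit
// even when the host allocator returns 64-bit pointers.
class VirtAlloc {
 public:
  static constexpr uint32_t kPageBits = 12;             // 4 KiB pages
  static constexpr uint32_t kPageSize = 1u << kPageBits;

  // Allocate `size` bytes; return the 32-bit virtual base address.
  uint32_t Alloc(uint32_t size) {
    uint32_t npages = (size + kPageSize - 1) / kPageSize;
    uint32_t vaddr = next_page_ << kPageBits;
    for (uint32_t i = 0; i < npages; ++i) {
      pages_[next_page_++].resize(kPageSize);
    }
    return vaddr;
  }

  // Translate a virtual address to a host pointer. Each access pays one
  // page-table lookup, the overhead discussed later in this thread.
  uint8_t* Translate(uint32_t vaddr) {
    auto it = pages_.find(vaddr >> kPageBits);
    assert(it != pages_.end());
    return it->second.data() + (vaddr & (kPageSize - 1));
  }

 private:
  uint32_t next_page_{1};  // page 0 reserved so 0 can act as "null"
  std::unordered_map<uint32_t, std::vector<uint8_t>> pages_;
};
```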

@Ravenwater left a comment

LGTM

kVirtualMemCopyToHost = 1
};

#define VTA_VMEM_PAGEFILE "/tmp/vta_tsim_vmem_pagefile.sys"
Member Author

@tqchen Is there any way to fetch the corresponding file path on Windows?

@liangfu
Member Author

liangfu commented Aug 2, 2019

@vegaluisjose @tmoreau89 I think this enables the previously missing TestDefaultPynqConfig to be tested on TSIM. Please take a look.

@@ -26,6 +26,10 @@
#include <condition_variable>
#include <string>

#ifndef VTA_TSIM_USE_VIRTUAL_MEMORY
#define VTA_TSIM_USE_VIRTUAL_MEMORY (1)
Contributor

Are we going to leave this parameter turned on by default? What are the pros/cons of leaving this macro user-defined (given that we're trying to minimize ifdefs in our code)?

Member Author

Advantage

  • Enables TSIM tests on hosts with only a 32-bit address space (e.g. PYNQ, DE10-Nano).

Disadvantage

  • The current virtual memory system uses a page file to write and read address mappings, which may not be compatible with Windows.

Otherwise, I think it's safe to enable virtual memory for tsim_driver by default.

Contributor

Thanks for the clear explanation. I'd opt for leaving it on by default, to the detriment of Windows users, but others can chime in @tqchen @vegaluisjose

@vegaluisjose
Member

Hey @liangfu ,

Cool work, do we have to update tsim_driver and runtime?

@tmoreau89 one thing I am thinking is that we should maybe have a ShellConfig that maps the runtime/driver, which we can call TestConfig for now. One of the challenges is that different backends have different needs, for example the memory bus:

  • Pynq: data 64-bit, address 32-bit
  • Ultra96: data 128-bit, address 64-bit
  • F1: data 512-bit, address 64-bit
  • TSIM (with virtual memory): data 64-bit, address 32-bit (then translated into 64-bit)

The challenge is that this addressing also affects the runtime, so I believe we could go back to 32-bit for the moment and figure out the best way to scale it later.

These are still needed though because the current chisel version takes sizes in bytes directly and not normalized.

I don't know if all this makes sense?

@tmoreau89
Contributor

@vegaluisjose good points, addressing the multiplicity of data and address widths can be done in a separate PR since it will require runtime changes.

@tmoreau89
Contributor

Also, I would feel more comfortable if we ran the TSIM tests in CI to ensure that things remain stable. @liangfu, is that something you want to take a stab at? I'd suggest building the TSIM and sim sources together if "tsim" is enabled, and overriding that in vta_config.json when running CI tests. We'll also need to change the Dockerfiles to install sbt etc.

If you'd rather have us do it, we can set up that infra over the next few days; I'm thinking a Scala lint test would also be good as contributions to the Chisel infra increase.

@tmoreau89
Contributor

One more question about virtual memory support: does this mean that when invoking VTADeviceRun in the runtime, we would no longer need to pass the physical addresses of the input, weight, acc, out and uop buffers, since we can embed the addresses in the instruction stream? Or is that orthogonal?

@liangfu
Member Author

liangfu commented Aug 5, 2019

Hi @vegaluisjose, regarding your comments: I agree with @tmoreau89 on addressing the multiplicity of data and address widths in a separate PR.

do we have to update tsim_driver and runtime?

There is no need to update the runtime to switch to virtual memory for now. I just replaced malloc, free and memcpy with their virtual-memory counterparts, and it worked for the PYNQ backend.

@tmoreau89 Regarding your comments,

I'd suggest building the TSIM and sim sources together if "tsim" is enabled, and overwriting that in the vta_config.json when running ci tests. We'll need to change the Dockerfiles too to install sbt etc.

I'm thinking Scala lint test would also be good as contributions to the Chisel infra increase.

I think these would be excellent changes to help VTA keep growing.

If you'd rather have us do it, we can set up that infra over the next few days;

I think I can handle this in a separate PR.

we wouldn't need to pass the physical addresses to the input, weight, acc, out and uop buffers since we can embed the addresses in the instruction stream?

This is an idea I hadn't thought of earlier. I think it is possible, at the cost of translating 32-bit virtual addresses back to physical addresses, but I don't think it's worth that cost for the FPGA devices to use the virtual address system.

@vegaluisjose
Member

Hey @liangfu and @tmoreau89,

Sure, I was not expecting to solve all of that here, it was more about bringing it up so you guys are aware of these details.

Keep up the good work!

@tmoreau89
Contributor

I think I can handle this in a separate PR.

Thanks Liangfu, no need to do this in a separate PR as it is currently being worked on in #3704 and a follow-up PR!

@tmoreau89
Contributor

@tqchen let us know if there are more changes to be made to this PR

@tqchen
Member

tqchen commented Aug 5, 2019

Sorry, I did not take a close look at the implementation of this PR before. I think we need to find a better way to implement it.

Specifically, the current approach uses a file to exchange information between two threads, which is neither safe (file-system consistency issues) nor efficient (we have to do an address translation each time we access an element in memory, which will slow down the simulation greatly).

Given that TSIM executes both the device simulation and the host in the same process (on different threads), there is no need to exchange the information via a file. Instead, we should just keep it in memory.

There are a few alternative approaches. For example, we can grab a shared_ptr to the current (global) virtual memory table in the TSIM API by having the DPI module depend on TSIM.

Also, please note again that the logic in https://github.com/dmlc/tvm/blob/master/vta/src/sim/sim_driver.cc#L124 can possibly be reused (it is a virtual memory table), so we could isolate that implementation into a header file and reuse it in a few places.
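The in-process alternative being suggested might look roughly like this (a hypothetical sketch only; `VMemTable`, `Map` and `Lookup` are illustrative names, not TVM APIs): both the driver thread and the DPI/simulation thread grab a shared_ptr to one global translation table, so no page file is involved.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical sketch of an in-process virtual memory table shared via a
// global shared_ptr, replacing file-based exchange of address mappings.
class VMemTable {
 public:
  // Both sides call Global() and hold the same table for the process
  // lifetime; a Meyers singleton keeps initialization thread-safe.
  static std::shared_ptr<VMemTable> Global() {
    static std::shared_ptr<VMemTable> inst = std::make_shared<VMemTable>();
    return inst;
  }

  // Record a virtual-to-host mapping.
  void Map(uint64_t vaddr, void* host) {
    std::lock_guard<std::mutex> lock(mu_);
    table_[vaddr] = host;
  }

  // Resolve a virtual address; nullptr when unmapped.
  void* Lookup(uint64_t vaddr) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = table_.find(vaddr);
    return it == table_.end() ? nullptr : it->second;
  }

 private:
  std::mutex mu_;
  std::unordered_map<uint64_t, void*> table_;
};
```

The lock here reflects the two-thread framing in the comment above; liangfu argues later in the thread that the runtime does not actually execute instructions on a separate thread, in which case the lock could be dropped.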

@tqchen tqchen added the status: need update need update based on feedbacks label Aug 5, 2019
@tmoreau89
Contributor

Good observations Tianqi; let's not merge this just yet until the synchronization issue is resolved. Indeed, basing it on the sim_driver.cc design would be best. I suggest implementing a virtual memory library so both drivers can share the same code @liangfu

@liangfu
Member Author

liangfu commented Aug 6, 2019

Hi @tqchen @tmoreau89, thanks for the suggestions. I admit the current implementation is neither safe nor efficient, and I agree it's important to reuse code. However, I think there might be some misunderstanding regarding this specific case for tsim_driver. To address your comments (I might be wrong, so please feel free to point it out):

Specifically, the current way uses a file for exchanging information between two threads

First, the reason I use a file for exchanging information is that DPIModule, tsim_driver.cc and tsim_device.cc are isolated in different shared libraries; they are built into libtvm.so, libvta.so and libvta_hw.so respectively. I have no idea how to use a shared_ptr to exchange the address mapping between isolated shared libraries.

Second, as I have shown in #3713, the VTA runtime doesn't actually use a separate thread to execute the instructions. Therefore, I think it's not 'two threads', and it's currently safe to remove the lock_guard in a virtual memory implementation.

have to do address translation each time we access one element in the memory, and will slowdown the simulation greatly

IMHO, the DRAM class inside sim_driver cannot avoid address translation when evaluating a value in the virtual address space; I mean we still need to call the DRAM::GetPhyAddr function (or I might be wrong here). @tqchen, would you please elaborate a little more?

Also please note again that the logics in sim_driver can possibly be re-used

OK, I wasn't aware of this; let's reuse the code where possible.

@liangfu
Member Author

liangfu commented Aug 14, 2019

@tmoreau89 I have refactored the code and reused the DRAM class from sim_driver to implement VirtualMemoryManager, so sim_driver and tsim_driver now share the same virtual memory implementation. Please take another look and feel free to leave any comments.

Contributor

@tmoreau89 left a comment

Thank you for the refactor @liangfu, the changes look good!

@tmoreau89
Contributor

@tqchen - please review the latest changes!

@tmoreau89
Contributor

@liangfu please rebase and resolve Makefile conflict

@tmoreau89
Contributor

@liangfu it looks like the error you are getting in the CI is known, and not related to your code changes. Please re-trigger the CI by rebasing, or making a small edit.

@liangfu
Member Author

liangfu commented Aug 23, 2019

@tmoreau89 The conflict has been resolved, and the update passed all CI checks. Please take another look.

@tmoreau89
Contributor

Waiting on @tqchen to unblock merging

} // namespace vta


void * vmalloc(uint64_t size) {
Member

I think we could remove vmalloc and the other functions, and instead call directly into VirtualMemoryManager::Global()'s member functions to implement these APIs.

/*!
* \brief virtual memory based memory allocation
*/
void * vmalloc(uint64_t size);
Member

Consider removing these functions, as we can implement them using the members of VirtualMemoryManager.

@tqchen
Member

tqchen commented Aug 26, 2019

Most changes LGTM; the only improvement we could make here is to remove the indirection of the vmalloc APIs and directly use the APIs in VirtualMemoryManager (unless we need to cross DLL boundaries with a C ABI).
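The indirection under discussion can be sketched as follows (hypothetical code; the `Manager` class here is a minimal stand-in, not VTA's actual VirtualMemoryManager): the free functions are pure forwarding shims over the singleton, which is why they only pay for themselves when a stable C ABI across DLL boundaries is required.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Stand-in manager, for illustration only; the real VirtualMemoryManager
// in the VTA tree has a richer page-table-backed interface.
class Manager {
 public:
  static Manager* Global() {
    static Manager inst;
    return &inst;
  }
  void* Alloc(uint64_t size) { return std::malloc(size); }
  void Free(void* ptr) { std::free(ptr); }
};

// The reviewed indirection: C-style free functions that only forward to
// the singleton. Callers inside the same binary could just as well use
// Manager::Global() directly, which is the simplification suggested here.
void* vmalloc(uint64_t size) { return Manager::Global()->Alloc(size); }
void vfree(void* ptr) { Manager::Global()->Free(ptr); }
```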

@tmoreau89 tmoreau89 merged commit 92b6ca7 into apache:master Aug 26, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019
* initial virtual memory;

* initial integration;

* include the header file in cmake;

* implement allocation with virtual to logical address mapping;

* virtual memory for tsim_driver;

* implement the missing memory release function;

* readability improvement;

* readability improvement;

* address review comments;

* improved robustness in virtual memory allocation;

* remove VTA_TSIM_USE_VIRTUAL_MEMORY macro and use virtual memory for tsim by default;

* link tvm against vta library;

* merge with master

* build virtual memory system without linking tvm against vta;

* minor change;

* reuse VTA_PAGE_BYTES;

* using DRAM class from sim_driver as VirtualMemoryManager;

* satisfy linter;

* add comments in code;

* undo changes to Makefile

* undo changes to Makefile

* retrigger ci;

* retrigger ci;

* directly call into VirtualMemoryManager::Global()
wweic pushed a commit to neo-ai/tvm that referenced this pull request Sep 16, 2019
tqchen pushed a commit to tqchen/tvm that referenced this pull request Mar 29, 2020