
[VTA][TSIM] Introduce Virtual Memory for TSIM Driver #3686

Merged 29 commits on Aug 26, 2019

Conversation

liangfu
Member

@liangfu liangfu commented Aug 1, 2019

This PR introduces a virtual memory system for tsim_driver, making it possible to run TSIM tests for PYNQ and DE10-Nano, which have only a 32-bit address space on the host side.
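The idea can be sketched roughly as follows (a minimal illustration, not the PR's actual code; the class and method names `VirtAlloc`, `Alloc` and `Translate` are invented here): simulated DRAM is backed by host-side pages, and the simulator only ever sees compact 32-bit virtual addresses that are translated to host pointers on access.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: back a flat simulated-DRAM address space with
// host-side pages, so the addresses handed to the simulator stay 32-bit
// even when the host allocator returns 64-bit pointers.
class VirtAlloc {
 public:
  static constexpr uint32_t kPageBits = 12;             // 4 KiB pages
  static constexpr uint32_t kPageSize = 1u << kPageBits;

  // Allocate `size` bytes; return the 32-bit virtual base address.
  uint32_t Alloc(uint32_t size) {
    uint32_t npages = (size + kPageSize - 1) / kPageSize;
    uint32_t vaddr = next_page_ << kPageBits;
    for (uint32_t i = 0; i < npages; ++i) {
      pages_[next_page_++].resize(kPageSize);
    }
    return vaddr;
  }

  // Translate a virtual address to a host pointer. Each access pays one
  // page-table lookup, the overhead discussed later in this thread.
  uint8_t* Translate(uint32_t vaddr) {
    auto it = pages_.find(vaddr >> kPageBits);
    assert(it != pages_.end());
    return it->second.data() + (vaddr & (kPageSize - 1));
  }

 private:
  uint32_t next_page_{1};  // page 0 reserved so 0 can act as "null"
  std::unordered_map<uint32_t, std::vector<uint8_t>> pages_;
};
```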

@Ravenwater left a comment

LGTM

kVirtualMemCopyToHost = 1
};

#define VTA_VMEM_PAGEFILE "/tmp/vta_tsim_vmem_pagefile.sys"
Member Author

@tqchen Is there any way to fetch the corresponding file path on Windows?

@liangfu
Member Author

liangfu commented Aug 2, 2019

@vegaluisjose @tmoreau89 I think this enables the previously missing TestDefaultPynqConfig to be tested on TSIM. Please take a look.

@@ -26,6 +26,10 @@
#include <condition_variable>
#include <string>

#ifndef VTA_TSIM_USE_VIRTUAL_MEMORY
#define VTA_TSIM_USE_VIRTUAL_MEMORY (1)
Contributor

Are we going to leave this parameter turned on by default? What are the pros/cons of leaving this macro user-defined (given that we're trying to minimize ifdefs in our code)?

Member Author

Advantage

  • Enables TSIM tests on hosts with only a 32-bit address space (e.g. PYNQ, DE10-Nano).

Disadvantage

  • The current virtual memory system uses a page file to write and read address mappings, which may not be compatible with Windows.

Otherwise, I think it's safe to enable virtual memory for tsim_driver by default.

Contributor

Thanks for the clear explanation. I'd opt for leaving it on by default, to the detriment of Windows users, but others can chime in @tqchen @vegaluisjose

@vegaluisjose
Member

Hey @liangfu ,

Cool work, do we have to update tsim_driver and runtime?

@tmoreau89 one thing I am thinking is that we should maybe have a ShellConfig that maps the runtime/driver, which we can call TestConfig for now. One of the challenges is that different backends have different needs, for example the memory bus:

  • Pynq: data 64-bit, address 32-bit
  • Ultra96: data 128-bit, address 64-bit
  • F1: data 512-bit, address 64-bit
  • TSIM (with virtual memory): data 64-bit, address 32-bit (then translated into 64-bit)

The challenge is that this addressing also affects the runtime, so I believe we could go back to 32-bit for the moment and figure out the best way to scale it later.

These are still needed though because the current chisel version takes sizes in bytes directly and not normalized.

I don't know if all this makes sense?

@tmoreau89
Contributor

@vegaluisjose good points, addressing the multiplicity of data and address widths can be done in a separate PR since it will require runtime changes.

@tmoreau89
Contributor

Also, I would feel more comfortable if we ran the TSIM tests in CI to ensure that things remain stable. @liangfu, is that something you want to take a stab at? I'd suggest building the TSIM and sim sources together if "tsim" is enabled, and overriding that in vta_config.json when running CI tests. We'll also need to change the Dockerfiles to install sbt etc.

If you'd rather have us do it, we can set up that infra over the next few days; I'm thinking a Scala lint test would also be good as contributions to the Chisel infra increase.

@tmoreau89
Contributor

One more question about virtual memory support: does this mean that when invoking VTADeviceRun in the runtime, we would no longer need to pass the physical addresses of the input, weight, acc, out and uop buffers, since we can embed the addresses in the instruction stream? Or is that orthogonal?

@liangfu
Member Author

liangfu commented Aug 5, 2019

Hi @vegaluisjose, regarding your comments: I agree with @tmoreau89 on addressing the multiplicity of data and address widths in a separate PR.

do we have to update tsim_driver and runtime?

There is no need to update the runtime to switch to virtual memory for now. I just replaced malloc, free and memcpy with their virtual-memory counterparts, and it worked for the PYNQ backend.

@tmoreau89 Regarding your comments,

I'd suggest building the TSIM and sim sources together if "tsim" is enabled, and overwriting that in the vta_config.json when running ci tests. We'll need to change the Dockerfiles too to install sbt etc.

I'm thinking Scala lint test would also be good as contributions to the Chisel infra increase.

I think these would be excellent changes to help VTA keep growing.

If you'd rather have us do it, we can set up that infra over the next few days;

I think I can handle this in a separate PR.

we wouldn't need to pass the physical addresses to the input, weight, acc, out and uop buffers since we can embed the addresses in the instruction stream?

This is an idea I hadn't thought of earlier. I think it is possible, at the cost of translating 32-bit virtual addresses back to physical addresses, but I don't think it's worth that cost for the FPGA devices to use the virtual address system.

@vegaluisjose
Member

Hey @liangfu and @tmoreau89,

Sure, I was not expecting to solve all of that here, it was more about bringing it up so you guys are aware of these details.

Keep up the good work!

@tmoreau89
Contributor

I think I can handle this in a separate PR.

Thanks Liangfu, no need to do this in a separate PR as it is currently being worked on in #3704 and a follow-up PR!

@tmoreau89
Contributor

@tqchen let us know if there are more changes to be made to this PR

@tqchen
Member

tqchen commented Aug 5, 2019

Sorry, I did not take a close look at the implementation of this PR before. I think we need to find a better way to implement it.

Specifically, the current approach uses a file to exchange information between two threads, which is neither safe (file-system consistency issues) nor efficient (we have to do an address translation each time we access an element in memory, which will slow down the simulation greatly).

Given that TSIM executes both the device simulation and the host in the same process (on different threads), there is no need to exchange the information via a file. Instead, we should just keep it in memory.

There are a few alternative approaches. For example, we can grab a shared_ptr to the current (global) virtual memory table in the TSIM API by having the DPI module depend on TSIM.

Also, please note again that the logic in https://github.com/dmlc/tvm/blob/master/vta/src/sim/sim_driver.cc#L124 can possibly be reused (it is a virtual memory table), so we could isolate that implementation into a header file and reuse it in a few places.
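The in-process alternative being suggested might look roughly like this (a hypothetical sketch only; `VMemTable`, `Map` and `Lookup` are illustrative names, not TVM APIs): both the driver thread and the DPI/simulation thread grab a shared_ptr to one global translation table, so no page file is involved.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical sketch of an in-process virtual memory table shared via a
// global shared_ptr, replacing file-based exchange of address mappings.
class VMemTable {
 public:
  // Both sides call Global() and hold the same table for the process
  // lifetime; a Meyers singleton keeps initialization thread-safe.
  static std::shared_ptr<VMemTable> Global() {
    static std::shared_ptr<VMemTable> inst = std::make_shared<VMemTable>();
    return inst;
  }

  // Record a virtual-to-host mapping.
  void Map(uint64_t vaddr, void* host) {
    std::lock_guard<std::mutex> lock(mu_);
    table_[vaddr] = host;
  }

  // Resolve a virtual address; nullptr when unmapped.
  void* Lookup(uint64_t vaddr) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = table_.find(vaddr);
    return it == table_.end() ? nullptr : it->second;
  }

 private:
  std::mutex mu_;
  std::unordered_map<uint64_t, void*> table_;
};
```

The lock here reflects the two-thread framing in the comment above; liangfu argues later in the thread that the runtime does not actually execute instructions on a separate thread, in which case the lock could be dropped.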

@tqchen tqchen added the status: need update need update based on feedbacks label Aug 5, 2019
@tmoreau89
Contributor

Good observations Tianqi; let's not merge this just yet until the synchronization issue is resolved. Indeed, basing it on the sim_driver.cc design would be best. I suggest implementing a virtual memory library so both drivers can share the same code @liangfu

@liangfu
Member Author

liangfu commented Aug 6, 2019

Hi @tqchen @tmoreau89, thanks for the suggestions. I admit the current implementation is neither safe nor efficient, and I agree it's important to reuse code. However, I think there might be some misunderstanding regarding this specific case for tsim_driver. To address your comments (I might be wrong, so please feel free to point it out):

Specifically, the current way uses a file for exchanging information between two threads

First, the reason I use a file for exchanging information is that DPIModule, tsim_driver.cc and tsim_device.cc are isolated in different shared libraries; they are built into libtvm.so, libvta.so and libvta_hw.so respectively. I have no idea how to use a shared_ptr to exchange the address mapping between isolated shared libraries.

Second, as I have shown in #3713, the VTA runtime doesn't actually use a separate thread to execute the instructions. Therefore, I think it's not 'two threads', and it's currently safe to remove the lock_guard in a virtual memory implementation.

have to do address translation each time we access one element in the memory, and will slowdown the simulation greatly

IMHO, the DRAM class inside sim_driver cannot avoid address translation when evaluating a value in the virtual address space; I mean we still need to call the DRAM::GetPhyAddr function (or I might be wrong here). @tqchen, would you please elaborate a little more?

Also please note again that the logics in sim_driver can possibly be re-used

OK, I wasn't aware of this; let's reuse the code where possible.

@liangfu
Member Author

liangfu commented Aug 14, 2019

@tmoreau89 I have refactored the code and reused the DRAM class from sim_driver to implement VirtualMemoryManager, so sim_driver and tsim_driver now share the same virtual memory implementation. Please take another look and feel free to leave any comments.

Contributor

@tmoreau89 left a comment

Thank you for the refactor @liangfu, the changes look good!

@tmoreau89
Contributor

@tqchen - please review the latest changes!

@tmoreau89
Contributor

@liangfu please rebase and resolve Makefile conflict

@tmoreau89
Contributor

@liangfu it looks like the error you are getting in the CI is known, and not related to your code changes. Please re-trigger the CI by rebasing, or making a small edit.

@liangfu
Member Author

liangfu commented Aug 23, 2019

@tmoreau89 The conflict has been resolved, and the update passed all CI checks. Please take another look.

@tmoreau89
Contributor

Waiting on @tqchen to unblock merging

} // namespace vta


void * vmalloc(uint64_t size) {
Member

I think we could remove vmalloc and the other functions, and instead call directly into VirtualMemoryManager::Global()'s member functions to implement these APIs.

/*!
* \brief virtual memory based memory allocation
*/
void * vmalloc(uint64_t size);
Member

Consider removing these functions, as we can implement them using the members of VirtualMemoryManager.

@tqchen
Member

tqchen commented Aug 26, 2019

Most changes LGTM; the only improvement we could make here is to remove the indirection of the vmalloc APIs and directly use the APIs in VirtualMemoryManager (unless we need to cross DLL boundaries with a C ABI).
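The indirection under discussion can be sketched as follows (hypothetical code; the `Manager` class here is a minimal stand-in, not VTA's actual VirtualMemoryManager): the free functions are pure forwarding shims over the singleton, which is why they only pay for themselves when a stable C ABI across DLL boundaries is required.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Stand-in manager, for illustration only; the real VirtualMemoryManager
// in the VTA tree has a richer page-table-backed interface.
class Manager {
 public:
  static Manager* Global() {
    static Manager inst;
    return &inst;
  }
  void* Alloc(uint64_t size) { return std::malloc(size); }
  void Free(void* ptr) { std::free(ptr); }
};

// The reviewed indirection: C-style free functions that only forward to
// the singleton. Callers inside the same binary could just as well use
// Manager::Global() directly, which is the simplification suggested here.
void* vmalloc(uint64_t size) { return Manager::Global()->Alloc(size); }
void vfree(void* ptr) { Manager::Global()->Free(ptr); }
```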

@tmoreau89 tmoreau89 merged commit 92b6ca7 into apache:master Aug 26, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019
* initial virtual memory;

* initial integration;

* include the header file in cmake;

* implement allocation with virtual to logical address mapping;

* virtual memory for tsim_driver;

* implement the missing memory release function;

* readability improvement;

* readability improvement;

* address review comments;

* improved robustness in virtual memory allocation;

* remove VTA_TSIM_USE_VIRTUAL_MEMORY macro and use virtual memory for tsim by default;

* link tvm against vta library;

* merge with master

* build virtual memory system without linking tvm against vta;

* minor change;

* reuse VTA_PAGE_BYTES;

* using DRAM class from sim_driver as VirtualMemoryManager;

* satisfy linter;

* add comments in code;

* undo changes to Makefile

* undo changes to Makefile

* retrigger ci;

* retrigger ci;

* directly call into VirtualMemoryManager::Global()
wweic pushed a commit to neo-ai/tvm that referenced this pull request Sep 16, 2019
tqchen pushed a commit to tqchen/tvm that referenced this pull request Mar 29, 2020