Implementation of uTVM #3227

Merged
merged 109 commits into from
Jul 25, 2019

weberlo
Contributor

@weberlo weberlo commented May 22, 2019

This PR provides an initial implementation of the ideas discussed in #2563. It has been a joint effort between @Mutinifni, @tqchen, and me to design and implement this system.

We only include an implementation of the emulated host low-level device interface in this PR. The OpenOCD interface will be included in a future PR.

Currently, the emulated host device can run ResNet18. As the implementation matures, we will try more complex models.

CC @huajsj @yongwww @jroesch @Ravenwater @junrushao1994 @joshpoll

src/api/api_pass.cc (outdated, resolved)
Member

@tqchen tqchen left a comment

Some quick comments first. Please check whether you need to update the submodules, or whether you can use the same versions as upstream.

tests/python/unittest/test_runtime_micro.py (outdated, resolved)
topi/python/topi/generic/nn.py (resolved)
python/tvm/contrib/binutil.py (outdated, resolved)
python/tvm/micro/base.py (outdated, resolved)
python/tvm/micro/cross_compile.py (outdated, resolved)
src/runtime/micro/device/utvm_runtime.h (outdated, resolved)
src/runtime/micro/low_level_device.h (outdated, resolved)
src/runtime/micro/micro_common.h (outdated, resolved)
src/runtime/micro/micro_common.h (outdated, resolved)
src/runtime/micro/micro_common.h (outdated, resolved)
src/runtime/micro/micro_common.h (outdated, resolved)
src/runtime/micro/target_data_layout_encoder.h (outdated, resolved)
@tqchen
Member

tqchen commented May 22, 2019

cc @mjs-arm @kparzysz

src/runtime/micro/device/utvm_runtime.c (outdated, resolved)
src/runtime/micro/device/utvm_runtime.h (outdated, resolved)
src/runtime/micro/device/utvm_runtime.h (outdated, resolved)
tvm_vals_slot.Write(&val_addr);
break;
}
// TODO(mutinifni): implement other cases if needed
Member

Need to add an implementation that passes in double and int64 values.
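A minimal sketch of what the extra cases might look like, assuming arguments are passed to the device as 8-byte TVMValue-style unions plus a separate array of type codes. `ArgTypeCode`, `ArgValue`, `ValueSlot`, `HostArg`, and `EncodeArgs` are hypothetical stand-ins defined here so the example is self-contained; they are not the uTVM runtime's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for TVM-style type codes and the TVMValue union.
enum class ArgTypeCode : int { kInt64 = 0, kFloat64 = 1, kDeviceHandle = 2 };

union ArgValue {
  std::int64_t v_int64;
  double v_float64;
  std::uint64_t v_handle;  // device address stored as an integer
};
static_assert(sizeof(ArgValue) == 8, "each argument occupies one 8-byte slot");

// Stand-in for the device memory region that argument values are written to.
class ValueSlot {
 public:
  void Write(const ArgValue& value) {
    std::uint8_t bytes[sizeof(ArgValue)];
    std::memcpy(bytes, &value, sizeof(ArgValue));
    buffer_.insert(buffer_.end(), bytes, bytes + sizeof(ArgValue));
  }
  ArgValue Read(std::size_t index) const {
    ArgValue value;
    std::memcpy(&value, buffer_.data() + index * sizeof(ArgValue), sizeof(ArgValue));
    return value;
  }

 private:
  std::vector<std::uint8_t> buffer_;
};

struct HostArg {
  ArgTypeCode type_code;
  ArgValue value;
};

// Writes each argument's 8-byte value into the slot and records its type code
// so the device knows which union member to read.
void EncodeArgs(const std::vector<HostArg>& args, ValueSlot* vals_slot,
                std::vector<int>* type_codes) {
  for (const HostArg& arg : args) {
    switch (arg.type_code) {
      case ArgTypeCode::kInt64:
      case ArgTypeCode::kFloat64:
        // Scalars are passed by value; no device allocation is needed.
        vals_slot->Write(arg.value);
        break;
      case ArgTypeCode::kDeviceHandle:
        // Arrays are assumed to already live on the device, so only their
        // device address is written.
        if (arg.value.v_handle == 0) {
          throw std::invalid_argument("null device handle");
        }
        vals_slot->Write(arg.value);
        break;
      default:
        throw std::invalid_argument("unsupported argument type");
    }
    type_codes->push_back(static_cast<int>(arg.type_code));
  }
}
```

Because int64 and double values fit in the same 8-byte union as a device address, the scalar cases in this sketch need no extra device allocation; the device side only needs the type code to pick the right union member.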

Contributor Author

I'd like to add a test for this. Is there an example program that generates scalar values?

Contributor Author

^-- this question

src/runtime/micro/micro_session.cc (outdated, resolved)
src/runtime/micro/micro_session.cc (outdated, resolved)
src/runtime/micro/micro_session.cc (outdated, resolved)
@weberlo
Contributor Author

weberlo commented May 23, 2019

I haven't addressed all of the feedback yet. @tqchen suggested I replace the DevAddr, DevBaseAddr, and DevBaseOffset classes with a single DevicePtr class for addresses and use size_t for offsets. I still need to think some more about whether this is the way to go.
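For reference, a minimal sketch of the suggested design, assuming device addresses fit in a `uintptr_t`. `DevicePtr` here is a hypothetical class written for illustration, not the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: one strongly typed class for device addresses, with
// offsets represented as plain size_t values.
class DevicePtr {
 public:
  DevicePtr() : value_(0) {}
  explicit DevicePtr(std::uintptr_t value) : value_(value) {}

  std::uintptr_t value() const { return value_; }
  bool is_null() const { return value_ == 0; }

  // address + offset -> address
  DevicePtr operator+(std::size_t offset) const { return DevicePtr(value_ + offset); }
  DevicePtr& operator+=(std::size_t offset) {
    value_ += offset;
    return *this;
  }

  // address - address -> offset
  std::size_t operator-(const DevicePtr& other) const {
    assert(value_ >= other.value_);
    return static_cast<std::size_t>(value_ - other.value_);
  }

  bool operator==(const DevicePtr& other) const { return value_ == other.value_; }
  bool operator!=(const DevicePtr& other) const { return value_ != other.value_; }

 private:
  std::uintptr_t value_;
};
```

Keeping addresses and offsets as distinct types means that adding two addresses does not compile, while address + offset and address - address (which yields an offset) do.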

src/runtime/micro/device/utvm_runtime.c (outdated, resolved)
include/tvm/runtime/micro/utvm_device_lib.h (outdated, resolved)
src/runtime/micro/micro_common.h (resolved)
python/tvm/micro/base.py (outdated, resolved)
src/runtime/micro/device/utvm_runtime.c (outdated, resolved)
src/runtime/micro/micro_session.cc (outdated, resolved)
@tqchen tqchen changed the title Implementation of MicroTVM Implementation of uTVM May 24, 2019
src/runtime/micro/micro_session.cc (outdated, resolved)
src/runtime/micro/micro_session.cc (outdated, resolved)
src/runtime/micro/micro_module.cc (outdated, resolved)
src/runtime/micro/micro_session.cc (outdated, resolved)
@tqchen
Member

tqchen commented May 30, 2019

@weberlo Please fix the CI error and make sure CI is green. @liangfu, please do another round of review. Also cc @grwlf @Mutinifni @MarisaKirisame @kazum @vinx13; please help review if you can.

@tqchen tqchen added the status: need update need update based on feedbacks label May 30, 2019
python/tvm/contrib/binutil.py (outdated, resolved)
python/tvm/contrib/binutil.py (outdated, resolved)
python/tvm/micro/base.py (outdated, resolved)
src/codegen/codegen_c_host.cc (outdated, resolved)
src/runtime/micro/device/utvm_runtime.c (resolved)
src/runtime/micro/host_low_level_device.cc (resolved)
src/runtime/micro/micro_session.cc (outdated, resolved)
@u99127
Contributor

u99127 commented Jun 1, 2019

I've been trying to play a bit with this. After building the branch, I ran

$> python test_runtime_micro.py

and got a segfault.

Inspecting this in gdb, using a build with -g added to the flags, shows that it crashes here:
tvm::runtime::MicroSectionAllocator::Free (this=0x0, offs=...) at /home/ramana/tvm-test/tvm/src/runtime/micro/micro_session.h:85
85 CHECK(alloc_map_.find(ptr) != alloc_map_.end()) << "freed pointer was never allocated";

I see the following backtrace.

#0 tvm::runtime::MicroSectionAllocator::Free (this=0x0, offs=...) at /home/ramana/tvm-test/tvm/src/runtime/micro/micro_session.h:85
#1 0x00007fffacfe96e4 in tvm::runtime::MicroSession::FreeInSection (this=, type=type@entry=tvm::runtime::SectionKind::kHeap, ptr=...)
at /home/ramana/tvm-test/tvm/src/runtime/micro/micro_session.cc:144
#2 0x00007fffacfde3d5 in tvm::runtime::MicroDeviceAPI::FreeDataSpace (this=, ctx=..., ptr=) at /home/ramana/tvm-test/tvm/src/runtime/micro/micro_device_api.cc:59
#3 0x00007fffacf88ec1 in tvm::runtime::NDArray::Internal::DefaultDeleter (ptr=0x124d3e0) at /home/ramana/tvm-test/tvm/src/runtime/ndarray.cc:62
#4 0x00007fffacf82060 in tvm::runtime::NDArray::Container::DecRef (this=) at /home/ramana/tvm-test/tvm/include/tvm/runtime/ndarray.h:326
#5 TVMArrayFree (handle=) at /home/ramana/tvm-test/tvm/src/runtime/ndarray.cc:220
#6 0x00007fffae64bdae in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#7 0x00007fffae64b71f in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#8 0x00007fffae85f524 in _ctypes_callproc () from /home/ramana/tensorflow/v/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#9 0x00007fffae85fb93 in ?? () from /home/ramana/tensorflow/v/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#10 0x00000000005a730c in _PyObject_FastCallKeywords ()
#11 0x0000000000503073 in ?? ()
#12 0x0000000000506859 in _PyEval_EvalFrameDefault ()
#13 0x0000000000501945 in _PyFunction_FastCallDict ()
#14 0x0000000000591461 in ?? ()
#15 0x00000000005a337c in _PyObject_FastCallDict ()
#16 0x000000000061a2bd in ?? ()
#17 0x000000000055e895 in PyObject_CallFinalizerFromDealloc ()
#18 0x0000000000554567 in ?? ()
#19 0x0000000000504f98 in ?? ()
#20 0x0000000000502540 in ?? ()
#21 0x0000000000502f3d in ?? ()
#22 0x0000000000506859 in _PyEval_EvalFrameDefault ()
#23 0x0000000000504c28 in ?? ()
#24 0x0000000000506393 in PyEval_EvalCode ()
#25 0x0000000000634d52 in ?? ()
#26 0x0000000000634e0a in PyRun_FileExFlags ()
#27 0x00000000006385c8 in PyRun_SimpleFileExFlags ()
#28 0x000000000063915a in Py_Main ()
#29 0x00000000004a6f10 in main ()

This is on Ubuntu 18.04. I'm happy to give any more details that you need.

@weberlo
Contributor Author

weberlo commented Jun 2, 2019

@u99127 Oh. That's embarrassing. I think I made a change and forgot to rebuild the C++ code before I ran the Python test. I've been looking into this, and it's a strange problem. It has to do with these lines that I added to destruct the allocators.

Take this code snippet, for example:

with HOST_SESSION as sess:
    ctx = tvm.micro_dev(0)
    a = tvm.nd.array(..., ctx)

# Garbage collection of `a` happens here.

EndSession gets called when the with block ends, but the tensor a isn't freed until Python garbage-collects it. At that point, the allocators no longer exist and can't be used to deallocate the tensor.

@tqchen What can we do to deal with this problem? One possible fix is to change FreeInSection to return immediately if the session isn't active, but this could still cause problems if we have two consecutive sessions:

with HOST_SESSION as sess:
    ctx = tvm.micro_dev(0)
    a = tvm.nd.array(..., ctx)

with HOST_SESSION as sess:
    ctx = tvm.micro_dev(0)
    a = tvm.nd.array(..., ctx)
    # Garbage collection of the first `a` happens here.
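The hazard in the second snippet can be reproduced with a small C++ model, assuming a single global session that owns the allocator and tensors that only remember their address. `Session`, `g_session`, `EnterSession`, `EndSession`, and `FreeInSession` are hypothetical names for this sketch, not the PR's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <stdexcept>

// Hypothetical model of the lifetime problem: one global session owns the
// allocator, and tensors only remember the address they were given.
struct Session {
  std::set<std::size_t> live;  // addresses currently allocated
  std::size_t next = 0x100;    // every fresh session starts at the same base

  std::size_t Alloc() {
    live.insert(next);
    return next++;
  }
  void Free(std::size_t addr) {
    if (live.erase(addr) == 0) {
      throw std::runtime_error("freed pointer was never allocated");
    }
  }
};

std::unique_ptr<Session> g_session;  // global "current session"

void EnterSession() { g_session.reset(new Session()); }
void EndSession() { g_session.reset(); }

// Proposed workaround: skip the free if no session is active.
// Returns true if the free was forwarded to the current session.
bool FreeInSession(std::size_t addr) {
  if (!g_session) return false;
  g_session->Free(addr);
  return true;
}
```

With no active session, the workaround simply skips the free; without the check, the call would go through a null pointer, which is consistent with the `this=0x0` frame in the backtrace above. In consecutive sessions, however, the late free from the first session is forwarded to the second session's allocator and releases memory that now belongs to a different tensor.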

tests/python/unittest/test_runtime_micro.py (outdated, resolved)
python/tvm/micro/base.py (outdated, resolved)
python/tvm/micro/base.py (outdated, resolved)
mod_src = c_mod.get_source()
with open(lib_src_path, "w") as f:
    f.write(mod_src)
# Compile to object file.
Member

Remove this function. Instead, directly support tvm.micro.create_lib(c_mod, "mylib.obj").

@tqchen
Member

tqchen commented Jun 2, 2019

@weberlo Regarding the lifetime issue of Session: I was expecting this, but did not expect it to come so soon :) Here is a solution: keep a shared_ptr reference to the session in every allocated object, Module, and PackedFunc.

More specifically:

  • EnterSession resets Session::Global to a new Session.
  • The data, function, and module objects keep a shared_ptr to the session and use the session through that shared_ptr. Note that when a Module is created, the shared_ptr is obtained from Session::Global().
  • When ExitSession is called, the Session::Global shared_ptr is reset to nullptr.
  • At this point, the old data, function, and module objects still hold references to the old session's shared_ptr, so the session is not really "closed" until all of these resources go out of scope.
  • We need to compare session pointers during DeviceAPI::Copy, since you cannot copy data from a stale session to a new one.

These steps illustrate a general design principle: if you rely on a global singleton, you will run into a destruction-order problem (what if the singleton you depend on is destructed before you are?). One way to resolve this is to keep a shared_ptr reference to the object so that it is not destructed until every consumer goes out of scope.
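A minimal C++ sketch of this scheme, assuming a toy Session that hands out integer addresses. `Session::Global`, `EnterSession`, `ExitSession`, `DeviceArray`, and `CopyArray` are hypothetical names chosen to mirror the bullets above; the actual uTVM classes differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <stdexcept>

// Hypothetical sketch of the shared_ptr-based session lifetime scheme.
class Session {
 public:
  std::size_t Alloc() {
    live_.insert(next_);
    return next_++;
  }
  void Free(std::size_t addr) {
    if (live_.erase(addr) == 0) {
      throw std::runtime_error("freed pointer was never allocated");
    }
  }

  // The current session; only consulted when a new object is created.
  static std::shared_ptr<Session>& Global() {
    static std::shared_ptr<Session> current;
    return current;
  }

 private:
  std::set<std::size_t> live_;
  std::size_t next_ = 0x100;
};

void EnterSession() { Session::Global() = std::make_shared<Session>(); }
void ExitSession() { Session::Global().reset(); }

// Each allocated object keeps its own reference to the session it came from,
// so that session outlives ExitSession() until the object is destroyed.
class DeviceArray {
 public:
  DeviceArray() : session_(Session::Global()) {
    if (!session_) throw std::runtime_error("no active session");
    addr_ = session_->Alloc();
  }
  ~DeviceArray() { session_->Free(addr_); }
  DeviceArray(const DeviceArray&) = delete;
  DeviceArray& operator=(const DeviceArray&) = delete;

  const std::shared_ptr<Session>& session() const { return session_; }

 private:
  std::shared_ptr<Session> session_;
  std::size_t addr_ = 0;
};

// Copies must stay within one session: data from a stale session cannot be
// copied into a new one.
void CopyArray(const DeviceArray& from, const DeviceArray& to) {
  if (from.session() != to.session()) {
    throw std::runtime_error("cannot copy between different sessions");
  }
  // A real implementation would issue the device-side copy here.
}
```

In this sketch `Session::Global()` is consulted only when an object is created; afterwards each object talks to its own session, which is what lets a stale array be freed safely even after a new session has started.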

@weberlo
Contributor Author

weberlo commented Jun 3, 2019

@tqchen This seems like a good solution. One thing that still worries me is the situation where there are two concurrent sessions that both use the same underlying physical device.

This isn't a problem with the HostLowLevelDevice, because each instance of the class corresponds to a separate "physical device". But it becomes a problem with the OpenOCDLowLevelDevice, because each instance of that class connects to a single, shared physical device. In that case, while there are still pointers to the first session, a second OpenOCD session cannot be created, because the socket address will still be in use by the first session.

@tqchen
Member

tqchen commented Jun 3, 2019

In that case, we need to make sure all the resources are out of scope. Usually, wrapping everything related in a function helps resolve the issue.
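One way to picture the OpenOCD constraint, assuming the device can back at most one live session and tracks it through a `weak_ptr`: a new session can be opened only after every resource from the previous one is gone, and scoping the work in a function guarantees that. `PhysicalDevice`, `Session`, `Resource`, and `RunWorkload` are hypothetical names for this sketch, not the OpenOCDLowLevelDevice interface.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical sketch: a single physical device (for example, one reached
// through one debugger socket) can back at most one live session at a time.
class Session {};

class PhysicalDevice {
 public:
  // Opens a new session, or throws if a previous session is still alive.
  std::shared_ptr<Session> OpenSession() {
    if (busy()) {
      throw std::runtime_error("device still in use by a previous session");
    }
    std::shared_ptr<Session> session = std::make_shared<Session>();
    owner_ = session;  // observe the session without keeping it alive
    return session;
  }

  // The device is busy while any resource still holds the session.
  bool busy() const { return !owner_.expired(); }

 private:
  std::weak_ptr<Session> owner_;
};

// Stand-in for an array, module, or function that pins its session.
struct Resource {
  std::shared_ptr<Session> session;
};

// Wrapping all work for one session in a function guarantees that every
// resource created here is destroyed when the function returns.
void RunWorkload(PhysicalDevice* device) {
  Resource input{device->OpenSession()};
  Resource output{input.session};
  // ... allocate tensors and run kernels through input/output ...
}
```

Using a `weak_ptr` means the device never extends the session's lifetime itself; it only observes whether any resource is still holding the session.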

@u99127
Contributor

u99127 commented Jun 3, 2019

It looks to me like $TVMHOME/python/setup.py is missing the 'nose' package as a dependency. I found this by trying to execute test_runtime_micro.py in a fresh environment.

weberlo added 11 commits July 24, 2019 18:51
We already have `tvm.micro` as a namespace.  Can't have it as a method
as well.
Thanks to @tqchen for finding this bug.  Emitting ternary operators for
`min` and `max` causes concurrency bugs in CUDA, so we're moving the
ternary op emissions from `CodeGenC` to `CodeGenCHost`.
@tqchen tqchen merged commit ef909df into apache:master Jul 25, 2019
@tqchen
Member

tqchen commented Jul 25, 2019

Thanks @weberlo @u99127 @MarisaKirisame @liangfu @Mutinifni ! This PR is now merged

@tqchen tqchen added status: accepted and removed status: need review status: need update need update based on feedbacks labels Jul 25, 2019
anijain2305 pushed a commit to anijain2305/tvm that referenced this pull request Aug 2, 2019
* uTVM interfaces (apache#14)

* some minor interface changes

* implemented HostLowLevelDevice

* added MicroDeviceAPI

* implemented micro_common and added Python interfaces

* current status, semi implemented micro session

* added micro_common implementation and python interfaces (apache#18)

* added micro_common implementation and python interfaces (apache#18)

* current status, semi implemented

* host test working

* updated interfaces for MicroSession arguments allocation

* make somewhat lint compatible

* fix based on comments

* added rounding macro

* fix minor bug

* improvements based on comments

* Clean up `binutil.py` and make Python-3-compatible

* Change argument allocation design

* Address feedback and lint errors

* Improve binutil tests

* Simplify allocator (per @tqchen's suggestions)

* Doc/style fixes

* farts

* mcgee

* rodata section werks

(and so does `test_runtime_micro_workspace.py`)

* simple graph runtime werk

* TEMP

* ResNet works, yo

* First round of cleanup

* More cleanup

* runs a dyson over the code

* Another pass

* Fix `make lint` issues

* ready to pr... probably

* final

* Undo change

* Fix rebase resolution

* Minor fixes

* Undo changes to C codegen tests

* Add `obj_path` in `create_micro_lib`

* TEMP

* Address feedback

* Add missing TODO

* Partially address feedback

* Fix headers

* Switch to enum class for `SectionKind`

* Add missing ASF header

* Fix lint

* Fix lint again

* Fix lint

* Kill lint warnings

* Address feedback

* Change Python interface to MicroTVM

All interaction with the device is now through `Session` objects, which
are used through Python's `with` blocks.

* Reorder LowLevelDevice interface

* Store shared ptr to session in all alloced objects

* Move helper functions out of `tvm.micro`

* Switch static char arr to vector

* Improve general infra and code quality

Does not yet address all of tqchen's feedback

* Forgot a rename

* Fix lint

* Add ASF header

* Fix lint

* Partially address MarisaKirisame's feedback

* Lint

* Expose `MicroSession` as a node to Python

* Revert to using `Session` constructor

* Fix compiler error

* (Maybe) fix CI error

* Debugging

* Remove

* Quell lint

* Switch to stack-based session contexts

* Make uTVM less intrusive to host codegen

And use SSA for operands of generated ternary operators

* Inline UTVMArgs into UTVMTask struct

* Remove `HostLowLevelDevice` header

* Remove `BaseAddr` class

* Address feedback

* Add "utvm" prefix to global vars in runtime

* Fix lint

* Fix CI

* Fix `test_binutil.py`

* Fix submodules

* Remove ResNet tests

* Make `test_binutil.py` work with nose

* Fix CI

* I swear this actually fixes the binutil tests

* lint

* lint

* Add fcompile-compatible cross-compile func

* Add docs for uTVM runtime files

* Move pointer patching into `MicroSession`

* Fix lint

* First attempt at unifying cross-compile APIs

* Fix lint

* Rename `cross_compile` back to `cc`

* Address feedback

* Remove commented code

* Lint

* Figure out failing function

* Remove debugging code

* Change "micro_dev" target to "micro"

* Add checks in tests for whether uTVM is enabled

* Add TODO for 32-bit support

* Rename more "micro_dev" to "micro"

* Undo rename

We already have `tvm.micro` as a namespace.  Can't have it as a method
as well.

* Fix failing CI

Thanks to @tqchen for finding this bug.  Emitting ternary operators for
`min` and `max` causes concurrency bugs in CUDA, so we're moving the
ternary op emissions from `CodeGenC` to `CodeGenCHost`.

* Address feedback

* Fix lint
@apivovarov
Contributor

@weberlo, @tqchen What do you think about renaming the compile_cmd param back to cc in lib.export()?

@weberlo
Copy link
Contributor Author

weberlo commented Aug 7, 2019

@apivovarov Which part of the PR are you referring to?

wweic pushed a commit to wweic/tvm that referenced this pull request Aug 9, 2019
@weberlo
Contributor Author

weberlo commented Aug 10, 2019

@apivovarov I just made a PR to remedy this (#3746). Sorry for the trouble.

@apivovarov
Contributor

Thank you!

wweic pushed a commit to neo-ai/tvm that referenced this pull request Sep 6, 2019
7 participants