[µTVM] Enable AutoTVM for ARM STM32F746XX Boards #4274

Merged
merged 70 commits into from
Dec 2, 2019

Contributor

@weberlo weberlo commented Nov 7, 2019

This PR adds support for autotuning via MicroTVM. To test this infrastructure on a physical board, I have added support for ARM STM32F746XX boards, featuring Cortex-M7 CPUs. As a followup to this PR, I will write a tutorial for tuning conv2d.

Here are the most notable changes:

  • All components in the µTVM infra are now parameterized by the word size of the target device.
  • Device configuration has been expanded to include the memory layout, word size, and thumb mode indicator of the device.
  • There is now a micro.device Python namespace featuring a global registry of all supported devices. The registry is indexed by device ID (e.g., host, riscv_spike, or arm.stm32f746xx) and maps each ID to a dictionary containing two functions: create_micro_lib (for creating libraries specific to that device) and default_config (for generating default device-specific config).
  • The µTVM runtime API has been expanded to include timing functions, where each implementation is device-specific.
  • There is now a src/runtime/micro/device folder which mirrors the structure of the micro.device folder and includes device initialization and timer implementations for each device.
  • RPC sessions will now use the MicroTimeEvaluator when possible, to make use of cycle-accurate timings available on microcontrollers, instead of using wall clock time (which would include communication overhead).
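
As a rough illustration of the registry described in the list above, here is a minimal standalone sketch. It does not import TVM and is not the PR's implementation: the helper names register_device and get_device_funcs, the function signatures, and all config values (including the memory layout) are assumptions made for the example. Only the device ID and the two dictionary keys come from the description.

```python
# Illustrative sketch only: the registry shape follows the PR description,
# but every body and value below is a placeholder, not TVM's actual code.
_DEVICE_REGISTRY = {}

REQUIRED_FUNCS = {"create_micro_lib", "default_config"}


def register_device(device_id, device_funcs):
    """Add a device's functions to the global registry, keyed by device ID."""
    if device_id in _DEVICE_REGISTRY:
        raise RuntimeError(f"device {device_id!r} is already registered")
    missing = REQUIRED_FUNCS - set(device_funcs)
    if missing:
        raise ValueError(f"{device_id!r} is missing {sorted(missing)}")
    _DEVICE_REGISTRY[device_id] = dict(device_funcs)


def get_device_funcs(device_id):
    """Look up the functions registered for a device ID."""
    try:
        return _DEVICE_REGISTRY[device_id]
    except KeyError:
        raise KeyError(f"unknown device {device_id!r}") from None


def create_micro_lib(obj_path, src_path):
    """Placeholder: describe the build instead of invoking a compiler."""
    return {"src": src_path, "obj": obj_path, "toolchain_prefix": "arm-none-eabi-"}


def default_config(server_addr, server_port):
    """Placeholder default config; the memory layout values are made up."""
    return {
        "device_id": "arm.stm32f746xx",
        "word_size": 4,      # bytes per word on a 32-bit Cortex-M7
        "thumb_mode": True,  # Cortex-M cores execute Thumb instructions only
        "mem_layout": {"text": {"start": 0x20000000, "size": 0x4000}},
        "server_addr": server_addr,
        "server_port": server_port,
    }


register_device("arm.stm32f746xx", {
    "create_micro_lib": create_micro_lib,
    "default_config": default_config,
})
```

With this shape, a caller would fetch get_device_funcs("arm.stm32f746xx")["default_config"] and call it with the debug server's address and port to get a starting config that can then be adjusted per board.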

Many thanks to @tqchen for discussing the design with me!

CC @u99127 @ajtulloch @jwfromm

Member

@tqchen tqchen left a comment

quick comments

python/tvm/contrib/binutil.py
python/tvm/contrib/binutil.py Outdated
python/tvm/contrib/binutil.py Outdated
python/tvm/micro/base.py
python/tvm/micro/device/arm/stm32f746xx.py
@@ -0,0 +1,102 @@
#ifdef __cplusplus
extern "C" {
Member

Use .cc to be consistent with the rest of the stack.

Contributor Author

This is meant to be compiled and loaded on the device. The #ifdef __cplusplus is just in case a C++ compiler is run over it.

python/tvm/micro/rpc_server.py Outdated
src/runtime/micro/device/arm/stm32f746xx/utvm_timer.c Outdated
src/runtime/micro/openocd_low_level_device.cc
@weberlo
Contributor Author

weberlo commented Nov 8, 2019

@tqchen It looks like the CI doesn't allow assembly files, namely utvm_init.s. That file is required to enable the Cortex-M7 FPU and set up the stack pointer. Can you add it as an exception? We might also want an assembly file whitelist for the entire src/runtime/micro/device directory.

@weberlo weberlo force-pushed the add-arm-autotvm-utvm branch from 1ca1cda to 2279ce9 Compare November 10, 2019 06:28
@tqchen
Member

tqchen commented Nov 11, 2019

Please rebase against master due to #4286, and also fix the CI error.

@weberlo weberlo force-pushed the add-arm-autotvm-utvm branch from 5b86677 to babeb97 Compare November 12, 2019 17:49
@weberlo weberlo force-pushed the add-arm-autotvm-utvm branch from cd15454 to c9629a8 Compare November 19, 2019 20:06
python/tvm/contrib/binutil.py Outdated
Contributor

@u99127 u99127 left a comment

Aren't the commented bits part of BinaryContents?

Contributor

@u99127 u99127 left a comment

I've spent a couple of hours this evening reviewing this, and I have some initial minor corrections for what caught my eye while reading.

It would be good to consider the following points for future direction. I am not asking for this to be fixed as part of this PR.

In the Arm architecture there is an architecture level, and there are optional features in the architecture, usually an FPU. There are multiple implementations of a particular architecture level and, finally, multiple devices for each of those CPU implementations. There are many differences between those devices, but in the context of uTVM the differences we need to start by worrying about are really the memory maps and which optional features of the ISA are implemented in each device.

Now, why is this important? In this world there are multiple implementations with a Cortex-M7 and an FP5-SP-D16 FPU init, but the memory maps might well differ between boards from different manufacturers, so having an easy, first-class way of describing only those differences is useful.

regards,
Ramana
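
One way the layering suggested in the comment above could look is sketched below, purely as an illustration: shared CPU-level defaults plus a per-board dictionary that states only what differs (here, the memory map). The names CORTEX_M7_DEFAULTS, make_board_config, BOARD_A and BOARD_B are hypothetical, and the addresses and sizes are made-up placeholders rather than values for any real board.

```python
import copy

# Hypothetical CPU-level defaults shared by every Cortex-M7 board.
CORTEX_M7_DEFAULTS = {
    "word_size": 4,       # bytes
    "thumb_mode": True,
    "fpu": "FP5-SP-D16",  # FPU name as written in the review comment
}


def make_board_config(cpu_defaults, board_overrides):
    """Combine CPU defaults with the board-specific differences.

    Deep copies keep the shared defaults and the override dictionaries
    from being mutated through the returned config.
    """
    config = copy.deepcopy(cpu_defaults)
    config.update(copy.deepcopy(board_overrides))
    return config


# Each board describes only its own memory map (placeholder values).
BOARD_A = {"mem_layout": {"text": {"start": 0x20000000, "size": 0x5000}}}
BOARD_B = {"mem_layout": {"text": {"start": 0x20010000, "size": 0x8000}}}

config_a = make_board_config(CORTEX_M7_DEFAULTS, BOARD_A)
config_b = make_board_config(CORTEX_M7_DEFAULTS, BOARD_B)
```

Under this assumption, adding a board from another manufacturer would mean writing one small override dictionary instead of a whole new device module.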

python/tvm/micro/base.py Outdated
python/tvm/micro/device/arm/stm32f746xx.py
python/tvm/micro/device/arm/stm32f746xx.py
python/tvm/micro/device/arm/stm32f746xx.py
src/runtime/micro/micro_session.h
src/runtime/micro/micro_session.h Outdated
@weberlo
Contributor Author

weberlo commented Nov 23, 2019

Now, why is this important? In this world there are multiple implementations with a Cortex-M7 and an FP5-SP-D16 FPU init, but the memory maps might well differ between boards from different manufacturers, so having an easy, first-class way of describing only those differences is useful.

@u99127 This is good to know. We should evolve the design to accommodate these instances as they crop up.

@weberlo weberlo force-pushed the add-arm-autotvm-utvm branch from 4674aa0 to 9d77f82 Compare November 24, 2019 23:14
@weberlo
Contributor Author

weberlo commented Nov 25, 2019

One last change incoming. Forgot to move create_micro_mod out of micro.Session into a static method.

@tqchen
Member

tqchen commented Dec 1, 2019

Contributor

@u99127 u99127 left a comment

Sorry about the time it's taken.

While this feels like a very initial integration, I think the Arm backend parts should certainly be made more modular to make board addition simpler, and overflow counting for the performance counters needs to be handled in the future.

regards
Ramana
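
For reference, the overflow issue mentioned above can be illustrated with a small host-side sketch. It assumes a free-running up-counter of a given bit width plus a separately tracked count of wraps (for example, from an overflow interrupt). The function name elapsed_cycles and its parameters are hypothetical and say nothing about how the timer in this PR is actually implemented.

```python
def elapsed_cycles(start, end, wraps=0, counter_bits=32):
    """Cycles between two readings of a wrapping up-counter.

    `start` and `end` are raw counter readings, and `wraps` is the number
    of times the counter rolled over between them.
    """
    modulus = 1 << counter_bits
    if not (0 <= start < modulus and 0 <= end < modulus):
        raise ValueError("reading does not fit in the counter width")
    if wraps < 0:
        raise ValueError("wraps must be non-negative")
    elapsed = wraps * modulus + end - start
    if elapsed < 0:
        # end < start with no recorded wrap means an overflow was missed.
        raise ValueError("counter wrapped but no wrap was recorded")
    return elapsed
```

For example, with a 32-bit counter, a start reading of 0xFFFFFFF0, an end reading of 0x10 and one recorded wrap, the function returns 32 cycles. Without the wrap count, the same pair of readings is rejected instead of silently producing a wrong timing.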

@tqchen tqchen merged commit 47c870a into apache:master Dec 2, 2019
@tqchen
Member

tqchen commented Dec 2, 2019

Thanks @u99127 @weberlo !


3 participants