[refactoring] BSD-License-compliant flashloader rewrite #932

Merged
merged 26 commits into from
May 6, 2020
abaa8f5
Add cleanroom document --rebased
chenguokai Apr 21, 2020
44e2a4f
rewrite as clean room doc
hsupu Apr 24, 2020
eb93492
Merge branch 'develop' of https://github.com/hsupu/stlink into develop
chenguokai Apr 24, 2020
1451880
bugfix
hsupu Apr 24, 2020
f8a2927
Merge branch 'develop' of https://github.com/hsupu/stlink into develop
chenguokai Apr 24, 2020
4308543
update
hsupu Apr 24, 2020
dfac448
Merge branch 'develop' of https://github.com/chenguokai/stlink into d…
chenguokai Apr 24, 2020
8aaf95a
rewrite flashloaders as clean room doc
hsupu Apr 24, 2020
11bb057
Fix merge conflicts for cleanroom
chenguokai Apr 24, 2020
8bbedab
fix typo
hsupu Apr 24, 2020
e30dcb4
Merge branch 'develop' of https://github.com/hsupu/stlink into develop
chenguokai Apr 24, 2020
15e2e1d
fix align
hsupu Apr 25, 2020
43ddace
Merge branch 'develop' of https://github.com/hsupu/stlink into develop
chenguokai Apr 25, 2020
36bb77d
Cleanroom for flashloaders done
chenguokai Apr 25, 2020
489a37e
Document error fix
chenguokai Apr 25, 2020
9096984
fix stm32f0 loop condition
hsupu Apr 25, 2020
fd89381
Fix branch logic error in stm32f0.s
chenguokai Apr 25, 2020
b097173
Sync flashloader.c with stm32f0.s
chenguokai Apr 25, 2020
8b77d02
fix: stm32-lv r2(4) and copy(1) has different data unit size
hsupu Apr 25, 2020
c62a781
Fix word count issues for stm32f4lv and stm32f7lv
chenguokai Apr 25, 2020
09b40ca
update
hsupu Apr 25, 2020
a4fec73
update
hsupu Apr 25, 2020
090c4d3
Fix size issues in stm32f4lv.s and stm32f7lv.s
chenguokai Apr 25, 2020
65ca384
Sync flashloader.c with commit a4fec73a27516f787fa0853b1fab9ecaa7225f61
chenguokai Apr 25, 2020
6a768d3
Add a documentation about flashloaders and adjust clean room document…
chenguokai Apr 27, 2020
06a5d71
Remove all 'my' in tag name
chenguokai Apr 29, 2020
54 changes: 54 additions & 0 deletions doc/flashloaders.md
@@ -0,0 +1,54 @@
# Flashloaders

## What do flashloaders do

The on-chip FLASH of an STM32 must be written one unit at a time (a byte, half word, word or double word, depending on the device), and each write is followed by a busy check. Driving every write from the host through `stlink` would make flashing unbearably slow. Flashloaders cooperate with `stlink` to split flashing into two stages. In the first stage, `stlink` loads the flashloader and the flash data into SRAM, where no busy check is needed. In the second stage, the flashloader is started and writes the data from SRAM to FLASH, performing the busy check itself. The write/check-if-busy cycle therefore runs entirely on the STM32, which saves considerable communication time between `stlink` and the chip.

As SRAM is usually smaller than FLASH, `stlink` flashes only one page at a time (or less if SRAM is insufficient). The whole flashing process may therefore consist of several launches of the flashloader.
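
As a rough illustration of the chunking described above, the following sketch computes how many flashloader launches an image would need. The helper `loader_launches` and its parameters are hypothetical and do not exist in `stlink`; the sketch assumes that each launch handles the smaller of one flash page and the SRAM space left for the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of stlink: number of flashloader launches
 * needed to write image_len bytes when each launch handles at most one
 * flash page, or less if the SRAM buffer is smaller than a page. */
static size_t loader_launches(size_t image_len, size_t page_size, size_t sram_buf_size)
{
    assert(page_size > 0 && sram_buf_size > 0);
    size_t chunk = page_size < sram_buf_size ? page_size : sram_buf_size;
    return (image_len + chunk - 1) / chunk;   /* round up to whole chunks */
}
```

For example, a 2500-byte image with 1024-byte pages and a large enough buffer would take three launches under these assumptions.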

## The flashing process

1. `st-flash` loads the compiled binary of the corresponding flashloader into SRAM by calling `stlink_flash_loader_init` in `src/flash_loader.c`.
2. `st-flash` erases the corresponding flash page by calling `stlink_erase_flash_page` in `common.c`.
3. `st-flash` calls `stlink_flash_loader_run` in `flash_loader.c`. In this function:
   + a buffer of one flash page is written to SRAM, right after the flashloader
   + the buffer start address (in SRAM) is written to register `r0`
   + the target start address (in FLASH, page aligned) is written to register `r1`
   + the buffer size is written to register `r2`
   + the start address of the flashloader (for now 0x20000000) is written to `r15` (`pc`)
   + the flashloader is then launched; `st-flash` waits for the core to halt (triggered by the flashloader) and confirms that flashing completed by checking that `r2` is zero
4. The flashloader itself works much like a `memcpy` with a busy check:
   + copy a single unit of data from SRAM to FLASH
   + (for most devices) wait until the flash is not busy
   + trigger a breakpoint, which halts the core, when finished
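
The flashloader part of the steps above can be modelled on the host as a plain C loop. This is only a sketch: the real flashloaders are hand-written assembly running on the target, and the names `fake_flash_t`, `fake_flash_busy` and `loader_model` are invented here for illustration. The return value plays the role of `r2`.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model only: on a real target the status register is
 * memory-mapped and the busy flag is set by the flash controller. */
typedef struct {
    uint32_t busy_polls_left;   /* fake hardware: polls that still report busy */
} fake_flash_t;

/* Hypothetical stand-in for reading the busy flag of the status register. */
static int fake_flash_busy(fake_flash_t *f)
{
    if (f->busy_polls_left > 0) {
        f->busy_polls_left--;
        return 1;
    }
    return 0;
}

/* Copies `count` 32-bit units from src to dst and polls the busy flag after
 * each unit. Returns the remaining count, mirroring the value left in r2:
 * zero means the copy completed. */
static uint32_t loader_model(const uint32_t *src, uint32_t *dst,
                             uint32_t count, fake_flash_t *f)
{
    assert(src != 0 && dst != 0 && f != 0);
    while (count > 0) {
        *dst++ = *src++;             /* copy a single unit */
        f->busy_polls_left = 2;      /* pretend the write takes two polls */
        while (fake_flash_busy(f)) { /* wait until the flash is not busy */
        }
        count--;
    }
    /* a real flashloader would trigger a breakpoint here to halt the core */
    return count;
}
```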

## Constraints

Developers who want to modify flashloaders must therefore satisfy the following constraints.

* Only Thumb-1 (for stm32f0 etc.) or Thumb-1 and Thumb-2 (for stm32f1 etc.) instructions may be used; no ARM instructions.
* Do not use the stack, since it may overwrite buffer data.
* For most devices, wait until FLASH is not busy after writing a single unit of data.
* For some devices, check whether any error occurred during flashing.
* Respect the unit size of a single copy.
* After flashing, trigger a breakpoint to halt the core.
* A successful run ends with `r2` set to zero when the core halts.
* Make sure the flashloader can at least run at 0x20000000 (the base address of SRAM).


For devices that need to wait until the flash is not busy, check the FLASH\_SR\_BUSY bit. For devices that need to check for errors during flashing, check FLASH\_SR\_(X)ERR, where `X` can be any error state.

The FLASH\_SR offsets and copy unit sizes can be found in ST's official reference manuals and/or in header files of other open source projects. The clean room document provides some of them.
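
The two checks can be sketched in C as a classification of a status register value. The masks are passed in as parameters because the real bit positions are device specific and must come from the reference manual; `sr_state_t` and `classify_sr` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SR_READY, SR_BUSY, SR_ERROR } sr_state_t;

/* Hypothetical helper: busy_mask selects the busy bit and err_mask the
 * error bits of a status register value. Both masks are assumptions
 * supplied by the caller, not values taken from any ST manual. */
static sr_state_t classify_sr(uint32_t sr, uint32_t busy_mask, uint32_t err_mask)
{
    assert(busy_mask != 0);
    if (sr & busy_mask) {
        return SR_BUSY;     /* keep polling */
    }
    if (sr & err_mask) {
        return SR_ERROR;    /* stop flashing and report failure */
    }
    return SR_READY;        /* safe to write the next unit */
}
```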


## Debug tricks

If you find that a flashloader is broken, or you need to write a flashloader for a new device, the following tricks may help.

1. Set the `WAIT_ROUNDS` macro to a bigger value so that you have time to kill `st-flash` while it is waiting for the core to halt.
2. Run `st-flash` and kill it after the flashloader has been loaded to SRAM.
3. Launch `st-util` and `gdb`/`lldb`.
4. Set a breakpoint at the base address of SRAM.
5. Jump to the base address and start debugging.

These tricks work because most of the preparation (flash unlock, flash erase, loading the flashloader to SRAM) has already been done automatically, which saves the time needed to set up a debug environment.
38 changes: 38 additions & 0 deletions flashloaders/Makefile
@@ -0,0 +1,38 @@
# Note: the original GPL-licensed code states that compiling is as simple
# as `gcc -c`. In my tests that fails: the resulting program reads from a
# wrong address.
# This Makefile saves you from dealing with those compile errors.
# Adjust CC if needed.

CC = /opt/local/gcc-arm-none-eabi-8-2018-q4-major/bin/arm-none-eabi-gcc

CFLAGS_thumb1 = -mcpu=Cortex-M0 -Tlinker.ld -ffreestanding -nostdlib
CFLAGS_thumb2 = -mcpu=Cortex-M3 -Tlinker.ld -ffreestanding -nostdlib

all: stm32vl.o stm32f0.o stm32l.o stm32f4.o stm32f4_lv.o stm32l4.o stm32f7.o stm32f7_lv.o

stm32vl.o: stm32f0.s
	$(CC) stm32f0.s $(CFLAGS_thumb2) -o stm32vl.o
stm32f0.o: stm32f0.s
	$(CC) stm32f0.s $(CFLAGS_thumb1) -o stm32f0.o
stm32l.o: stm32lx.s
	$(CC) stm32lx.s $(CFLAGS_thumb2) -o stm32l.o
stm32f4.o: stm32f4.s
	$(CC) stm32f4.s $(CFLAGS_thumb2) -o stm32f4.o
stm32f4_lv.o: stm32f4lv.s
	$(CC) stm32f4lv.s $(CFLAGS_thumb2) -o stm32f4_lv.o
stm32l4.o: stm32l4.s
	$(CC) stm32l4.s $(CFLAGS_thumb2) -o stm32l4.o
stm32f7.o: stm32f7.s
	$(CC) stm32f7.s $(CFLAGS_thumb2) -o stm32f7.o
stm32f7_lv.o: stm32f7lv.s
	$(CC) stm32f7lv.s $(CFLAGS_thumb2) -o stm32f7_lv.o

clean:
	rm *.o






233 changes: 233 additions & 0 deletions flashloaders/cleanroom.md
@@ -0,0 +1,233 @@
This document was originally written in Chinese; the English version comes first.

# Clean Room Documentation English Version

The code is placed in the `.text` section.

Add the assembler directive `.syntax unified` at the top of the file.

**Calling convention**:

All parameters are passed in registers.

`r0`: the base address of the copy source
`r1`: the base address of the copy destination
`r2`: the total word (4 bytes) count to be copied (with exceptions)

**What the program is expected to do**:

Copy data from the source to the destination, then trigger a breakpoint to exit. Before exiting, `r2` must be cleared to zero to indicate that the copy is done.

**Limitation**: No stack operations are permitted. Registers `r3` to `r12` are free to use. Note that `r13` is `sp` (stack pointer), `r14` is `lr` (commonly used to store a jump address) and `r15` is `pc` (program counter).

**Requirement**: After every single copy, wait until the flash finishes writing. The length of a single copy and the way to check for completion are given for each file below. The address where `flash_base` is stored shall be 2-byte aligned.
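
To make the calling convention concrete, the sketch below models the three register arguments as a C struct and converts a byte length into the count expected in `r2`. The names `loader_args_t` and `make_loader_args` are hypothetical, and rounding a partial trailing unit up is an assumption of this sketch, not something this document specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Models the three register arguments of the calling convention. */
typedef struct {
    uint32_t r0;  /* base address of the copy source */
    uint32_t r1;  /* base address of the copy destination */
    uint32_t r2;  /* count in units of unit_bytes, not in bytes */
} loader_args_t;

/* Hypothetical helper: unit_bytes is 4 by default, 2 for stm32f0.s and
 * 8 for stm32l4.s, following the per-file exceptions. The length is
 * rounded up so that a trailing partial unit is still counted
 * (an assumption made for this sketch). */
static loader_args_t make_loader_args(uint32_t src, uint32_t dst,
                                      uint32_t len_bytes, uint32_t unit_bytes)
{
    assert(unit_bytes == 2 || unit_bytes == 4 || unit_bytes == 8);
    loader_args_t a = { src, dst, (len_bytes + unit_bytes - 1) / unit_bytes };
    return a;
}
```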

## stm32f0.s

**Exception**: `r2` stores the total half word (2 bytes) count to be copied

`flash_base`: 0x40022000

`FLASH_CR`: offset from `flash_base` is 16

`FLASH_SR`: offset from `flash_base` is 12

**Reference**: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f0.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f0.h)
[https://www.st.com/resource/en/reference_manual/dm00031936-stm32f0x1stm32f0x2stm32f0x8-advanced-armbased-32bit-mcus-stmicroelectronics.pdf](https://www.st.com/resource/en/reference_manual/dm00031936-stm32f0x1stm32f0x2stm32f0x8-advanced-armbased-32bit-mcus-stmicroelectronics.pdf)

**Special requirements**:

Before every copy, read a word from FLASH_CR, set the lowest bit to 1 and write it back. Copy one half word at a time.

How to wait for the write to finish: read a word from FLASH_SR and loop while its content is 1. After that, check FLASH_SR again: proceed if the content is 4, otherwise exit.

Exit: after the copying process and before triggering the breakpoint, clear the lowest bit in FLASH_CR.
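
The sequence above can be modelled on the host in C. This is a sketch of the control flow only, not the assembly in `stm32f0.s`: the registers are replaced by a plain struct, and the names `fake_f0_regs_t` and `f0_loader_model` as well as the `sr_after_write` field are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for the registers; on the target they live at
 * flash_base + 16 (FLASH_CR) and flash_base + 12 (FLASH_SR). */
typedef struct {
    uint32_t cr;
    uint32_t sr;
    uint32_t sr_after_write;  /* fake hardware: FLASH_SR value once a write ends */
} fake_f0_regs_t;

/* Returns the remaining half-word count, i.e. the value r2 would hold. */
static uint32_t f0_loader_model(const uint16_t *src, uint16_t *dst,
                                uint32_t count, fake_f0_regs_t *regs)
{
    assert(regs->sr_after_write != 1);  /* otherwise the model would spin forever */
    while (count > 0) {
        regs->cr |= 1u;                 /* set the lowest bit of FLASH_CR */
        *dst++ = *src++;                /* copy one half word */
        regs->sr = 1;                   /* fake hardware: busy for one poll */
        while (regs->sr == 1) {         /* loop while FLASH_SR reads 1 */
            regs->sr = regs->sr_after_write;
        }
        if (regs->sr != 4) {            /* anything other than 4: exit early */
            break;
        }
        count--;
    }
    regs->cr &= ~1u;                    /* clear the lowest bit before the breakpoint */
    return count;
}
```

A nonzero return value corresponds to a run that halts with a nonzero `r2`, which the host side treats as a failed write.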

## stm32f4.s

`flash_base`: 0x40023c00

`FLASH_SR`: offset from `flash_base` is 0xe (14)

**Reference**: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f4.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f4.h)
[https://www.st.com/content/ccc/resource/technical/document/reference_manual/3d/6d/5a/66/b4/99/40/d4/DM00031020.pdf/files/DM00031020.pdf/jcr:content/translations/en.DM00031020.pdf](https://www.st.com/content/ccc/resource/technical/document/reference_manual/3d/6d/5a/66/b4/99/40/d4/DM00031020.pdf/files/DM00031020.pdf/jcr:content/translations/en.DM00031020.pdf)


**Special requirements**:

Copy one word at a time.

How to wait for the write to finish: read a half word from FLASH_SR and loop while its content is 1.

## stm32f4lv.s

`flash_base`: 0x40023c00

`FLASH_SR`: offset from `flash_base` is 0xe (14)

**Reference**: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f4.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f4.h)
[https://www.st.com/content/ccc/resource/technical/document/reference_manual/3d/6d/5a/66/b4/99/40/d4/DM00031020.pdf/files/DM00031020.pdf/jcr:content/translations/en.DM00031020.pdf](https://www.st.com/content/ccc/resource/technical/document/reference_manual/3d/6d/5a/66/b4/99/40/d4/DM00031020.pdf/files/DM00031020.pdf/jcr:content/translations/en.DM00031020.pdf)

**Special Requirements**:

Copy one byte at a time.

How to wait for the write to finish: read a half word from FLASH_SR and loop while its content is 1.

## stm32f7.s

**Reference**: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f7.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f7.h)
[https://www.st.com/resource/en/reference_manual/dm00124865-stm32f75xxx-and-stm32f74xxx-advanced-armbased-32bit-mcus-stmicroelectronics.pdf](https://www.st.com/resource/en/reference_manual/dm00124865-stm32f75xxx-and-stm32f74xxx-advanced-armbased-32bit-mcus-stmicroelectronics.pdf)

Mostly the same as `stm32f4.s`. Additionally, establish a memory barrier with `dsb sy` after every copy and before checking whether the write has finished.

## stm32f7lv.s

**Reference**: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f7.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f7.h)
[https://www.st.com/resource/en/reference_manual/dm00124865-stm32f75xxx-and-stm32f74xxx-advanced-armbased-32bit-mcus-stmicroelectronics.pdf](https://www.st.com/resource/en/reference_manual/dm00124865-stm32f75xxx-and-stm32f74xxx-advanced-armbased-32bit-mcus-stmicroelectronics.pdf)

**Special Requirements**:

Mostly the same as `stm32f7.s`, but copy one byte at a time.

## stm32l0x.s

**Special Requirements**:

Copy one word at a time. There is no need to wait for the write to finish.

## stm32l4.s

**Exception**: `r2` stores the double word (8 bytes) count to be copied.

`flash_base`: 0x40022000

`FLASH_BSY`: offset from `flash_base` is 0x12

**Reference**: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32l4.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32l4.h)
[https://www.st.com/resource/en/reference_manual/dm00310109-stm32l4-series-advanced-armbased-32bit-mcus-stmicroelectronics.pdf](https://www.st.com/resource/en/reference_manual/dm00310109-stm32l4-series-advanced-armbased-32bit-mcus-stmicroelectronics.pdf)

**Special Requirements**:

Copy one double word at a time (using more than one register is allowed).

How to wait for the write to finish: read a half word from `FLASH_BSY` and loop until the lowest bit is no longer 1.
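
The half-word read and the double-word copy can be sketched in C as follows. The sketch assumes that the half word at offset 0x12 is the upper half of a little-endian 32-bit status register starting at offset 0x10; that layout is an assumption of this example and should be checked against the reference manual. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the half word at flash_base + 0x12 is the upper half of a
 * 32-bit little-endian register located at flash_base + 0x10. */
static uint16_t half_at_0x12(uint32_t reg_at_0x10)
{
    return (uint16_t)(reg_at_0x10 >> 16);
}

/* Busy while the lowest bit of the FLASH_BSY half word is 1. */
static int l4_busy(uint16_t flash_bsy_half)
{
    return (flash_bsy_half & 1u) != 0;
}

/* Copies one double word (8 bytes) through two 32-bit temporaries,
 * mirroring the use of two registers in the flashloader. */
static void copy_double_word(const uint32_t *src, uint32_t *dst)
{
    assert(src != 0 && dst != 0);
    uint32_t lo = src[0];
    uint32_t hi = src[1];
    dst[0] = lo;
    dst[1] = hi;
}
```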

## stm32lx.s

Same as `stm32l0x.s`.


# Clean Room Documentation (translated from the original Chinese version)

The code is placed in the `.text` section.
Add the assembler directive `.syntax unified`.

Parameter passing convention:

All parameters are passed in registers.

`r0`: start address of the copy source
`r1`: start address of the copy destination
`r2`: number of words (4 bytes) to copy (with exceptions)

Program function: copy data from the source to the destination, then trigger a breakpoint to end execution. On exit, `r2` must be cleared to zero to indicate that the transfer is complete.

Limitation: the stack must not be used. The temporary registers that are free to use are `R3` to `R12`. `R13` is `sp` (stack pointer), `R14` is `lr` (generally used to store a jump address) and `R15` is `pc` (program counter).

Requirement: after every single copy, wait until the flash finishes writing. The width of a single copy and the way to check that the write has finished are given in the specific requirements for each file.

The special address `flash_base` must be stored at a 2-byte aligned address.

## stm32f0.s

Exception: `r2`: number of half words (2 bytes) to copy

Special address definition: `flash_base`: defined as 0x40022000

`FLASH_CR`: offset from `flash_base` is 16

`FLASH_SR`: offset from `flash_base` is 12

Reference: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f0.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f0.h)
[https://www.st.com/resource/en/reference_manual/dm00031936-stm32f0x1stm32f0x2stm32f0x8-advanced-armbased-32bit-mcus-stmicroelectronics.pdf](https://www.st.com/resource/en/reference_manual/dm00031936-stm32f0x1stm32f0x2stm32f0x8-advanced-armbased-32bit-mcus-stmicroelectronics.pdf)

Special requirements:

Before each copy starts, read the 4-byte content at FLASH_CR, set its lowest bit to 1 and write it back to FLASH_CR.

The width of each data write is 2 bytes (half word).

After each write, wait for the flash to finish writing. To check, read the 4-byte content at FLASH_SR: if the value is 1, the write has not finished and polling must continue; otherwise check whether the value at FLASH_SR is 4, and if it is not 4, prepare to exit immediately.

Exit: after all copying is done and before triggering the breakpoint, clear the lowest bit of the 4-byte content at FLASH_CR to 0 and write it back to FLASH_CR.

## stm32f4.s

Special address definition: `flash_base`: defined as 0x40023c00

`FLASH_SR`: offset from `flash_base` is 0xe (14)

Reference: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f4.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f4.h)
[https://www.st.com/content/ccc/resource/technical/document/reference_manual/3d/6d/5a/66/b4/99/40/d4/DM00031020.pdf/files/DM00031020.pdf/jcr:content/translations/en.DM00031020.pdf](https://www.st.com/content/ccc/resource/technical/document/reference_manual/3d/6d/5a/66/b4/99/40/d4/DM00031020.pdf/files/DM00031020.pdf/jcr:content/translations/en.DM00031020.pdf)

Special requirements:

The width of each data write is 4 bytes (word).

After each write, wait for the flash to finish writing. To check, read the 2-byte content at FLASH_SR: if the value is 1, the write has not finished and polling must continue.

## stm32f4lv.s

Special address definition: `flash_base`: defined as 0x40023c00

`FLASH_SR`: offset from `flash_base` is 0xe (14)

Reference: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f4.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f4.h)
[https://www.st.com/content/ccc/resource/technical/document/reference_manual/3d/6d/5a/66/b4/99/40/d4/DM00031020.pdf/files/DM00031020.pdf/jcr:content/translations/en.DM00031020.pdf](https://www.st.com/content/ccc/resource/technical/document/reference_manual/3d/6d/5a/66/b4/99/40/d4/DM00031020.pdf/files/DM00031020.pdf/jcr:content/translations/en.DM00031020.pdf)

Special requirements:

The width of each data write is 1 byte (a quarter word).

After each write, wait for the flash to finish writing. To check, read the 2-byte content at FLASH_SR: if the value is 1, the write has not finished and polling must continue.

## stm32f7.s

Reference: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f7.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f7.h)
[https://www.st.com/resource/en/reference_manual/dm00124865-stm32f75xxx-and-stm32f74xxx-advanced-armbased-32bit-mcus-stmicroelectronics.pdf](https://www.st.com/resource/en/reference_manual/dm00124865-stm32f75xxx-and-stm32f74xxx-advanced-armbased-32bit-mcus-stmicroelectronics.pdf)

Same requirements as stm32f4.s, with the additional requirement to execute the `dsb sy` instruction to establish a memory barrier after each copy and before checking that the flash write succeeded.

## stm32f7lv.s

Reference: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f7.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32f7.h)
[https://www.st.com/resource/en/reference_manual/dm00124865-stm32f75xxx-and-stm32f74xxx-advanced-armbased-32bit-mcus-stmicroelectronics.pdf](https://www.st.com/resource/en/reference_manual/dm00124865-stm32f75xxx-and-stm32f74xxx-advanced-armbased-32bit-mcus-stmicroelectronics.pdf)

Requirements are basically the same as for stm32f7.s; the difference is that the width of each data write is 1 byte (a quarter word).

## stm32l0x.s

Special requirements:

The width of each data write is 4 bytes (word).

There is no need to implement a check that the flash write has finished.

## stm32l4.s

Exception: `r2`: number of double words (8 bytes) to copy

Special address definition: `flash_base`: 0x40022000

`FLASH_BSY`: offset from `flash_base` is 0x12

Reference: [https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32l4.h](https://chromium.googlesource.com/chromiumos/platform/ec/+/master/chip/stm32/registers-stm32l4.h)
[https://www.st.com/resource/en/reference_manual/dm00310109-stm32l4-series-advanced-armbased-32bit-mcus-stmicroelectronics.pdf](https://www.st.com/resource/en/reference_manual/dm00310109-stm32l4-series-advanced-armbased-32bit-mcus-stmicroelectronics.pdf)

Copy method: copy 8 consecutive bytes at once (using two consecutive registers as intermediaries) and write them.

After each write, wait for the flash to finish writing. To check, read the half word (2 bytes) at FLASH_BSY; if its lowest bit is not 1, copying may continue.

## stm32lx.s

Same requirements as stm32l0x.s.
9 changes: 9 additions & 0 deletions flashloaders/linker.ld
@@ -0,0 +1,9 @@
/* Entry Point */
ENTRY( mycopy )


/* Specify the memory areas */
MEMORY
{
    RAM ( xrw) : ORIGIN = 0x20000000 , LENGTH = 64K
}