OTA which survives power failure #905

Open
plerup opened this issue Oct 18, 2015 · 57 comments · May be fixed by #6538


@plerup
Contributor

plerup commented Oct 18, 2015

Maybe I'm missing something, but it looks to me like, if we have a power failure during the OTA update in eboot.c (ACTION_COPY_RAW), we end up with a non-functional system.

Couldn't the data structure "eboot_command" be stored in flash instead and not be reset until the ACTION_COPY_RAW operation has been successfully completed?

BR
/Peter


@plerup plerup changed the title OTA that survives power failure OTA which survives power failure Oct 18, 2015
@igrr
Member

igrr commented Oct 18, 2015

It's certainly possible to pass bootloader arguments via flash. This will require allocating an additional sector for eboot_command (as is done in other bootloaders).
By the way, another point of failure is when the actual image is written. If the power goes out between the moment when the first sector (which contains the bootloader itself) is erased and when it is written, the system will not boot.
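To make the proposal concrete, here is a minimal sketch of a command record that lives in a flash sector and is cleared only after the copy has finished. Everything in it is an assumption for illustration: `ota_cmd_t`, `cmd_store`, `cmd_load`, `cmd_clear` and `boot_once` are hypothetical names, the record layout is not the real eboot_command layout, and a RAM buffer stands in for the flash sector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u
#define CMD_MAGIC   0x4f544131u   /* arbitrary marker, hypothetical */
#define ACTION_COPY 1u            /* hypothetical action code */

/* Hypothetical command record; not the real eboot_command layout. */
typedef struct {
    uint32_t magic;
    uint32_t action;
    uint32_t src, dst, size;
    uint32_t crc;                 /* CRC-32 over all fields before it */
} ota_cmd_t;

static uint8_t cmd_sector[SECTOR_SIZE];   /* stand-in for one flash sector */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) over the record body. */
static uint32_t cmd_crc(const ota_cmd_t *c) {
    const uint8_t *p = (const uint8_t *)c;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < offsetof(ota_cmd_t, crc); i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Model a sector erase: erased flash reads back as 0xFF. */
static void cmd_clear(void) { memset(cmd_sector, 0xFF, SECTOR_SIZE); }

static void cmd_store(ota_cmd_t c) {
    c.magic = CMD_MAGIC;
    c.crc = cmd_crc(&c);
    cmd_clear();
    memcpy(cmd_sector, &c, sizeof c);
}

/* Returns true only if the sector holds a record with valid magic and CRC. */
static bool cmd_load(ota_cmd_t *out) {
    memcpy(out, cmd_sector, sizeof *out);
    return out->magic == CMD_MAGIC && out->crc == cmd_crc(out);
}

typedef bool (*copy_fn)(const ota_cmd_t *);   /* false = interrupted */

/* One boot pass: returns true if a pending update was fully applied. */
static bool boot_once(copy_fn copy) {
    ota_cmd_t c;
    assert(copy != NULL);
    if (!cmd_load(&c) || c.action != ACTION_COPY) return false;
    if (!copy(&c)) return false;  /* power lost: command stays in flash */
    cmd_clear();                  /* cleared only after a complete copy */
    return true;
}

/* Demo stubs standing in for the real sector-by-sector copy. */
static bool copy_interrupted(const ota_cmd_t *c) { (void)c; return false; }
static bool copy_ok(const ota_cmd_t *c) { (void)c; return true; }
```

Calling `boot_once(copy_interrupted)` after `cmd_store(...)` leaves the record in place, so the next `boot_once(copy_ok)` retries and then clears it. The sketch relies on the copy being safe to restart from the beginning, and it does not address the bootloader-sector window described above.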

@igrr igrr added type: enhancement help wanted Help needed from the community component: core labels Oct 18, 2015
@plerup
Contributor Author

plerup commented Oct 18, 2015

Couldn't we have an option to skip updating the bootloader sector?

@igrr
Member

igrr commented Oct 18, 2015

Sure, such an option could be implemented in the Updater class.

@plerup
Contributor Author

plerup commented Oct 18, 2015

I could give this a try. Just not sure how to allocate the extra sector though. Never touched base with the linker before... Have to explore a bit first.

@igrr
Member

igrr commented Oct 18, 2015

The tricky part is to allocate this sector only when the user asks for it, so that flash space is not wasted by default. This could be an additional option in the Tools menu or something like that.

@plerup
Contributor Author

plerup commented Oct 18, 2015

Maybe optionally using the last 128 bytes in the EEPROM sector

@me-no-dev
Collaborator

I'm thinking of something else... since there are sectors at particular locations that are used by the SDK to store settings and such, maybe we can use those? They are full 4K blocks, but the data in them is usually not more than 128B, so using the end of one of them might give you the space for storing that data without messing with EEPROM or the linker scripts.

@igrr igrr added bounty and removed help wanted Help needed from the community labels Jun 22, 2016
@davisonja

I'm a big fan of updates that survive power-failures (in as much as they don't render the device unusable).
I've got a revision of master that stores a command in flash, falling back to RTC if there's no valid flash block available, or there's no command (so we gracefully fall back to current behaviour).

Aesthetically I like the idea of stealing a 4k page at the start of flash - so eboot and its config are all kept together. On a module with 4M this is no real issue, but I realise that this is more of a concern in smaller ones so it does really need to be optional.
Having an optional page at the start makes life more complicated, so for dev I've opted to use a page at the end, modifying the .ld file(s) to reduce the SPIFFS space by 4k. While this works relatively nicely, there appears to be some use of the space at the end of flash - does anyone have a pointer to concrete documentation on which pages are used?
I'm reluctant to hijack parts of pages in case their usage changes in the future, and I've not explored the possibility of storing the data in SPIFFS itself (on the basis that a solution which doesn't require SPIFFS is a better place to start).

Does anyone have any thoughts/ideas on a suitable flash location? Because reducing SPIFFS_END resulted in a page that was overwritten, I'm wondering if shifting the start of SPIFFS is more reliable.

As a side note I'm looking at allocating a full 4k with a view to storing general update config there as well, particularly a key for firmware signing/encryption.

@igrr
Member

igrr commented Dec 22, 2016

I think adding an extra 4k sector between eboot and the application is the best option.
SDK 2.0 requires an extra sector reserved for RF calibration data, so there will be some shifting things around anyway.

@davisonja

davisonja commented Dec 22, 2016 via email

@igrr
Member

igrr commented Dec 22, 2016

Non-optional is fine. I'd say 4k matters only on 512k devices, but these are not very useful for OTA anyway. So this sector can be added to the LD scripts of all flash sizes from 1M and above, by defining some symbols like OTA_info_start and OTA_info_end. For 512k these two symbols can point to the same address, and the Updater can figure out that the size of the OTA info area is zero, hence RTC memory should be used instead (if anyone tries to use OTA on 512k, that is).
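A minimal sketch of that size check, assuming the two symbols are resolved to plain addresses before the call. `pick_cmd_store` and the enum are hypothetical names; in a real build the values would come from the linker script rather than function arguments.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_SECTOR_SIZE 0x1000u

typedef enum { CMD_STORE_RTC, CMD_STORE_FLASH } cmd_store_t;

/* Decide where the OTA command goes, given the bounds of the OTA info area.
 * Equal bounds mean the area has zero size (the 512k case). */
static cmd_store_t pick_cmd_store(uint32_t ota_info_start, uint32_t ota_info_end) {
    assert(ota_info_end >= ota_info_start);   /* linker symbols must be ordered */
    uint32_t size = ota_info_end - ota_info_start;
    return (size >= FLASH_SECTOR_SIZE) ? CMD_STORE_FLASH : CMD_STORE_RTC;
}
```

With equal bounds the function returns `CMD_STORE_RTC`, and with a full 4k sector between them it returns `CMD_STORE_FLASH`.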

@davisonja

davisonja commented Dec 22, 2016 via email

@davisonja

In (finally!) getting to tidying this up I've re-encountered the thing that originally led me to putting the page-of-settings somewhere other than the start of flash (by eboot). It also has implications for write-protecting (or, more accurately, not over-writing) eboot itself.
Currently there's a function EspClass::getSketchMD5() which constructs an MD5 using all the flash from 0x0. A not-uncommon use for this MD5 is for updates to see if the sketch has changed; if we include a page of flash which has ever-changing update commands in it then this MD5 will never match the file to be uploaded (a similar potential for false-positive changes exists with mismatching eboot versions).
If the page for settings/commands is placed just before SPIFFS (which translates to just after sketch-space) then we don't have this issue with MD5 generation.

Personally I don't mind overly, I do all my updating through a PHP-based website (uploads straight out of the IDE) which can do any manner of pre-processing before md5 comparisons - but it means that people couldn't do a simple md5sum on the binary as in the example PHP script.
The obvious solution is to just md5 the sketch, rather than everything from 0x0, and update examples etc, but it still amounts to a very breaking change.

I suggest we start with putting the page at the start of SPIFFS and leave everything else as it is to minimise the impact on people who rely on the current MD5 behaviour.

Anyone have any thoughts? @igrr, @plerup ..?

@igrr
Member

igrr commented Jan 18, 2017

I see, that makes sense. Moving start of SPIFFS would probably be a headache, but reducing sketch size by 1 sector to make room for this config sector is probably okay.

There's going to be a conflict with the changes I did in the 2.0 SDK branch, as there we need another extra sector for PHY calibration data, so I have done something similar there. Maybe it's going to be easier if I actually finish that PR and merge it, and then you can allocate another sector based on my changes.

@davisonja

davisonja commented Jan 18, 2017 via email

@igrr
Member

igrr commented Jan 18, 2017

Unfortunately, if we move SPIFFS we also need to adjust boards.txt file as it contains SPIFFS start addresses which are used by FS upload plugin. So that's the two places to make adjustments.
I would prefer to make adjustments in one place only, if possible, this is why I've proposed moving sketch end instead of SPIFFS start.

@davisonja

I'd agree with that - a single point is best.

@davisonja

Is it a terrible idea to move the SPIFFS declarations out of the .ld files entirely?
With the info already in boards.txt, and available as parameters we can add to the command the IDE generates to 'link it all together' we could end up with the info just stored in one place.
It would reduce the number of .ld files needed too (we still need different ones for the various sizes attached to the irom section definition).

@igrr
Member

igrr commented Jan 19, 2017

Yes, we can add a bunch of -Wl,--defsym=foo=val to the ld command line.

But this would, for example, break https://github.com/plerup/makeEspArduino which is something I personally use for all my ESP8266-based projects, so clearly this option isn't something I'll agree upon easily, unless some reasonable workaround is presented (like throwing some grep/sed at boards.txt from the makefile).

Edit: apparently I haven't pulled latest makeEspArduino changes in a while — seems like build variable parsing is now implemented there.

@plerup
Contributor Author

plerup commented Jan 19, 2017

Sure is :)

Everything is parsed from boards.txt etc

@davisonja

davisonja commented Jan 20, 2017 via email

@igrr
Member

igrr commented Jan 20, 2017

Okay @davisonja, if you can take a shot at this that would be awesome. I will then rebase my changes which allocate an extra sector for PHY data (needed for 2.0 SDK support) on top of your work. Thanks.

@davisonja

davisonja commented Sep 10, 2019 via email

@davisonja

Now on a laptop rather than a phone...

@Androbin those commits look like the original ones that had update instructions being stored in the (volatile) RTC memory.

I have a (now outdated) fork that happily stores the update instructions in a flash page, which thus survives power loss, and will just restart on each power-up until the process is complete.

It's a fairly simple step from there to storing the new image somewhere that's outside the executable area.

The only incomplete part of this solution was reserving the page, which is equally easy, but involved editing all the ld files. The path to doing that was ultimately deemed (back in the day) to be generating the ld files with a script, so my next step is to update that script to include the store-update-details-in-flash page.
Having said that, my work is somewhat out of date, so the actual first step is bringing my fork up to date and seeing how much remains applicable (or just reapplying it to the current state if the update looks hard 😁 )

@Androbin
Contributor

This issue was probably originally about updating the sketch. But it is equally applicable to the FS. So I've submitted a PR to help with that.

@davisonja davisonja linked a pull request Sep 20, 2019 that will close this issue
@davisonja

If the FS needs a chip reset we can easily build that in, but it would still have the "half the potential size" limitation if it had to be downloaded while the system is running...

@davisonja

I'm not sure where the right place to ask this is, so I'm going to note it here.

I've been looking at CI failures for #6538 and thus organising the local builds on my dev machine - esptool (as opposed to esptool.py) doesn't seem to be pulled in by get.py.

@Androbin
Contributor

I've also had issues with Arduino IDE or plugins thereof being unable to locate the esptool binary.

@davisonja

For those following this, and the ones who have comments inflicted on them, I picked up one of the "d1 mini pro" boards with 16MB flash. The default mode of this has a sizable empty allocation in the flash which seems to result in the update clashing with the pages for the ota-commands yielding corruption. Which is curious, but almost certainly not something I ran across when writing the code initially.

@davisonja

davisonja commented Oct 1, 2019

Actually unrelated I now believe 🤦‍♂

Does anyone reading this have an opinion about allowing OTA updates to be stored in the 'empty' section, for those flash configs that have one? I'm not sure whether the empty section is intended for code to use.

A side-effect of the way I've added the OTA flash parameter block to the boards.txt.py script is that it's automatically including the empty section as part of the area for the update to be stored. This means that it is implicitly using that area. So on my newly acquired 16MB board, in the 14M fs flash layout option, the extra 1M empty area is used for storing the update, meaning that there are actually 1732608 bytes showing as free (with a sketch running that is 353008 bytes). Which coincidentally proves my theory that the existing code will cope with updates being stored anywhere 😆

@devyte
Collaborator

devyte commented Oct 1, 2019

@davisonja the area meant for ota is from after the current sketch up to before the FS. That area should be usable both for receiving a new sketch or for receiving a new FS image. An ota image received is supposed to get written in the high part of that empty space. There are no specific areas reserved in that empty area for other things.
So your numbers above look correct to me.
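As a rough illustration of "written in the high part of that empty space", the sketch below computes where a staged image would start. The function name, the 4k rounding of both sizes, and all addresses other than the 353008-byte sketch size mentioned above are assumptions for the example, not values taken from the Updater.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR 0x1000u

static uint32_t round_up_sector(uint32_t n) {
    return (n + SECTOR - 1u) & ~(SECTOR - 1u);
}

/* The free area runs from the sector-rounded end of the sketch up to fs_start.
 * On success, *stage_start is set so the image ends exactly at fs_start. */
static bool ota_stage_start(uint32_t sketch_start, uint32_t sketch_size,
                            uint32_t fs_start, uint32_t image_size,
                            uint32_t *stage_start) {
    assert(fs_start % SECTOR == 0);           /* FS assumed sector-aligned */
    uint32_t free_begin = sketch_start + round_up_sector(sketch_size);
    uint32_t need = round_up_sector(image_size);
    if (image_size == 0 || free_begin > fs_start || need > fs_start - free_begin)
        return false;                         /* image does not fit */
    *stage_start = fs_start - need;
    return true;
}
```

For example, with a sketch at 0x1000, a file system assumed to start at 0x200000 and a 300000-byte image, the staged copy would begin at 0x1B6000. An image as large as the whole 2 MB region would be rejected.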

@davisonja

Now tested, slightly refined and un-draft'd #6538 is good to go.

It's got the main things I originally wanted to get done back in 2017, so is complete from that perspective. There had been other discussions about additional potential functionality which I'm happy to add. Beyond that, what's next that needs fixed? Life's settled here, so it won't take years 😉

@d-a-v
Collaborator

d-a-v commented Oct 3, 2019

What are they?

> There had been other discussions about additional potential functionality which I'm happy to add

@davisonja

In the past there's been interest in rboot-inspired features, such as multiple bootable images.

I've also wondered about the value in storing OTA info in an fs, as an alternative to the dedicated flash area. I've not been through littlefs properly to see how straightforward that is - I had a look at spiffs back in the day.

These are simple things that could be achieved without expanding eboot's size greatly. If you've got more flash to use, there are other options that would involve making it bigger.

@d-a-v
Collaborator

d-a-v commented Oct 3, 2019

The LittleFS API is FS, the same as SPIFFS's (and now SDFS's too); look at this example.
I understand, though, that the FS code would also be stored in eboot?

I am not familiar with rboot. Does it require that all images are stored below the first 1MB, or does rboot (re)flash the chosen image when it is stored beyond the first 1MB?

@davisonja

I was thinking of custom, read-only support in eboot, rather than pulling in chunks of the main code base. Usually the complexity in filesystems is primarily in allocation, i.e. in writing.

From the rboot page, though I've not looked at how applicable their strategy is here:

The ESP8266 can only map 8Mbits (1MB) of flash to memory, but which 8Mbits to map is selectable. This allows individual roms to be up to 1MB in size, so long as they do not straddle an 8Mbit boundary on the flash. This means you could have four 1MB roms or 8 512KB roms on a 32Mbit flash (such as on the ESP-12), or a combination. Note, however, that you could not have, for example, a 512KB rom followed immediately by a 1MB rom because the 2nd rom would then straddle an 8MBit boundary. By default support for using more than the first 8Mbit of the flash is disabled, because it requires several steps to get it working. See below for instructions.
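The boundary rule in that quote can be checked with a few lines of arithmetic. This is a simplified sketch with a hypothetical function name; it looks only at the ROM's offset and size and ignores any header or config sectors rboot itself reserves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAP_WINDOW 0x100000u   /* 8 Mbit = 1 MB mappable at a time */

static_assert((MAP_WINDOW & (MAP_WINDOW - 1u)) == 0, "window must be a power of two");

/* True if a ROM of `size` bytes at flash offset `offset` stays inside a
 * single 1 MB window, i.e. its first and last byte share a window index. */
static bool rom_fits_window(uint32_t offset, uint32_t size) {
    if (size == 0 || size > MAP_WINDOW) return false;
    return (offset / MAP_WINDOW) == ((offset + size - 1u) / MAP_WINDOW);
}
```

A 1 MB ROM at offset 0 fits, as do two 512 KB ROMs at 0 and 0x80000. A 1 MB ROM placed right after a 512 KB one (offset 0x80000) does not, which is the case the quote warns about.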

@d-a-v
Collaborator

d-a-v commented Oct 3, 2019

Then we could have one-step OTAs:
Flash once the other 1MB chunk, and on boot select the most recent one.
(one step flashing, not two-steps like we currently do)
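One way to picture "on boot select the most recent one" is a per-slot sequence number plus a validity flag. Both fields are assumptions made for this sketch (rboot's actual config format is not described in this thread), and `pick_boot_slot` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t seq;    /* higher value = flashed more recently (assumed) */
    bool     valid;  /* e.g. image checksum verified (assumed) */
} slot_info_t;

/* Returns the index of the newest valid slot, or -1 if none is bootable. */
static int pick_boot_slot(const slot_info_t *slots, int n) {
    int best = -1;
    assert(slots != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (!slots[i].valid) continue;
        if (best < 0 || slots[i].seq > slots[best].seq) best = i;
    }
    return best;
}
```

If the newer slot was only partly written when power failed, its validity flag stays false and the older slot is chosen, so a one-step flash still leaves a bootable image.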

@davisonja

davisonja commented Oct 3, 2019 via email

@devyte
Collaborator

devyte commented Oct 3, 2019

Correct about fallback and 1-step OTA. The downside is that it requires reserving the two "slots" in flash, so you can't take advantage of a contiguous empty space area, e.g. for flashing an FS image.

@d-a-v
Collaborator

d-a-v commented Oct 3, 2019

Default current flash mapping is to use (size)MB-2MB for FS at the end of flash addressing range,
so we have (by default with menus, when flash size >= 4MB) already 2MB for program/OTA at the beginning of the flash addressing range.

@devyte
Collaborator

devyte commented Oct 3, 2019

@davisonja said:

what's next that needs fixed

The idea was floated at some point to add support for compressed images, i.e.: receive an image that is compressed.

The following possibilities for implementation have shown up:

  1. Try for full (de)compression, i.e. deflate, possibly using code from miniz in the flash_stub, which is already known to work on the ESP (because the flash_stub is used by esptool.py). Take the absolute minimum code needed to decompress and add it to eboot. There is a bit of space left between the end of the eboot bin and the start of the current sketch in the flash. This should be the best solution, because it would allow writing the received image in fully compressed form; after reboot it could then be decompressed on the fly while copying. Also, if the code fits, it would not increase bin size.
    Pros:
      • would allow writing of larger images, thereby relaxing a bit the restriction of bin size for boards with smaller flash size
      • faster transfer speed
      • known algorithm and code
      • wouldn't increase bin size
    Cons:
      • may not fit after eboot due to bin size
  2. Try for some variation of RLE (run-length encoding). Similar idea as above, but with a simpler algorithm instead of miniz. If the above doesn't fit in the available eboot space, maybe this will.
    Pros:
      • would allow writing of larger images, although less so than point 1 above
      • faster transfer speed, although less so than point 1 above
      • wouldn't increase bin size
      • more likely to fit in the available space after eboot
    Cons:
      • less compression than deflate
      • likely custom compression/decompression code
  3. Put all of miniz into Updater. That would allow receiving a fully compressed image, but it would require decompressing it before writing it to the empty space area.
    Pros:
      • smaller transfer size (faster transfer speed) like point 1 above
      • the eboot code would remain as-is
    Cons:
      • bin size restrictions remain as current (decompress on receive instead of on copy)
      • bin size increment

Does this sound interesting enough to you to implement?
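To give a feel for how small the RLE option could be, here is a minimal decoder sketch. The stream format, (count, value) byte pairs with count in 1..255, is invented for this example and is not a format proposed in the thread; `rle_decode` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode (count, value) byte pairs from `in` into `out`.
 * Returns the number of bytes written, or -1 if the input is malformed
 * (odd length, zero count) or the output buffer would overflow. */
static long rle_decode(const uint8_t *in, size_t in_len,
                       uint8_t *out, size_t out_cap) {
    size_t o = 0;
    assert(in != NULL && out != NULL);
    if (in_len % 2 != 0) return -1;
    for (size_t i = 0; i < in_len; i += 2) {
        size_t count = in[i];
        if (count == 0 || count > out_cap - o) return -1;
        memset(out + o, in[i + 1], count);    /* expand one run */
        o += count;
    }
    return (long)o;
}
```

Decoding the six bytes `03 FF 01 00 02 AB` yields `FF FF FF 00 AB AB`. Long runs of 0xFF padding are where a scheme like this would gain the most; on dense code it would compress far less than deflate, as the cons above note.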

@davisonja

That sounds like fun, @devyte :)

I'll see what sort of sizes miniz turns into. Do we lodge a new issue to centre the discussion around, or does one exist elsewhere?

@devyte
Collaborator

devyte commented Oct 4, 2019

New issue please.

@earlephilhower
Collaborator

MiniZ inflator alone takes 0x14b2 = 5298 bytes of code when built with -Os.

I just patched in a naive version into the bootloader and updated the makefile to -Os and function/data_sections with gc-sections to drop everything that's not needed.

earle@server:~/Arduino/hardware/esp8266com/esp8266/bootloaders/eboot$ make
../../tools/xtensa-lx106-elf/bin/xtensa-lx106-elf-gcc -std=gnu99 -Os -g -Wpointer-arith -Wno-implicit-function-declaration -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mno-text-section-literals -I../../tools/sdk/include -mtext-section-literals -mlongcalls -nostdlib -fno-builtin -flto -Wl,-static -g -fdata-sections -ffunction-sections -Wl,--gc-sections   -c -o eboot.o eboot.c
eboot.c: In function 'copy_raw':
eboot.c:119:10: warning: type defaults to 'int' in declaration of 'status' [enabled by default]
     auto status = tinfl_decompress(&inflator, buffer, &in_bytes,
          ^
../../tools/xtensa-lx106-elf/bin/xtensa-lx106-elf-gcc -std=gnu99 -Os -g -Wpointer-arith -Wno-implicit-function-declaration -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mno-text-section-literals -I../../tools/sdk/include -mtext-section-literals -mlongcalls -nostdlib -fno-builtin -flto -Wl,-static -g -fdata-sections -ffunction-sections -Wl,--gc-sections   -c -o eboot_command.o eboot_command.c
../../tools/xtensa-lx106-elf/bin/xtensa-lx106-elf-gcc -std=gnu99 -Os -g -Wpointer-arith -Wno-implicit-function-declaration -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mno-text-section-literals -I../../tools/sdk/include -mtext-section-literals -mlongcalls -nostdlib -fno-builtin -flto -Wl,-static -g -fdata-sections -ffunction-sections -Wl,--gc-sections   -c -o miniz.o miniz.c
../../tools/xtensa-lx106-elf/bin/xtensa-lx106-elf-ar cru eboot.a eboot.o eboot_command.o miniz.o
../../tools/xtensa-lx106-elf/bin/xtensa-lx106-elf-gcc -Teboot.ld -Wl,--gc-sections -nostdlib -Wl,--no-check-sections -umain eboot.a -o eboot.elf
earle@server:~/Arduino/hardware/esp8266com/esp8266/bootloaders/eboot$ objdump -t eboot.elf  | grep .text | sort -k1
4010f000 g       *ABS*	00000000 _text_start
4010f000 g       .text	00000000 _stext
4010f000 l    d  .text	00000000 .text
4010f010 g     F .text	00000079 print_version
4010f0a0 g     F .text	0000009b load_app_from_flash_raw
4010f168 g     F .text	00000114 copy_raw
4010f290 g     F .text	000000ea main
4010f380 g     F .text	00000038 crc_update
4010f3b8 g     F .text	00000019 eboot_command_calculate_crc32
4010f3e0 g     F .text	00000049 eboot_command_read
4010f430 g     F .text	00000014 eboot_command_clear
4010f4b8 g     F .text	0000129b tinfl_decompress
40110753 g       *ABS*	00000000 _text_end
40110753 g       .text	00000000 _etext
40110770 l     O .text	00000080 s_dist_base$2064
401107f0 l     O .text	00000080 s_dist_extra$2065
40110870 l     O .text	0000007c s_length_base$2062
401108ec l     O .text	0000007c s_length_extra$2063
40110968 l     O .text	00000013 s_length_dezigzag$2066
4011097c l     O .text	0000000c s_min_table_sizes$2067
40240000 g       *ABS*	00000000 _irom0_text_end
40240000 g       *ABS*	00000000 _irom0_text_start
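The 0x14b2 figure appears to be the size of tinfl_decompress plus its six lookup tables from the symbol listing above. That reading is an inference from the listed sizes rather than something stated in the comment, and the sum below ignores alignment padding between symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symbol sizes copied from the objdump listing. */
static const uint32_t inflator_parts[] = {
    0x129b, /* tinfl_decompress   */
    0x80,   /* s_dist_base        */
    0x80,   /* s_dist_extra       */
    0x7c,   /* s_length_base      */
    0x7c,   /* s_length_extra     */
    0x13,   /* s_length_dezigzag  */
    0x0c,   /* s_min_table_sizes  */
};

static_assert(sizeof inflator_parts / sizeof inflator_parts[0] == 7,
              "one function plus six tables");

/* Sum of the inflator's code and table sizes in bytes. */
static uint32_t inflator_footprint(void) {
    uint32_t total = 0;
    for (size_t i = 0; i < sizeof inflator_parts / sizeof inflator_parts[0]; i++)
        total += inflator_parts[i];
    return total;
}
```

The function returns 0x14b2 (5298), matching the number quoted at the top of the comment.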

@davisonja

That's a more comprehensive assessment. I commented out the compressor code and built it to get an absolute upper bound, which was about 15k (with eboot, but no optimisations). It seemed like a viable prospect even at that size.

@earlephilhower
Collaborator

For eboot, the makefile needs to go from -O0 to -Os, since at 0 it won't even do simple optimizations.

Then, the problem is that miniz has compress and decompress in one file and the linker will include the entire file even if compression is unused (-flto doesn't work on GCC 4.8). So add in -f(data|function)-sections and -Wl,--gc-sections (check exact syntax in our platform.txt) and it'll finally drop the compression bits and you'll be left with the minimum needed.

Granted, I did not test the resulting ELF, just looked at sizes. The eboot.c might have some code that's busted in a way to require no optimization to work.
