Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/suit: initial support for SUIT firmware updates #11818

Merged
merged 5 commits into from
Oct 10, 2019

Conversation

kaspar030
Copy link
Contributor

@kaspar030 kaspar030 commented Jul 8, 2019

Contribution description

This PR contains initial support for updating RIOT using an implementation of SUIT draft version 4.

This currently only supports a limited use case: one component, two slots (one active one passive). Basically, it allows SUIT compliant infrastructure and tooling to update a RIOT node.

Note: The implementation should not yet be considered production ready.

The implementation is based on a draft spec that will change in the near future. The implementation has also not yet gone through any security audit.
This should still be usable for less security sensitive deployments, and by getting this into master, we hope to get more exposure and feedback going towards production readiness.

The RIOT-fp project commits to maintaining and improving this implementation.

The PR is grouped in three commits:

  • one commit adding the sys/suit module, including manifest handling and a CoAP based update service
  • one commit adding necessary tooling (key- and manifest-generation) to dist/tools/suit
  • one commit adding examples/suit_update, show-casing SUIT usage in addition to providing an automated test case.

The commit history looks quite large due to the large list of dependencies. This PR should serve as tracking PR as long as there are open dependencies.

Testing procedure

The PR ships with an example application in examples/suit_update, which includes testing instructions using an automated test, which tests updating each slot over ethos.
The test is also run by the CI.

Apart from verifying that updating works for a given node, it would be very helpful to get feedback on the documentation and whether it describes the necessary steps in your setup.

Issues/PRs references

Depends on #11801, #11802, #11803, #11805, #11816, #11707, #11690, #11697, and riotdocker #73

@kaspar030 kaspar030 added Area: network Area: Networking Type: new feature The issue requests / The PR implemements a new feature for RIOT State: waiting for other PR State: The PR requires another PR to be merged first CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: run tests If set, CI server will run tests on hardware for the labeled PR labels Jul 8, 2019
@kaspar030 kaspar030 requested a review from bergzand July 8, 2019 13:24
@kaspar030
Copy link
Contributor Author

@kaspar030
Copy link
Contributor Author

CI is missing some python libraries for manifest generation... RIOT-OS/riotdocker#73

@kaspar030
Copy link
Contributor Author

@kaspar030
Copy link
Contributor Author

}

#endif
if (res == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reviewing #11805 I realized res is uninitialized if MODULE_SUIT_V4 is not used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still needs addressing

@kaspar030
Copy link
Contributor Author

@kaspar030 kaspar030 force-pushed the suit-pr branch 2 times, most recently from 80256bf to 57eeffa Compare July 12, 2019 09:21
@kaspar030
Copy link
Contributor Author

Found an issue with the key generation targets not being properly serialized before "clean".
Split out a new "CLEAN" variable into #11829.

@kaspar030 kaspar030 removed the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Jul 12, 2019
@kaspar030
Copy link
Contributor Author

The dependency graph is getting smaller! 😄

@kaspar030
Copy link
Contributor Author

Copy link
Contributor

@jcarrano jcarrano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has way too many lines with no tests. Please provide tests.

@fjmolinas
Copy link
Contributor

fjmolinas commented Oct 9, 2019

There's an issue with nanocoap_server choosing the wrong source address for a coap reply. IMO this is unrelated, but causes the test to fail. We're investigating.

I had pointed out the initial issue IRL to @kaspar030

To be clear what was happening is that aiocoap was sending a trigger to the devices EUI64 link local address, so [fe80::20d:46ff:fe3b:adf0%riot0] but for some reason the ACK received by aiocoap did not come from the EUI64 link local address but from [fe80::2%riot0] and there was no more observed traffic after that, the test stalled.

aiocoap-client -m POST "coap://[fe80::20d:46ff:fe3b:adf0%riot0]/suit/trigger" \
	--payload "coap://[fd00:dead:beef::1]/fw/samr21-xpro/$(basename /home/francisco/workspace/RIOT/examples/suit_update/bin/samr21-xpro/suit_update-riot.suitv4_signed.latest.bin)" && \
	echo "Triggered [fe80::20d:46ff:fe3b:adf0%riot0] to update."
WARNING:coap:Received Type.ACK from <UDP6EndpointAddress [fe80::2%riot0]:5683 with local address>, but could not match it to a running exchange

image

All ACK from the device are with [fe80::2%riot0], so aiocoap keeps POSTS to the original address expecting a response from [fe80::20d:46ff:fe3b:adf0%riot0].

At first it answers correctly to the block requests from [fe80::20d:46ff:fe3b:adf0%riot0], but at some point it answers with destination unreachable (port unreachable)

image

This issue appears sometimes after flashing, setting up the network and launching the tests. When it appears setting-up the network again multiple times ends up fixing it.

I would say this would be fixed by #12404. EDIT: will test

@fjmolinas
Copy link
Contributor

fjmolinas commented Oct 9, 2019

I'm having issues in finding a reliable way to reproduce this issue. I've tried testing a bunch on time rebased on #12404, and I haven't been able to reproduce the bug.

It shows up (edit: in this PR withouth #12404) from time to time when executing the setup for the automatic tests and launching. (I have to re-do the setup each time for it to work)

sudo dist/tools/ethos/setup_network.sh riot0 2001:db8::/64
BOARD=samr21-xpro make -C examples/suit_update test

BOARD=samr21-xpro make -C examples/suit_update test

@kaspar030
Copy link
Contributor Author

I've pushed a commit to disable testing on CI until the source address selection is resolved.

Copy link
Contributor

@fjmolinas fjmolinas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my request changes have been addressed and I have no issues with the code (I trust @aabadie review of the python code).

There is still work to be done, parts of the standard to be supported and the tests can be improved. But there are already PR's and current work to address this. The code is completely isolated so it should not impact any other feature, so I find it safe to merge this during soft-feature-freeze. This will also make it easy to improve and change the code base in the feature. This is a good initial support for suit and ota.

I have tested this code extensively: locally, over ethos and in wireless setups, with nodes continuously updating. There are some issues regarding network setup, explicitly when multiple addresses are present (link-local or global), but this is an issue with the source address selection, so unrelated (although it impacts this PR), I have stated the details of this issue and #12404, #12408 partially address this issue. I agree with disabling it in murdock since we are not sure it can be reliably ran in the CI. Running it locally still allows verifying the code and its functionality. IMO this is not a reason to block this PR.

The documentation is detailed in explaining the usage, and has been tested multiple times, and improved accordingly. The feature is tagged as @experimental, so should be treated as such by users.

What is your stance here @aabadie?

Copy link
Contributor

@aabadie aabadie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your stance here @aabadie?

I think it's time to move forward with secure ota and suit. I know @fjmolinas tested this work extensively and I also did it a bit. This is working well.
The python code is good enough to be merged.
There's room for improvement, of course, but it can and will come in follow-up PRs, for sure.

ACK!

@miri64
Copy link
Member

miri64 commented Oct 9, 2019

@kaspar030 kaspar030 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Oct 9, 2019
@kaspar030
Copy link
Contributor Author

Is the worker out of disk space, or what happened here?

Seems like it ran out of tmpfs space.

@kaspar030 kaspar030 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: run tests If set, CI server will run tests on hardware for the labeled PR labels Oct 9, 2019
@fjmolinas
Copy link
Contributor

fjmolinas commented Oct 10, 2019

@kaspar030 go ahead and click the button :) !

@kaspar030 kaspar030 merged commit 5ab8d92 into RIOT-OS:master Oct 10, 2019
@kaspar030
Copy link
Contributor Author

Congrats everyone! 😄

@miri64
Copy link
Member

miri64 commented Oct 10, 2019

🎉

@kaspar030 kaspar030 deleted the suit-pr branch October 10, 2019 08:41
@bergzand
Copy link
Member

🎉

@emmanuelsearch
Copy link
Member

Yay! Congrats to all involved! 🎉

if (size >= 0) {
LOG_INFO("suit_coap: got manifest with size %u\n", (unsigned)size);

riotboot_flashwrite_t writer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This contains a uint8_t flashpage_buf[FLASHPAGE_SIZE]; with FLASHPAGE_SIZE being up to 8k on some platforms - is it a good idea to put that on the stack?

#endif

#ifndef SUIT_MANIFEST_BUFSIZE
#define SUIT_MANIFEST_BUFSIZE 640
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does that number come from?

Comment on lines +370 to +371
riotboot_hdr_print(hdr);
xtimer_sleep(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this used for? 😉

}
}

ssize_t suit_coap_get_blockwise_url_buf(const char *url,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not in any header - should be static?

return (res < 0) ? (ssize_t)res : (ssize_t)_buf.offset;
}

static void _suit_handle_url(const char *url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we export that so it can be used without the suit thread?

coap_blockwise_cb_t callback, void *arg)
{
/* mmmmh dynamically sized array */
uint8_t buf[64 + (0x1 << (blksize + 4))];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the added 64?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: network Area: Networking Area: OTA Area: Over-the-air updates CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines Type: new feature The issue requests / The PR implemements a new feature for RIOT
Projects
None yet
Development

Successfully merging this pull request may close these issues.