avbroot 2.0: Rewrite in Rust #130

Merged
merged 8 commits into from
Sep 1, 2023

chenxiaolong
Owner

Why?

It was always my intention to write avbroot in a compiled language. Python was a stop-gap solution since it was possible to use the various tools and parsers from AOSP to make the initial prototyping and implementation easier. However, doing so required a whole lot of hacks since nearly all of the Python modules we use were intended to be used as executables, not libraries, and they were definitely not meant to be used outside of AOSP's code base.

Although the dependencies on AOSP code have been reduced over time, working on the Python code is still frustrating. The majority of the modules we use from both the standard library and external dependencies are lacking type annotations. All of the Python language servers and type checker tools I've used choked on them. There have been several avbroot bugs in the past that wouldn't have happened with any statically typed language.

The catalyst for me working on this recently was dealing with some python-protobuf versions that wouldn't work with AOSP's pregenerated protobuf bindings. When parsing protobuf messages, it would fail with obscure runtime type errors. I need my projects to not feel frustrating or else I'll just get burnt out.

Hence, the Rust rewrite. With fewer hacks this time! avbroot no longer has any dependencies on external tools like openssl. I'll be providing precompiled binaries for the three major desktop OSes, built by GitHub Actions. avbroot will also be versioned now, starting at 2.0.0.

What's new?

  • A new avbroot ota verify subcommand has been added to check that all OTA and AVB related components have been properly hashed and signed. This works for all OTA images, including stock ones.
  • A couple of new avbroot avb subcommands have been added for dumping vbmeta header/footer information and verifying AVB signatures. These are roughly equivalent to avbtool's info_image and verify_image subcommands, though avbroot is about an order of magnitude faster than the latter.
  • A new set of avbroot boot subcommands has been added for packing and unpacking boot images. They support Android v0-v4 images and vendor v3-v4 images. Repacking is lossless even when using deprecated fields, like the boot image v4 VTS signature.
  • A new avbroot ramdisk subcommand has been added for inspecting the CPIO structure of ramdisks.
  • A new set of avbroot key subcommands has been added for generating signing keys so that it's no longer necessary to install openssl and avbtool (though of course, keys generated by other tools remain fully compatible).
  • Since avbroot has a ton of CLI options, a new avbroot completion subcommand has been added for generating tab-completion configs for various shells (e.g. bash, zsh, fish, powershell).

What was removed?

Nothing :) The patch and extract subcommands have been moved under avbroot ota and the magisk-info subcommand has been moved under avbroot boot, but there are compatibility shims in place to keep all the old commands working.
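
As an illustration only, a compatibility shim of this kind could be as small as rewriting the argument list before normal parsing. The sketch below is an assumption about how such a shim might look, not avbroot's actual code; the function name `rewrite_legacy_args` is hypothetical, and only the subcommand names come from the description above.

```rust
/// Hypothetical helper: map a legacy top-level subcommand to its new group.
/// `args` is the argument list without the program name.
fn rewrite_legacy_args(args: &[String]) -> Vec<String> {
    let group = match args.first().map(String::as_str) {
        // `patch` and `extract` now live under `ota`.
        Some("patch") | Some("extract") => Some("ota"),
        // `magisk-info` now lives under `boot`.
        Some("magisk-info") => Some("boot"),
        _ => None,
    };

    match group {
        // Prepend the group name, e.g. `patch ...` becomes `ota patch ...`.
        Some(g) => std::iter::once(g.to_string())
            .chain(args.iter().cloned())
            .collect(),
        // Everything else is passed through untouched.
        None => args.to_vec(),
    }
}
```

With this approach the old spellings keep working while the rest of the CLI only has to know about the new layout.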

The command-line interface will remain backwards compatible for as long as possible, even with new major releases. The Rust API, however, has no backwards compatibility guarantees. I currently don't intend for avbroot's "library" components to be used anywhere outside of Custota and avbroot itself.

Performance

Thanks to better access to low-level APIs (especially pread and pwrite), nearly everything in avbroot that can be multithreaded now is. In addition, the patching operation is done entirely in memory without temp files, and the maximum memory usage is still about 100 MB lower than with the Python implementation.
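
To show why positional reads help with multithreading, here is a minimal Unix-only sketch using the standard library's `FileExt::read_exact_at`, which performs a read at an explicit offset in the style of `pread`. Because no thread moves a shared file cursor, workers can read disjoint regions of the same file handle without locking. The function `parallel_byte_sum` is hypothetical, and summing bytes is just a stand-in for real work such as hashing; this is not how avbroot itself is structured.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // positional reads (pread-style) on Unix
use std::thread;

/// Hypothetical example: sum all bytes of `file` using up to `num_threads`
/// workers, each reading its own region via positional reads.
fn parallel_byte_sum(file: &File, len: u64, num_threads: u64) -> io::Result<u64> {
    let n = num_threads.max(1);
    // Ceiling division so the regions cover the whole file.
    let chunk = ((len + n - 1) / n).max(1);

    thread::scope(|s| -> io::Result<u64> {
        let mut handles = Vec::new();
        let mut start = 0u64;
        while start < len {
            let end = (start + chunk).min(len);
            handles.push(s.spawn(move || -> io::Result<u64> {
                let mut buf = vec![0u8; (end - start) as usize];
                // Reads at an explicit offset; the shared cursor is untouched.
                file.read_exact_at(&mut buf, start)?;
                Ok(buf.iter().map(|&b| u64::from(b)).sum())
            }));
            start = end;
        }

        let mut total = 0u64;
        for handle in handles {
            total += handle.join().expect("worker panicked")?;
        }
        Ok(total)
    })
}
```

The same pattern applies to writes through `FileExt::write_all_at`.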

The new implementation is bottlenecked by how fast a single CPU core can calculate 3 SHA256 hashes of overlapping regions spanning the majority of the OTA file. About 90% of the CPU time is spent calculating SHA256 hashes and another 5% or so performing XZ compression.
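
One way hashes over overlapping regions can be computed is to walk the data once and feed each chunk to every hasher whose region intersects it. The sketch below illustrates that idea under loud assumptions: the `Fnv1a` type is a tiny non-cryptographic stand-in so the example has no dependencies (it is not SHA256), the regions are arbitrary, and `hash_regions` is a hypothetical name. It does not describe which regions avbroot hashes or how it schedules the work.

```rust
use std::ops::Range;

/// Streaming FNV-1a (64-bit). Stand-in only: NOT SHA256, not secure.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }

    fn update(&mut self, data: &[u8]) {
        for &b in data {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
}

/// Hypothetical helper: hash several (possibly overlapping) byte ranges of
/// `data` in a single pass, processing `chunk_size` bytes at a time.
fn hash_regions(data: &[u8], regions: &[Range<usize>], chunk_size: usize) -> Vec<u64> {
    let step = chunk_size.max(1);
    let mut hashers: Vec<Fnv1a> = regions.iter().map(|_| Fnv1a::new()).collect();

    for (i, chunk) in data.chunks(step).enumerate() {
        let chunk_start = i * step;
        let chunk_end = chunk_start + chunk.len();

        for (hasher, region) in hashers.iter_mut().zip(regions) {
            // Feed only the part of this chunk that lies inside the region.
            let start = region.start.max(chunk_start);
            let end = region.end.min(chunk_end);
            if start < end {
                hasher.update(&chunk[start - chunk_start..end - chunk_start]);
            }
        }
    }

    hashers.into_iter().map(|h| h.0).collect()
}
```

Each digest still has to consume its bytes in order, which is why a computation like this tends to be limited by single-core hashing speed.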

Some numbers:

  • Patching should take roughly 40%-70% of the time it took before.
  • Extracting with --all should take roughly 10%-30% of the time it took before.

Folks with x86_64 CPUs supporting SHA-NI extensions (e.g. Intel 11th gen and newer) should see even bigger improvements.

Reproducibility

The new implementation's output files are bit-for-bit identical when the inputs are the same. However, they do not exactly match what the Python implementation produced.

  • The zip entries, aside from metadata and metadata.pb, are written in sorted order.
  • All zip entries are stored without compression.
  • All zip entries are stored without additional metadata (e.g. modification timestamp).
  • The OTA certificate, both in the OTA zip and in the recovery ramdisk's otacerts.zip, goes through deserialization + serialization before being written. Text in the certificate file before the header and after the footer will be stripped out.
  • The protobuf structures (payload header and OTA metadata) are serialized differently. Protobuf has more than one way to encode the same messages "on the wire". The Rust quick_protobuf library serializes messages a bit differently than python-protobuf, but the outputs are mutually compatible.
  • XZ compression of modified partition images in the payload is now done at compression level 0 instead of 6. This reduces the patching time by several seconds at the cost of a couple MiB increase in file size.
  • Ramdisks are now compressed with standard LZ4 instead of LZ4HC (high compression mode). For our use case, the difference is <100 KiB, but using standard LZ4 allows us to use a pure-Rust LZ4 library and makes the compression step much faster.
  • Older ramdisks compressed with gzip are slightly different due to a different gzip implementation being used (flate2 vs. zlib). The two implementations structure the gzip frames slightly differently, but the output is identical when decompressed.
  • Magisk's config file in the ramdisk (.backup/.magisk) will have the SHA1 field set to all zeros. This allows avbroot to keep track of less information during patching for better performance. The field is only used for Magisk's uninstall feature, which can't ever be used in a locked bootloader setup anyway.
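
To make the protobuf point concrete: the wire format does not fix the order in which fields are written, so two different byte strings can decode to the same message. The sketch below is a minimal hand-written decoder that only understands varint fields (wire type 0). The function names are hypothetical, and the example says nothing about which specific differences exist between quick_protobuf and python-protobuf output.

```rust
use std::collections::BTreeMap;

/// Decode a base-128 varint, returning the value and the bytes consumed.
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated or longer than 10 bytes
}

/// Hypothetical helper: decode a message made only of varint fields into a
/// map of field number -> value. A later occurrence overwrites an earlier one.
fn decode_varint_fields(mut buf: &[u8]) -> Option<BTreeMap<u64, u64>> {
    let mut fields = BTreeMap::new();
    while !buf.is_empty() {
        let (tag, n) = read_varint(buf)?;
        buf = &buf[n..];
        if tag & 0x7 != 0 {
            return None; // only wire type 0 is handled in this sketch
        }
        let (value, n) = read_varint(buf)?;
        buf = &buf[n..];
        fields.insert(tag >> 3, value);
    }
    Some(fields)
}
```

For instance, the byte sequences `08 01 10 96 01` and `10 96 01 08 01` differ, yet both decode to field 1 = 1 and field 2 = 150.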

Misc

While working on the new avbroot ota verify subcommand, I found that the ossi stock image (OnePlus 10 Pro) used in avbroot's tests has an invalid vbmeta hash for the odm partition. I thought it was an avbroot bug, but AOSP's avbtool reports the same invalid hash too. If that image actually boots, then I'm not sure AVB can be trusted on those devices...

@chenxiaolong chenxiaolong self-assigned this Aug 28, 2023
@chenxiaolong
Owner Author

This has been extensively tested on my Pixel 7 Pro, both with adb sideload and via Custota.

A couple more things:

  • Windows and macOS are now supported and CI-tested via GitHub Actions.

  • I'm aware that some folks have used avbroot to make other boot image modifications outside the scope of the project. Thus, I'll keep the Python implementation around in the python branch. It won't receive any further feature updates (e.g. compatibility with new Android versions), but if a major bug comes up, I'll make an effort to fix it.

@pixincreate

I'm willing to contribute to this since it is in Rust. Can you please put up some roadmap so that I can take some to work on it?

@chenxiaolong
Owner Author

> I'm willing to contribute to this since it is in Rust. Can you please put up some roadmap so that I can take some to work on it?

I currently don't have a roadmap since it is basically feature complete at this point. If Android 14 brings along some new challenges, I'll open up new issues for them.

chenxiaolong added a commit to chenxiaolong/Custota that referenced this pull request Aug 29, 2023
@pixincreate

> > I'm willing to contribute to this since it is in Rust. Can you please put up some roadmap so that I can take some to work on it?
>
> I currently don't have a roadmap since it is basically feature complete at this point. If Android 14 brings along some new challenges, I'll open up new issues for them.

I'm happy to contribute, and have already subscribed to the repo

There is nothing new that requires changes on the avbroot side.

Signed-off-by: Andrew Gunnerson <[email protected]>
chenxiaolong added a commit to chenxiaolong/Custota that referenced this pull request Aug 29, 2023
There's an upstream bug that causes an infinite loop in the
`write::BzDecoder` destructor if the decoder is fed invalid data. While
this never happens during normal operation, it is possible to run into
this by running `ota extract` against a `--stripped` OTA file.

Signed-off-by: Andrew Gunnerson <[email protected]>
@pascallj
Contributor

Just out of curiosity, avbroot used to rely on external tools from AOSP. Now that it isn't using these libraries anymore, did you implement all the relevant functions in Rust yourself?

@chenxiaolong
Owner Author

> Just out of curiosity, avbroot used to rely on external tools from AOSP. Now that it isn't using these libraries anymore, did you implement all the relevant functions in Rust yourself?

Yup. The biggest part of that was the vbmeta parser: https://github.com/chenxiaolong/avbroot/blob/fb4b93b40376de5a23d2ee9b214d48ccb4f11728/src/format/avb.rs and the generation of the metadata/metadata.pb files: https://github.com/chenxiaolong/avbroot/blob/fb4b93b40376de5a23d2ee9b214d48ccb4f11728/src/format/ota.rs.

Other little things we depended on AOSP for, like computing zip offsets, could be completely dropped because the Rust libraries already provided the information needed.

@pascallj
Contributor

> Yup. The biggest part of that was the vbmeta parser: https://github.com/chenxiaolong/avbroot/blob/fb4b93b40376de5a23d2ee9b214d48ccb4f11728/src/format/avb.rs and the generation of the metadata/metadata.pb files: https://github.com/chenxiaolong/avbroot/blob/fb4b93b40376de5a23d2ee9b214d48ccb4f11728/src/format/ota.rs.

Must have been a lot of work. Although I do have a background in C, I have no previous experience with Rust so I didn't even attempt to look at the code.

Do you think it is still more beneficial to have this custom implementation even if the formats change in the future?

@chenxiaolong
Owner Author

> Must have been a lot of work. Although I do have a background in C, I have no previous experience with Rust so I didn't even attempt to look at the code.
>
> Do you think it is still more beneficial to have this custom implementation even if the formats change in the future?

It was a decent amount of work, but mostly because AOSP's documentation is too high level and doesn't have diagrams and explanations about the actual file format.

I think the benefit of having an actual API to work with outweighs the cost of maintaining a custom implementation vs. hacking around avbtool. For AVB 2.0 specifically, the format is conceptually very simple and there's very little that can go wrong with a custom parser. All of the hashing (SHA256) and signing (RSA w/PKCS#1 v1.5 padding) stuff is bog-standard. But to rule out potential issues, I have a collection of small contrived test files with the header fields set to some edge case and those go through a round-trip deserialize + serialize test that checks for bit-for-bit equality.
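
A round-trip check of the kind described above can be sketched as follows. Everything here is hypothetical and for illustration only: `ToyHeader`, its `TOY0` magic, and its field layout are invented and have nothing to do with the real vbmeta format. The point is that reserved bytes are stored verbatim, so parsing and re-serializing reproduces the input bit for bit.

```rust
use std::convert::TryInto;

/// Hypothetical fixed-size header used only for this sketch.
#[derive(Debug, PartialEq)]
struct ToyHeader {
    version: u32,
    flags: u32,
    /// Kept verbatim, even if non-zero, so re-serialization is lossless.
    reserved: [u8; 8],
}

const MAGIC: &[u8; 4] = b"TOY0";
const HEADER_SIZE: usize = 20;

impl ToyHeader {
    fn parse(data: &[u8]) -> Option<Self> {
        if data.len() != HEADER_SIZE || &data[..4] != &MAGIC[..] {
            return None;
        }
        Some(Self {
            version: u32::from_be_bytes(data[4..8].try_into().ok()?),
            flags: u32::from_be_bytes(data[8..12].try_into().ok()?),
            reserved: data[12..20].try_into().ok()?,
        })
    }

    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(HEADER_SIZE);
        out.extend_from_slice(MAGIC);
        out.extend_from_slice(&self.version.to_be_bytes());
        out.extend_from_slice(&self.flags.to_be_bytes());
        out.extend_from_slice(&self.reserved);
        out
    }
}

/// Returns true if parsing and re-serializing reproduces the input exactly.
fn round_trips(data: &[u8]) -> bool {
    ToyHeader::parse(data).map_or(false, |h| h.to_bytes() == data)
}
```

A test suite would then run `round_trips` over each contrived edge-case file and fail on any mismatch.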

If a future AVB 3.0 is more complicated, I'll reevaluate. I highly doubt that would be the case (or that AVB 3.0 is even on the horizon). Google seemed to try very hard to keep AVB 2.0 simple and easy to implement (especially since it has to be integrated into the bootloader).


My thoughts on the other formats avbroot uses:

  • Boot images: The format is simple and new updates to the format are done in a backwards compatible way (4 times now already for boot images and once for vendor_boot images). avbroot preserves reserved header fields, so even if I'm slow to implement support for a future format, it's unlikely that anything would break. (Also, I think avbroot's implementation is the only one that can round-trip every version of boot images--even AOSP's mkbootimg can't do that.)
  • signapk-style OTA zip: The format is really awkward, but still easy enough to implement. It's only used in recovery mode for adb sideload. The actual OTA update process, both while booted into Android and in recovery mode, only cares about payload.bin. I wouldn't be surprised if Google just removes the zip container completely in the future.
  • payload.bin: Another (really) awkward format with protobuf, overlapping checksum regions, and multiple signatures. We don't really have a choice but to implement this ourselves. AOSP's payload_generator (C++) is unusable as a library.
  • cpio files: The format is so simple that everyone seems to just write their own parsers instead of creating a library :) It isn't even a binary format--all fields are stored as %08x text.
  • zip files: The zip-rs library is good and low-level enough to accomplish everything we need, so no need for custom zip stuff.
  • lz4/gzip/bzip2/xz: No reason for avbroot to ever use a custom parser for these. The official implementations are portable and work very well.
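
To illustrate how simple the cpio header is, here is a minimal parsing sketch. It assumes the "newc" variant (magic `070701` followed by 13 fields of 8 hexadecimal characters each, 110 bytes in total); the struct and function names are hypothetical and this is not avbroot's parser.

```rust
/// Fields of a cpio "newc" header, in on-disk order.
#[derive(Debug, PartialEq)]
struct NewcHeader {
    ino: u32,
    mode: u32,
    uid: u32,
    gid: u32,
    nlink: u32,
    mtime: u32,
    filesize: u32,
    devmajor: u32,
    devminor: u32,
    rdevmajor: u32,
    rdevminor: u32,
    namesize: u32,
    check: u32,
}

const NEWC_MAGIC: &[u8] = b"070701";
const NEWC_HEADER_LEN: usize = 6 + 13 * 8;

/// Hypothetical helper: parse the header at the start of `data`.
fn parse_newc_header(data: &[u8]) -> Option<NewcHeader> {
    if data.len() < NEWC_HEADER_LEN || &data[..6] != NEWC_MAGIC {
        return None;
    }

    let mut fields = [0u32; 13];
    for (i, field) in fields.iter_mut().enumerate() {
        let start = 6 + i * 8;
        let raw = &data[start..start + 8];
        // Every field is 8 ASCII hex digits, i.e. "%08x" text.
        if !raw.iter().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        let text = std::str::from_utf8(raw).ok()?;
        *field = u32::from_str_radix(text, 16).ok()?;
    }

    let [ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor,
         rdevmajor, rdevminor, namesize, check] = fields;
    Some(NewcHeader {
        ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor,
        rdevmajor, rdevminor, namesize, check,
    })
}
```

The file name (`namesize` bytes) and then the file data (`filesize` bytes) follow the header, each padded to a 4-byte boundary.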

chenxiaolong added a commit to chenxiaolong/Custota that referenced this pull request Aug 30, 2023
Signed-off-by: Andrew Gunnerson <[email protected]>
@chenxiaolong chenxiaolong merged commit bbe793b into master Sep 1, 2023
22 checks passed
@chenxiaolong chenxiaolong deleted the rust branch September 1, 2023 03:42
chenxiaolong added a commit to chenxiaolong/Custota that referenced this pull request Sep 1, 2023