Add benchmark inference_tract (#37)
* Add benchmark inference_tract

* Fix build for mobile_net_v2_onnx Wasm and update tensorflow Wasm file

* Embedded the tensorflow model into the Wasm binary

* Update readme to reflect new inferencing benchmarks

* Update for pr comments
jlb6740 authored Jul 9, 2024
1 parent a051965 commit 9dfac06
Showing 36 changed files with 5,142 additions and 28 deletions.
4 changes: 2 additions & 2 deletions Dockerfile
@@ -4,9 +4,9 @@ ENV LD_LIBRARY_PATH=/usr/local/lib
ENV PATH=/usr/local/bin:$PATH
CMD ["/bin/bash"]
ENV DEBIAN_FRONTEND="noninteractive" TZ="America"
ARG RUST_VERSION="nightly-2023-04-01"
ARG RUST_VERSION="nightly-2024-06-09"
ARG WASMTIME_REPO="https://github.com/bytecodealliance/wasmtime/"
ARG WASMTIME_COMMIT="1bfe4b5" # v9.0.1
ARG WASMTIME_COMMIT="cedf9aa" # v21.0.1
ARG SIGHTGLASS_REPO="https://github.com/bytecodealliance/sightglass.git"
ARG SIGHTGLASS_BRANCH="main"
ARG SIGHTGLASS_COMMIT="e89fce0"
17 changes: 8 additions & 9 deletions README.md
@@ -1,13 +1,13 @@
# WasmScore

## Intro
WasmScore aims to benchmark platform performance when executing WebAssembly outside the browser. It leverages [Sightglass](https://github.com/bytecodealliance/sightglass) to run benchmarks and measure performance and then summarizes these results as both an execution score and an efficiency score. In addition to providing scores for the platform, the benchmark is also a tool capable of executing other tests, suites, or individual benchmarks supported by the driver. WasmScore is work in development.
WasmScore aims to benchmark platform performance when executing WebAssembly outside the browser. It leverages [Sightglass](https://github.com/bytecodealliance/sightglass) to run benchmarks and measure performance, then summarizes the results as both an execution score and an efficiency score. In addition to providing a general default score for the platform, WasmScore can also execute other specialized tests, suites, or individual benchmarks supported by the driver. WasmScore is still in early development.

## Description
A basic part of benchmarking is interpreting the results; should you consider the results to be good or bad? To decide, you need a baseline to serve as a point of comparison. For example, that baseline could be a measure of the performance before some code optimization was applied or before some configuration change was made to the runtime. In the case of WasmScore (specifically the wasmscore test) that baseline is the execution of the native code compiled from the same high-level source used to generate the Wasm. In this way the native execution of codes that serves as a comparison point for the Wasm performance also serves as an upper-bound for the performance of WebAssembly. This allows gauging the performance impact when using Wasm instead of a native compile of the same code. It also allows developers to find opportunities to improve compilers, or to improve Wasm runtimes, or improve the Wasm spec, or to suggest other solutions (such as Wasi) to address gaps.
A basic part of benchmarking is interpreting the results: should you consider them good or bad? To decide, you need a baseline (or goal) to serve as a point of comparison. For example, that baseline could be a measure of performance before some code optimization was applied or before some configuration change was made to the runtime. In the case of WasmScore (specifically the default score), the baseline is the execution of native code compiled from the same high-level source used to generate the Wasm. This native execution serves as the expected upper bound for WebAssembly performance. It both shows the performance impact of targeting Wasm instead of native for compiled code and helps developers find opportunities to improve compilers, improve Wasm runtimes, improve the Wasm spec, or suggest other solutions (such as WASI) to address gaps.
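
The exact formula behind the execution and efficiency scores is not shown in this diff. Purely as an illustration, a score of this kind is often derived from a geometric mean of per-benchmark native-to-Wasm time ratios; the function and sample timings below are hypothetical, not WasmScore's actual implementation.

```rust
// Hypothetical sketch: geometric mean of native/Wasm time ratios, scaled to 100.
// A score of 100 would mean Wasm matches native; lower means Wasm is slower.
fn execution_score(wasm_secs: &[f64], native_secs: &[f64]) -> f64 {
    assert_eq!(wasm_secs.len(), native_secs.len());
    let mean_log_ratio: f64 = wasm_secs
        .iter()
        .zip(native_secs)
        .map(|(w, n)| (n / w).ln())
        .sum::<f64>()
        / wasm_secs.len() as f64;
    100.0 * mean_log_ratio.exp()
}

fn main() {
    // Made-up times (seconds) for three benchmarks: Wasm build vs. native build.
    let wasm = [1.30, 0.80, 2.40];
    let native = [1.00, 0.65, 1.90];
    println!("execution score: {:.1}", execution_score(&wasm, &native));
}
```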

## Benchmarks
Typically a benchmark reports either the amount of work done over a constant amount of time or it reports the time taken to do a constant amount of work. The benchmarks here all do the later. The initial commit of the benchmarks available are pulled directly from Sightglass. How the benchmarks stored here are built and run do will depend on the external Sightglass revision being used
Typically, a benchmark reports either the amount of work done over a constant amount of time or the time taken to do a constant amount of work. The benchmarks here all do the latter. The initial set of benchmarks is pulled directly from Sightglass. How the benchmarks stored here are built and run depends on the Sightglass revision used.

Benchmarks are often categorized based on their purpose and origin. Two example buckets include (1) codes written with the original intent of being user facing and (2) codes written specifically to target benchmarking some important or commonly used code construct or platform component. WasmScore does not aim to favor one of these over the other as both are valuable and relevant in the evaluation of standalone Wasm depending on what you are trying to learn.

@@ -22,18 +22,17 @@ WasmScore aims to:
"wasmscore" is the initial and default test. It includes a mix of benchmarks for testing Wasm performance outside the browser. The test is a collection of several subtests:

### wasmscore (default):
- App: [Meshoptimizer]
- Core: ['Ackermann', 'Ctype', 'Fibonacci']
- Crypto: ['Base64', 'Ed25519', 'Seqhash']
- AI: (Coming)
- App: ['meshoptimizer']
- Core: ['ackermann', 'ctype', 'fibonacci']
- Crypto: ['base64', 'ed25519', 'seqhash']
- AI: ['tract_mobilenet_v2_onnx', 'tract_mobilenet_v2_tensorflow'] (see the sketch after this list)
- Regex: (Coming)
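
For context, the two new AI entries run MobileNetV2 through the tract inference engine compiled to Wasm. A minimal sketch of what such a benchmark body can look like, assuming the `tract-onnx` crate and a model file named `mobilenetv2.onnx` (this is not the repository's exact source):

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    // Load the ONNX model, pin the input shape, and optimize it for execution.
    let model = tract_onnx::onnx()
        .model_for_path("mobilenetv2.onnx")?
        .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
        .into_optimized()?
        .into_runnable()?;

    // A synthetic tensor stands in for the decoded input image.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();

    // In a Sightglass benchmark, the measured region would wrap this call.
    let result = model.run(tvec!(input.into()))?;

    // Report the index of the highest-scoring class.
    let best = result[0]
        .to_array_view::<f32>()?
        .iter()
        .cloned()
        .zip(0..)
        .max_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    println!("best class: {:?}", best);
    Ok(())
}
```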

## 2024 Q1 Goals
## 2024 Q3 Goals
Next steps include:
- Improving stability and user experience
- Adding benchmarks to the AI, Regex, and App suites
- Adding more benchmarks (including w/native build support)
- Complete the "simdscore" test
- Publish a list of planned milestones with corresponding releases

## Usage
2 changes: 1 addition & 1 deletion benchmarks/Dockerfile.rust
@@ -1,4 +1,4 @@
FROM rust:1.70
FROM rust:1.75
RUN rustup target add wasm32-wasi
WORKDIR /usr/src
ADD rust-benchmark rust-benchmark
2 changes: 1 addition & 1 deletion benchmarks/README.md
@@ -1,3 +1,3 @@
# Benchmarks

The set of benchmarks here have been copied from [Sightglass](https://github.com/bytecodealliance/sightglass/benchmarks). In general, the benchmarks here and will mostly be consistent with the set of benchmarks in that repository.
The set of benchmarks here has been copied from [Sightglass](https://github.com/bytecodealliance/sightglass/benchmarks). The benchmarks here will mostly be consistent with the set of benchmarks in that repository.
1 change: 1 addition & 0 deletions benchmarks/all.suite
@@ -12,6 +12,7 @@ blind-sig/benchmark.wasm
bz2/benchmark.wasm
hex-simd/benchmark.wasm
# image-classification/image-classification-benchmark.wasm
inference_tract/benchmark.wasm
intgemm-simd/benchmark.wasm
libsodium/libsodium-aead_aes256gcm2.wasm
libsodium/libsodium-aead_aes256gcm.wasm
10 changes: 6 additions & 4 deletions benchmarks/build.sh
@@ -38,16 +38,18 @@ print_header "Build benchmarks"
CONTAINER_ID=$(set -x; docker create $IMAGE_NAME)
(set -x; docker cp $CONTAINER_ID:/benchmark/. $TMP_BENCHMARK)

# Verify benchmark is a valid Sightglass benchmark.
print_header "Verify benchmark"
# Copy benchmark.
print_header "Copy benchmark"
# From https://stackoverflow.com/a/246128:
SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]:-$0}"; )" &> /dev/null && pwd 2> /dev/null; )";
SIGHTGLASS_CARGO_TOML=$(dirname $SCRIPT_DIR)/Cargo.toml
for WASM in $TMP_BENCHMARK/*.wasm; do
(set -x; cargo run --manifest-path $SIGHTGLASS_CARGO_TOML --quiet -- validate $WASM)
(set -x; mv $WASM $BENCHMARK_DIR/)
done;

for MODEL in $TMP_BENCHMARK/*.pb; do
(set -x; mv $MODEL $BENCHMARK_DIR/)
done;

# Clean up.
print_header "Clean up"
(set -x; rm $TMP_TAR)
23 changes: 23 additions & 0 deletions benchmarks/inference_tract/Dockerfile
@@ -0,0 +1,23 @@
FROM rust:1.78 AS builder
RUN rustup target add wasm32-wasi
RUN mkdir /benchmark
WORKDIR /usr/src

# Compile mobile_net_v2_onnx
ADD mobile_net_v2_onnx rust-benchmark
WORKDIR /usr/src/rust-benchmark
ENV CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse
RUN (cd mobile_net_v2_onnx; cargo build --release --target wasm32-wasi)
RUN cp target/wasm32-wasi/release/*benchmark.wasm /benchmark/mobile_net_v2_onnx_benchmark.wasm
WORKDIR /usr/src
RUN rm -rf rust-benchmark


# Compile mobile_net_v2_tensorflow
ADD mobile_net_v2_tensorflow rust-benchmark
WORKDIR /usr/src/rust-benchmark
RUN (cd mobile_net_v2_tensorflow; cargo build --release --target wasm32-wasi)
RUN cp target/wasm32-wasi/release/*benchmark.wasm /benchmark/mobile_net_v2_tensorflow_benchmark.wasm
RUN cp assets/mobilenet_v2_1.4_224_frozen.pb /benchmark/mobilenet_v2_1.4_224_frozen.pb
WORKDIR /usr/src
RUN rm -rf rust-benchmark
7 changes: 7 additions & 0 deletions benchmarks/inference_tract/README.md
@@ -0,0 +1,7 @@
# Image Classification Wasmtime Benchmark

A benchmark that runs an image classifier in pure Wasm. It can be used to
benchmark the performance of float-heavy computations.

Note that the classifier model is not included in the repo because it is large
and is instead downloaded if needed when running the `setup.sh` script.
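
The commit message notes that the TensorFlow model was embedded into the Wasm binary. A minimal sketch of one way to do that with the `tract-tensorflow` crate; the relative asset path, input shape, and overall structure here are assumptions rather than the repository's exact source:

```rust
use std::io::Cursor;
use tract_tensorflow::prelude::*;

// Embedding the frozen graph avoids depending on the WASI filesystem at run time.
// The path is an assumption about where the .pb asset sits relative to this file.
static MODEL_BYTES: &[u8] = include_bytes!("../assets/mobilenet_v2_1.4_224_frozen.pb");

fn main() -> TractResult<()> {
    // Parse the embedded frozen graph and optimize it for a fixed NHWC input.
    let model = tract_tensorflow::tensorflow()
        .model_for_read(&mut Cursor::new(MODEL_BYTES))?
        .with_input_fact(0, f32::fact([1, 224, 224, 3]).into())?
        .into_optimized()?
        .into_runnable()?;

    // A synthetic tensor stands in for the preprocessed input image.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 224, 224, 3)).into();
    let result = model.run(tvec!(input.into()))?;
    println!("output elements: {}", result[0].to_array_view::<f32>()?.len());
    Ok(())
}
```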
13 changes: 13 additions & 0 deletions benchmarks/inference_tract/build_mobile_net_v2_onnx_native.sh
@@ -0,0 +1,13 @@
#!/usr/bin/env bash

# Build inference_tract benchmark as a native shared library (Linux-only).
#
# Usage: ./build_mobile_net_v2_onnx_native.sh

set -x
rm -rf mobile_net_v2_onnx_native
cp -r mobile_net_v2_onnx mobile_net_v2_onnx_native
cp mobile_net_v2_onnx_native.patch mobile_net_v2_onnx_native
(cd mobile_net_v2_onnx_native; patch -Np1 -i ./mobile_net_v2_onnx_native.patch; mv src/main.rs src/lib.rs)
(cd mobile_net_v2_onnx_native; cargo build --release; cp target/release/libbenchmark.so ../mobile_net_v2_onnx_benchmark.so)
set +x
13 changes: 13 additions & 0 deletions benchmarks/inference_tract/build_mobile_net_v2_tensorflow_native.sh
@@ -0,0 +1,13 @@
#!/usr/bin/env bash

# Build inference_tract benchmark as native shared library (Linux-only).
#
# Usage: ./build_mobile_net_v2_tensorflow_native.sh

set -x
rm -rf mobile_net_v2_tensorflow_native
cp -r mobile_net_v2_tensorflow mobile_net_v2_tensorflow_native
cp mobile_net_v2_tensorflow_native.patch mobile_net_v2_tensorflow_native
(cd mobile_net_v2_tensorflow_native; patch -Np1 -i ./mobile_net_v2_tensorflow_native.patch; mv src/main.rs src/lib.rs)
(cd mobile_net_v2_tensorflow_native; cargo build --release; cp target/release/libbenchmark.so ../mobile_net_v2_tensorflow_benchmark.so)
set +x
Binary file added benchmarks/inference_tract/input.png
