Commit
dequantize softmax (iree-org#9337)
See iree-org#8974. This is still a 20% end-to-end latency improvement on MobileBert-int8 on configs where matmuls are already reasonably fast, which makes other things like Softmax relatively more important. That is even after Softmax slowness was much improved recently, as observed in iree-org#9170. Moreover, discussion around iree-org#8974 suggests that the path forward for non-dequantized Softmax is nontrivial, so putting our benchmarks on the dequantized path for now will help insulate them a bit from what we expect to be in flux for the foreseeable future.
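For context, "dequantizing" Softmax means computing it in float rather than with integer arithmetic: dequantize the int8 input, run an ordinary float softmax, and requantize the result. The sketch below is a hypothetical illustration of that idea, not IREE's actual lowering; the function name and parameters are made up, though the output quantization (scale 1/256, zero point -128) follows the common TFLite convention for softmax outputs.

```python
import numpy as np

def dequantized_softmax(x_int8, scale, zero_point,
                        out_scale=1.0 / 256.0, out_zero_point=-128):
    # Dequantize int8 inputs to float32.
    x = (x_int8.astype(np.float32) - zero_point) * scale
    # Ordinary float softmax, numerically stabilized by subtracting the max.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    y = e / e.sum(axis=-1, keepdims=True)
    # Requantize the float result back to int8.
    q = np.round(y / out_scale) + out_zero_point
    return np.clip(q, -128, 127).astype(np.int8)
```

Because the math happens in float, the result should be numerically correct but not bit-exact with respect to a pure integer softmax, which is why the test tolerance below had to be relaxed.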
bjacob authored Jun 8, 2022
1 parent 9b86ad5 commit f9b3708
Showing 2 changed files with 13 additions and 4 deletions.
6 changes: 5 additions & 1 deletion integrations/tensorflow/iree_tf_compiler/TFL/Passes.cpp
```diff
@@ -52,9 +52,13 @@ void buildTFLImportPassPipeline(OpPassManager &pm) {
   // Convert all TFL ops to TOSA ops
   //----------------------------------------------------------------------------
 
-  mlir::tosa::TOSATFTFLLegalizationPipelineOptions tosaOptions;
   pm.addPass(createLowerGlobalTensorsPass());
+
+  mlir::tosa::TOSATFTFLLegalizationPipelineOptions tosaOptions;
+  // Temporary work-around for https://github.com/google/iree/issues/8974
+  tosaOptions.dequantize_tfl_softmax = true;
   mlir::tosa::createTFTFLtoTOSALegalizationPipeline(pm, tosaOptions);
+
   pm.nest<func::FuncOp>().addPass(mlir::tosa::createStripQuantTypesPass());
   pm.addPass(createCanonicalizerPass());
   pm.addPass(createReconcileUnrealizedCastsPass());
```
```diff
@@ -35,11 +35,16 @@ def generate_inputs(self, input_details):
   def compare_results(self, iree_results, tflite_results, details):
     super(MobileBertTest, self).compare_results(iree_results, tflite_results,
                                                 details)
-    # We have confirmed in large scale accuracy tests that differences this large is acceptable.
+    # We have confirmed in large scale accuracy tests that differences as large
+    # as 5.0 is acceptable. We later further relaxed from 5.0 to 7.0 in
+    # https://github.com/google/iree/pull/9337 when quantized Softmax got
+    # de-quantized, which should be numerically correct albeit not bit-exact.
+    # The actual observed max error was ~ 6.36. The value 7.0 is that rounded up
+    # to the next integer.
     self.assertTrue(
-        np.isclose(iree_results[0], tflite_results[0], atol=5.0).all())
+        np.isclose(iree_results[0], tflite_results[0], atol=7.0).all())
     self.assertTrue(
-        np.isclose(iree_results[1], tflite_results[1], atol=5.0).all())
+        np.isclose(iree_results[1], tflite_results[1], atol=7.0).all())
 
   def test_compile_tflite(self):
     self.compile_and_execute()
```
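The effect of relaxing the tolerance from 5.0 to 7.0 can be exercised in a tiny standalone check. The arrays here are made-up stand-ins for `iree_results[0]` and `tflite_results[0]`, not real model outputs; the 6.3 entry mimics the ~6.36 max error reported above.

```python
import numpy as np

# Hypothetical logits standing in for iree_results[0] / tflite_results[0].
iree_out = np.array([0.0, 3.2, 6.3])
tflite_out = np.array([0.1, 2.9, 0.0])

# atol=7.0 tolerates an absolute difference of 6.3 per element.
assert np.isclose(iree_out, tflite_out, atol=7.0).all()
# The previous tolerance of 5.0 would have rejected it.
assert not np.isclose(iree_out, tflite_out, atol=5.0).all()
```

Note that `np.isclose` also applies a relative tolerance (`rtol`, default 1e-5), but for comparisons against values this small the `atol` term dominates.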
