From bd3af09ae8213546437cee0bc6c4b666d0a1f613 Mon Sep 17 00:00:00 2001
From: Ben Marshall <ben.marshall@bristol.ac.uk>
Date: Tue, 4 Aug 2020 20:19:54 +0100
Subject: [PATCH] fusion: wip - remove some bad maths

---
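Note (not part of the commit): a minimal sketch for checking the per-round
figures that remain in fusion.adoc after this patch. It assumes one
instruction per cycle, four quarter rounds per ChaCha20 round, and that each
fused xor/rori pair saves exactly one cycle; the names `qr_cycles` and
`round_cycles` are hypothetical helpers, not anything defined in the
repository.

```python
# Sanity check of the cycle counts quoted in doc/supp/fusion.adoc.
# Assumptions (labelled, not taken from the repo's code):
#   - 1 instruction retires per cycle,
#   - a ChaCha20 round applies the quarter round (QR) 4 times,
#   - each fused xor+rori pair saves exactly 1 cycle.

QR_INSTRUCTIONS = 12   # QR on RV32I with the scalar crypto extensions
QRS_PER_ROUND = 4      # quarter rounds per ChaCha20 round


def qr_cycles(fused_pairs: int) -> int:
    """Cycles for one QR when `fused_pairs` xor+rori pairs are fused."""
    return QR_INSTRUCTIONS - fused_pairs


def round_cycles(fused_pairs: int) -> int:
    """Cycles for one round (four QRs) at the given fusion level."""
    return qr_cycles(fused_pairs) * QRS_PER_ROUND


if __name__ == "__main__":
    for label, pairs in [("no fusion", 0),
                         ("32+32-bit fusion", 4),
                         ("16+32-bit fusion", 2)]:
        print(f"{label}: QR={qr_cycles(pairs)}, round={round_cycles(pairs)}")
```

With no fusion this gives the `48` instructions per round stated in the
document; with two fusable occurrences it gives the `10` cycles per quarter
round and `40` cycles per round introduced by this patch.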
 doc/supp/fusion.adoc | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/doc/supp/fusion.adoc b/doc/supp/fusion.adoc
index 739a7333..2ef63642 100644
--- a/doc/supp/fusion.adoc
+++ b/doc/supp/fusion.adoc
@@ -155,8 +155,6 @@ round state.
 When implemented on RV32I with the scalar cryptography extensions,
 the `QR` function requires 12 instructions.
 One round is hence `48` instructions.
-ChaCha20 uses 10 iterations of a double round, which gives
-`10*48*2=960` cycles, assuming `1` instruction per cycle.
 
 There is one fusable sequence which meets our criteria:
 
@@ -190,7 +188,6 @@ C code, then two more instructions are needed to put them back.
 
 A core capable of fusing this sequence when both instructions are upto
 32 bits long saves  `4` cycles per quarter round.
-An entire block operation is then `640` cycles - `33%` faster.
 
 A core capable of fusing this sequence only when the xor is 16 bits and
 the rori is 32 bits is more complex to analyse.
@@ -202,8 +199,7 @@ are the ones best placed
 in the compressed instruction registers `s0,s1,a0,...,a5`, since this
 creates the largest number of fusable 48-bit instruction sequences.
 In either case, two occurrences of sequence 1 are fusable.
-Hence a quarter round is then `10` cycles, and `10` double rounds
-becomes `800` cycles.
+Hence a quarter round is then `10` cycles and a round `40` cycles.
 
 A core which can only fuse two 16-bit instructions is incapable
 of fusing sequence 1.