diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index 5463d1beb..73e0551c0 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -39,6 +39,7 @@
     - [SHA-256](design/gadgets/sha256.md)
       - [16-bit table chip](design/gadgets/sha256/table16.md)
     - [Double-and-add](design/gadgets/double-and-add.md)
+    - [Endoscaling](design/gadgets/endoscaling.md)
 - [Background Material](background.md)
   - [Fields](background/fields.md)
   - [Polynomials](background/polynomials.md)
diff --git a/book/src/design/gadgets/endoscaling.md b/book/src/design/gadgets/endoscaling.md
new file mode 100644
index 000000000..2f0ea54b9
--- /dev/null
+++ b/book/src/design/gadgets/endoscaling.md
@@ -0,0 +1,424 @@
+# Endoscaling
+
+Often in proof systems, it is necessary to multiply a group element by a scalar that depends
+on a challenge. Since the challenge is random, what matters is only that the scalar retains
+that randomness; that is, it is acceptable to apply a 1-1 mapping to the scalar if that allows
+the multiplication to be done more efficiently.
+
+The Pasta curves (as well as Pluto-Eris) we use for Halo 2 are equipped with an endomorphism that allows such
+efficient multiplication. By allowing a 1-1 mapping as described above, we can avoid having
+to "decompose" the input challenge using an algorithm such as
+[[Pornin2020]](https://eprint.iacr.org/2020/454) that requires lattice basis reduction.
+
+## Definitions
+
+- The Lagrange basis polynomial $\ell_i(X)$ is such that $\ell_i(\omega^i) = 1$ and
+  $\ell_i(\omega^j) = 0$ for $i \neq j$. It is defined as
+  $$
+  \ell_i(X) = \prod_{j \neq i} \frac{(X - \omega^j)}{(\omega^i - \omega^j)}.
+  $$
+
+- We consider curves over a base field $\mathbb{F}_p$ with a "cubic endomorphism" $\phi$
+  defined on $\mathbb{F}_p$-rational points by $\phi((x, y)) = (\zeta_p \cdot x, y)$ for
+  $\zeta_p \in \mathbb{F}_p$. This is equivalent to $\phi(P) = [\zeta_q]P$ for some
+  $\zeta_q \in \mathbb{F}_q$ of multiplicative order $3$.
+
+
+### Proof that defining $\phi((x, y)) \triangleq (\zeta_p \cdot x, y)$ implies $\phi(P) = [\zeta_q]P$ for some $\zeta_q$
+
+Let $E$ be our elliptic curve, and assume that $\phi$ is a cubic group automorphism of $E$. Fix a generator $B \in E$ and observe that $\phi([s]B) = [s]\phi(B)$ for all $s \in \mathbb{Z}$, since $\phi(P + Q) = \phi(P) + \phi(Q)$. Now there must be some $z \in \mathbb{Z}$ s.t. $\phi(B) = [z]B$, and so
+$$\phi([s]B) = [s]\phi(B) = [s]([z]B) = [z]([s]B),$$
+i.e. $\phi(P) = [z]P$ for all $P$. But then $B = \phi^3(B) = [z^3]B$ implies $z^3 = 1 \bmod q$. And so $\zeta_q \triangleq z$ is what we're after.
+
+So, it remains to show that $\phi((x, y)) \triangleq (\zeta_p \cdot x, y)$ is a cubic group automorphism of $E$. It's clear that $\phi$ defined this way is a cubic <em>set</em> automorphism of $E$, since $\zeta_p^3 = 1$ and our curve equation is of the form $E = \{(x,y): y^2 = x^3 + b\}$. Hence it remains to show that $\phi(P + Q) = \phi(P) + \phi(Q)$. To see this, just check the equations for [adding distinct points](/design/gadgets/ecc/addition.html#incomplete-addition) and [adding a point to itself](/design/gadgets/ecc/addition.html#complete-addition) and note that
+$$x_r((\zeta_p \cdot x_p, y_p), (\zeta_p \cdot x_q, y_q)) = \zeta_p \cdot x_r((x_p, y_p), (x_q, y_q))$$
+and
+$$y_r((\zeta_p \cdot x_p, y_p), (\zeta_p \cdot x_q, y_q)) = y_r((x_p, y_p), (x_q, y_q))$$
+in both cases, where here $x_r((x_p, y_p), (x_q, y_q))$ and $y_r((x_p, y_p), (x_q, y_q))$ refer to the expressions for the output point coordinates $R = (x_r, y_r)$ in the addition formulas, as functions of the inputs point coordinates $P = (x_p, y_p)$ and $Q = (x_q, y_q)$. $\square$
+</details>
+
+## Endoscaling for public inputs
+
+In the Halo 2 proof system, this technique can optionally be used to commit to an instance
+column using bits that represent the public input. Each basis polynomial corresponds with a
+cell in the column.
+
+## Computing an endoscaling commitment
+
+Let $N$ be the limit on the number of bits that can be input to endoscaling at once while
+avoiding collisions. For CM curves that have a cubic endomorphism $\phi$, such as the
+Pasta and Pluto-Eris curves, this limit can be computed using the script
+[checksumsets.py in zcash/pasta](https://github.com/zcash/pasta/blob/master/checksumsets.py).
+
+Assume that $N$ is even. (For Pasta, $N = 248$; for Pluto-Eris, $N = 442$.)
+
+Let $\text{Endoscale}$ be Algorithm 1 in the [Halo paper](https://eprint.iacr.org/2019/1021.pdf):
+
+$$
+(\mathbf{r}, G) \mapsto [n(\mathbf{r})] G
+$$
+
+Given $G_i = \text{Comm}(\ell_i(X))$, we compute an endoscaling instance column commitment by
+calculating the sum $P = \sum_{i = 0}^{m - 1} \text{Endoscale}(\mathbf{r}_i, G_i)$.
+
+### Algorithm 1 (optimized)
+
+The input bits to endoscaling are $\mathbf{r}$. Split $\mathbf{r}$ into $m$ chunks
+$\mathbf{r}_0, \mathbf{r}_1, ..., \mathbf{r}_{m - 1} \in \{0, 1\}^N$. For now assume that all
+the $\mathbf{r}_i$ are the same length.
+
+let $S(i, j) = \begin{cases}
+  [2\mathbf{r}_{i,2j} - 1] G_i,\text{ if } \mathbf{r}_{i,2j+1} = 0, \\
+  \phi([2\mathbf{r}_{i,2j} - 1] G_i),\text{ otherwise}.
+\end{cases}$
+
+$P := [2] \sum_{i=0}^{m-1} (G_i + \phi(G_i))$
+
+for $j$ in $0..N/2$:
+
+$$
+\begin{array}{l}
+\mathrm{Inner} := S(0, j) \\
+\text{for $i$ in $1..m$:} \\
+\hspace{2em} \mathrm{Inner} := \mathrm{Inner} \;⸭\; S(i, j) \\
+P := (P \;⸭\; \mathrm{Inner}) \;⸭\; P \\
+\end{array}
+$$
+which is equivalent to (using complete addition)
+
+$$
+\begin{array}{l}
+P := \mathcal{O} \\
+\text{for $i$ in $0..m$:} \\
+\hspace{2em} P := [2] (G_i + \phi(G_i)) \\
+\hspace{2em} \text{for $j$ in $0..N/2$:} \\
+\hspace{4em} P := (P + S(i, j)) + P \\
+\end{array}
+$$
+
+#### Circuit cost
+We decompose each $\mathbf{r}_i$ chunk into two-bit chunks:
+
+$$
+\mathbf{r} = c_0 + 4 \cdot c_1 + ... + 4^{N/2 - 1} \cdot c_{N/2 -1}
+$$
+
+with a running sum $z_j, j \in [0..(N/2)).$ $z_0$ is initialized as
+$z_0 = \mathbf{r}$. Each subsequent $z_j$ is calculated as:
+$z_j = (z_{j-1} - c_{j-1}) \cdot 2^{-2}$. The final $z_{N/2} = 0$.
+
+Each $c_j$ is further broken down as $c_j = b_{j,0} + 2 \cdot b_{j,1}$.
+The tuple $(b_0, b_1)$ maps to the endoscaled points:
+
+$$
+\begin{array}{rl}
+  (0, 0) &\rightarrow (G_x, -G_y) \\
+  (0, 1) &\rightarrow (\zeta \cdot G_x, -G_y) \\
+  (1, 0) &\rightarrow (G_x, G_y) \\
+  (1, 1) &\rightarrow (\zeta \cdot G_x, G_y)
+\end{array}
+$$
+
+which are accumulated using the [double-and-add](./double-and-add.md) algorithm.
+
+Let $r$ be the number of incomplete additions we're doing per row. For $r = 1$:
+
+$$
+\begin{array}{|c|c|c|c|c|c|c|}
+\hline
+      z     &       b_0      &       b_1      &  x_G  &  y_G  &       x_a            &        x_p          \\\hline
+            &                &                &       &       &       x_G            &        y_G          \\\hline
+            &                &                &       &       & x_{G + \phi(G)}      & y_{G + \phi(G)}     \\\hline
+   z_{N/2}  & b_{0,N/2 - 1}  &  b_{1,N/2 - 1} &  x_G  &  y_G  &   x_\texttt{InitAcc} &  x_{P, N/2 - 1}     \\\hline
+z_{N/2 - 1} & b_{0,N/2 - 2}  &  b_{1,N/2 - 2} &  x_G  &  y_G  &  x_{A, N/2 - 2}      &  x_{P, N/2 - 2}     \\\hline
+     \dots  &       \dots    &       \dots    &  \dots&  \dots&       \dots          &       \dots         \\\hline
+     z_1    &    b_{0, 0}    &    b_{1, 0}    &  x_G  &  y_G  &     x_{A,0}          &     x_{P,0}         \\\hline
+     z_0    &                &                &       &       &   x_{A,final}        &                     \\\hline
+\end{array}
+$$
+
+$$
+\begin{array}{|c|c|c|c|c|c|}
+\hline
+       z         & \dots &             \lambda_1             &       \lambda_2           &  q_\texttt{endoscale\_base} & q_\texttt{init}    \\\hline
+                 & \dots & x_{\phi(g)} = \zeta \cdot x_g     &           y_g             &       0                     &     0              \\\hline
+                 & \dots &           x_\texttt{InitAcc}      &       y_\texttt{InitAcc}  &       0                     &     1              \\\hline
+    z_{N/2}      & \dots &      \lambda_{1, n/2 - 1}         & \lambda_{2, n/2 - 1}      &       1                     &     0              \\\hline
+    z_{N/2 - 1}  & \dots &      \lambda_{1, n/2 - 2}         & \lambda_{2, n/2 - 2}      &       0                     &     0              \\\hline
+      \dots      & \dots &               \dots               &          \dots            &      \dots                  &    \dots           \\\hline
+      z_1        & \dots &          \lambda_{1,0}            &     \lambda_{2,0}         &       0                     &     0              \\\hline
+      z_0        & \dots &           y_{a,final}             &                           &                             &                    \\\hline
+\end{array}
+$$
+
+$$
+\begin{array}{|c|c|c|c|c|c|}
+\hline
+       z         & \dots &    q_\texttt{double} &  q_\texttt{add\_incomplete}  &         q_\dba             & q_\texttt{final} \\\hline
+                 & \dots &     0                &           1                  &         0                  &     0            \\\hline
+                 & \dots &     1                &           0                  &         0                  &     0            \\\hline
+    z_{N/2}      & \dots &     0                &           0                  &         1                  &     0            \\\hline
+    z_{N/2 - 1}  & \dots &     0                &           0                  &         1                  &     0            \\\hline
+      \dots      & \dots &    \dots             &          \dots               &         \dots              &    \dots         \\\hline
+      z_1        & \dots &     0                &           0                  &         0                  &     1            \\\hline
+      z_0        & \dots &                      &                              &                            &                  \\\hline
+\end{array}
+$$
+
+For each row $j$ from $N/2$ down to $0$, we check the decomposition and the
+endoscaling map.
+
+$$
+\begin{array}{|c|l|}
+\hline
+\text{Degree} & \text{Constraint} \\\hline
+       3      & q_\texttt{endoscale\_base} \cdot \BoolCheck{b_0} = 0 \\\hline
+       3      & q_\texttt{endoscale\_base} \cdot \BoolCheck{b_1} = 0 \\\hline
+       2      & q_\texttt{endoscale\_base} \cdot [(z_{j - 1} - z_{j} \cdot 2^{-2}) - (b_0 + 2\cdot b_1)] = 0 \\\hline
+       3      & q_\texttt{endoscale\_base} \cdot b_0 \cdot (y_p - y_G) + (1 - b_0) \cdot (y_p + y_G) = 0 \\\hline
+       3      & q_\texttt{endoscale\_base} \cdot b_1 \cdot (x_p - \zeta \cdot x_G) + (1 - b_1) \cdot (x_p - x_G) = 0 \\\hline
+\end{array}
+$$
+
+where
+
+$$
+y_p = y_a - \lambda_1 \cdot (x_a - x_p)
+$$
+
+The $q_{\dba}$ selector is also passed to the [double-and-add](./double-and-add.md)
+helper as $\texttt{q\_gradient}$, which means that the double-and-add gradient
+check is activated on each row where $q_\dba = 1$:
+
+$$
+\begin{array}{|c|l|}
+\hline
+\text{Degree} & \text{Constraint} \\\hline
+      3       & q_{\dba} \cdot \left(\lambda_{2,i} \cdot (x_{A,i} - x_{A,i-1}) - y_{A,i} - y_{A,i-1}\right) = 0 \\\hline
+\end{array}
+$$
+
+This composite selector $q_\dba + q_\texttt{final}$ is passed to the
+[double-and-add](./double-and-add.md) helper as $q_\texttt{secant}$
+which means that the double-and-add secant check is activated on each row
+where $q_\dba + q_\texttt{final} = 1$:
+
+$$
+\begin{array}{|c|l|}
+\hline
+\text{Degree} & \text{Constraint} \\\hline
+3 & (q_\dba + q_\texttt{final} \cdot \left(\lambda_{2,i}^2 - x_{A,i-1} - x_{R,i} - x_{A,i}\right) = 0 \\\hline
+\end{array}
+$$
+
+where
+
+$$
+\begin{aligned}
+x_{R,i} &= \lambda_{1,i}^2 - x_{A,i} - x_T, \\
+y_{A,i} &= \frac{(\lambda_{1,i} + \lambda_{2,i}) \cdot (x_{A,i} - (\lambda_{1,i}^2 - x_{A,i} - x_T))}{2},\\
+y_{A,i-1}^\text{witnessed} &\text{ is witnessed.}
+\end{aligned}
+$$
+
+##### Initialization
+To initialize the double-and-add, we set the accumulator to $InitAcc = [2](\phi(P) + P)$.
+In the case where $P$ is a fixed base, we copy it in from a fixed column; if
+$P$ is a variable base, we require it to be provided as a `NonIdentityEccPoint`
+from the ECC gadget, and copy in the $x$ and $y$ coordinates.
+
+(The initial section of the layout has been reproduced here for ease of reference.)
+
+$$
+\begin{array}{|c|c|c|c|c|c|c|}
+\hline
+       x_a      &        x_p      &           \lambda_1           &        \lambda_2        &  q_\texttt{init}  &  q_\texttt{add\_incomplete} &  q_\texttt{double} \\\hline
+       x_P      &        y_P      &          x_{\phi(P)}          &    y_{\phi(P)} = y_P    &     0             &          1                  &      0             \\\hline
+x_{P + \phi(P)} & y_{P + \phi(P)} &          x_\texttt{InitAcc}          &     y_\texttt{InitAcc}         &     1             &          0                  &      1             \\\hline
+   x_{N/2 - 1}  &  x_{N/2 - 1,p}  &      \lambda_{N/2 - 1, 1}     & \lambda_{N/2 - 1, 2}    &     0             &          0                  &      0             \\\hline
+\end{array}
+$$
+
+We use the [incomplete addition](./ecc/addition.md#incomplete-addition) helper
+to perform the first incomplete addition of $\phi(P) + P$. The result is then passed into the doubling helper to get the initial accumulator $[2](\phi(P) + P)$.
+
+The $q_\texttt{init}$ selector checks that:
+
+$$
+\begin{array}{|c|l|}
+\hline
+\text{Degree} & \text{Constraint} \\\hline
+       2      & q_\texttt{init} \cdot \left(\zeta \cdot x_P - x_{\phi(P)}\right) = 0 \\\hline
+       3      & q_\texttt{init} \cdot (y_\texttt{InitAcc} - y_{A,N/2 - 1}) = 0 \\\hline
+\end{array}
+$$
+
+where
+
+$$
+y_{A,N/2 - 1} = \frac{(\lambda_{1,N/2 - 1} + \lambda_{2,N/2 - 1}) \cdot (x_{A,N/2 - 1} - (\lambda_{1,N/2 - 1}^2 - x_\texttt{InitAcc} - x_T))}{2}
+$$
+
+#### Finalization
+In the final section of the double-and-add algorithm, we witness the $y$ coordinate
+of the accumulator and check that it is consistent with the previous values.
+
+(The final section of the layout has been reproduced here for ease of reference.)
+
+$$
+\begin{array}{|c|c|c|c|c|c|c|c|c|c|}
+\hline
+      z     &       b_0      &       b_1      &  x_G  &  y_G  &       x_a       &        x_p      &            \lambda_1          &       \lambda_2      & q_\texttt{final} \\\hline
+     z_1    &    b_{0, 0}    &    b_{1, 0}    &  x_G  &  y_G  &     x_{A,0}     &     x_{P,0}     &          \lambda_{1,0}        &     \lambda_{2,0}    &     1            \\\hline
+     z_0    &                &                &       &       &   x_{A,final}   &                 &           y_{A,final}         &                      &                  \\\hline
+\end{array}
+$$
+
+The $q_\texttt{final}$ selector checks that:
+
+$$
+\begin{array}{|c|l|}
+\hline
+\text{Degree} & \text{Constraint} \\\hline
+       2      & q_\texttt{final} \cdot \left(\lambda_{2,0} \cdot (x_{A,0} - x_{A, final}) - y_{A,0} - y_{A, final}\right) = 0 \\\hline
+\end{array}
+$$
+
+### Algorithm 2
+
+Split $\mathbf{r}$ into $K$-bit chunks $r_{0..=u-1}$.
+
+$\mathsf{Acc} := 2(\zeta + 1)$
+
+for $i$ from $N/K - 1$ down to $0$:
+
+$\hspace{2em}$ look up $s = \mathsf{endoscale\_scalar}(r_i)$
+
+$\hspace{2em}$ $\mathsf{Acc} := 2^{K/2} \cdot \mathsf{Acc} + s$
+
+#### Handling partial chunks
+
+Suppose that $\mathbf{r}$ is not a multiple of $K$ bits. In that case we will have a partial chunk $r_u$ of length $K' < K$ bits.
+The unoptimized algorithm for computing the table is:
+
+$(a, b) := (0, 0)$
+
+for $i$ from $K/2 − 1$ down to $0$:
+
+$\hspace{2em}$ let $(\mathbf{c}_i, \mathbf{d}_i) = \begin{cases}
+(0, 2\mathbf{r}_{2i} − 1),&\text{if } \mathbf{r}_{2i+1} = 0 \\
+(2\mathbf{r}_{2i} − 1, 0),&\text{otherwise}
+\end{cases}$
+
+$(a, b) := (2a + \mathbf{c}_i, 2b + \mathbf{d}_i)$
+
+Output $[a \cdot \zeta_q + b]\, P$.
+
+We want to derive the table output for $K'$ when $\mathbf{r} = r_u$ from the table output for $K$.
+Pad $r_u$ to $K$ bits on the right (high-order bits) with zeros.
+
+So the effect of running the above algorithm for the padding bits will be:
+
+$(a, b) := (0, 0)$
+
+for $i$ from $0$ up to $(K-K')/2 − 1$:
+
+$\hspace{2em} b := 2b - 1$
+
+(which is equivalent to $(a, b) := (0, 1 - 2^{(K-K')/2})$)
+
+for $i$ from $(K-K')/2$ up to $K/2 − 1$:
+
+$\hspace{2em}$ let $(\mathbf{c}_i, \mathbf{d}_i) = \begin{cases}
+(0, 2\mathbf{r}_{2i} − 1),&\text{if } \mathbf{r}_{2i+1} = 0 \\
+(2\mathbf{r}_{2i} − 1, 0),&\text{otherwise}
+\end{cases}$
+
+$\hspace{2em} (a, b) := (2a + \mathbf{c}_i, 2b + \mathbf{d}_i)$
+
+Output $[a \cdot \zeta_q + b]\, P$.
+
+So now we need to adjust the result of the table lookup to take account that we initialized $(a, b)$ to $(0, 1 - 2^{(K-K')/2})$ instead of $(0, 0)$.
+
+The offset for $b$ will get multiplied by $2^{K'/2}$, which means that we need to subtract $(1 - 2^{(K-K')/2}) \cdot 2^{K'/2} = (2^{K'/2} - 2^{K/2})$.
+
+#### Circuit costs
+
+##### Initial chunk
+In the case where the bitstring length is a multiple of $K$, we witness the first
+full chunk like so:
+
+$$
+\begin{array}{|c|c|c|c|c|}
+  \hline
+  \texttt{z}   & \texttt{acc} & \texttt{endoscalars\_copy} & q_\texttt{init} & q_\texttt{lookup}   \\\hline
+     z[u]      &     acc_1    &      \texttt{endo}(r_u)    &         1        &           1        \\\hline
+     z[u-1]    &              &                            &         0        &           0        \\\hline
+\end{array}
+$$
+
+with the following constraints:
+
+$$
+\begin{array}{|c|l|}
+\hline
+\text{Degree} & \text{Constraint} \\\hline
+2 & q_\text{init} \cdot [(\texttt{init\_acc} \cdot 2^{K / 2} + \texttt{endo}(r_u))  - acc_1] = 0 \\\hline
+\end{array}
+$$
+
+where $\texttt{init\_acc} = 2 \cdot (\zeta + 1)$.
+As before, $q_\texttt{lookup}$ looks up the tuple $(z[u-1] - z[u] * 2^K, \texttt{endo}(r_u)).$
+
+If the first chunk is a $K'$-bit partial chunk, it has been right-padded with $K - K'$ zeros.
+We constrain it in its own region:
+
+$$
+\begin{array}{|c|c|c|c|c|c|}
+  \hline
+  \texttt{z}   & \texttt{acc} & \texttt{endoscalars\_copy} & \texttt{q\_partial} & q_\texttt{lookup} & q_\texttt{short\_range\_check} \\\hline
+     z[u]      &      r_u     &      \texttt{endo}(r_u)    &          1          &           1        &                 1             \\\hline
+     z[u-1]    &    acc_1     &          2^{K'/2}          &          0          &           0        &                 0             \\\hline
+\end{array}
+$$
+
+with the following constraints:
+
+$$
+\begin{array}{|c|l|}
+\hline
+\text{Degree} & \text{Constraint} \\\hline
+2 & q_\text{partial} \cdot [(z[u-1] - z[u] \cdot 2^K) - r_u] = 0 \\\hline
+2 & q_\text{partial} \cdot [(\texttt{init\_acc} \cdot 2^{K' / 2} + \texttt{shifted\_endo})  - acc_1] = 0 \\\hline
+\end{array}
+$$
+
+where $\texttt{init\_acc} = 2 \cdot (\zeta + 1),$ and $\texttt{shifted\_endo} = \texttt{endo}(r_u) - (2^{K'/2} - 2^{K/2})$.
+
+As before, $q_\texttt{lookup}$ looks up the tuple $(z[u-1] - z[u] * 2^K, \texttt{endo}(r_u)).$
+Additionally, we do a $q_\texttt{short\_range\_check}(r_u, K')$ to check that $r_u$ is
+indeed a $K'$-bit value. (see [Lookup short range check](./decomposition.md#short-range-check).)
+
+##### Steady state
+After initializing the first chunk, we proceed with the remaining chunks in the steady state:
+
+$$
+\begin{array}{|c|c|c|c|c|}
+  \texttt{z}   & \texttt{acc} & \texttt{endoscalars\_copy} & \texttt{q\_endoscale} & q_\texttt{lookup} \\\hline
+     z[i]      &  acc_{u-i+1} &      \texttt{endo}(r_i)    &           1           &           1       \\\hline
+     z[i-1]    &  acc_{u-i}   &   \texttt{endo}(r_{i-1})   &           1           &           1       \\\hline
+     z[i-2]    &              &                            &           0           &           0       \\\hline
+\end{array}
+$$
+
+with the following constraints:
+
+$$
+\begin{array}{|c|l|}
+\hline
+\text{Degree} & \text{Constraint} \\\hline
+2 & q_\text{endoscale} \cdot [(acc_{u-i+1} \cdot 2^{K / 2} + \texttt{endo}(r_i))  - acc_{u-i}] = 0 \\\hline
+\end{array}
+$$
+
+As before, $q_\texttt{lookup}$ looks up the tuple $(z[i-1] - z[i] \cdot 2^K, \texttt{endo}(r_i)).$
diff --git a/halo2_gadgets/Cargo.toml b/halo2_gadgets/Cargo.toml
index 7b24c62ee..c8e93102f 100644
--- a/halo2_gadgets/Cargo.toml
+++ b/halo2_gadgets/Cargo.toml
@@ -78,3 +78,7 @@ required-features = ["unstable"]
 [[bench]]
 name = "decompose_running_sum"
 harness = false
+
+[[bench]]
+name = "endoscale"
+harness = false
diff --git a/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-64-8.csv b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-64-8.csv
new file mode 100644
index 000000000..890a39b33
--- /dev/null
+++ b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-64-8.csv
@@ -0,0 +1,2 @@
+max_deg,advice_columns,lookups,permutations,column_queries,point_sets,proof_size
+12,9,2,11,52,4,4448
diff --git a/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-66-9.csv b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-66-9.csv
new file mode 100644
index 000000000..890a39b33
--- /dev/null
+++ b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-66-9.csv
@@ -0,0 +1,2 @@
+max_deg,advice_columns,lookups,permutations,column_queries,point_sets,proof_size
+12,9,2,11,52,4,4448
diff --git a/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-64-8.csv b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-64-8.csv
new file mode 100644
index 000000000..890a39b33
--- /dev/null
+++ b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-64-8.csv
@@ -0,0 +1,2 @@
+max_deg,advice_columns,lookups,permutations,column_queries,point_sets,proof_size
+12,9,2,11,52,4,4448
diff --git a/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-66-9.csv b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-66-9.csv
new file mode 100644
index 000000000..890a39b33
--- /dev/null
+++ b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-base-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-66-9.csv
@@ -0,0 +1,2 @@
+max_deg,advice_columns,lookups,permutations,column_queries,point_sets,proof_size
+12,9,2,11,52,4,4448
diff --git a/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-64-8.csv b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-64-8.csv
new file mode 100644
index 000000000..890a39b33
--- /dev/null
+++ b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-64-8.csv
@@ -0,0 +1,2 @@
+max_deg,advice_columns,lookups,permutations,column_queries,point_sets,proof_size
+12,9,2,11,52,4,4448
diff --git a/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-66-9.csv b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-66-9.csv
new file mode 100644
index 000000000..890a39b33
--- /dev/null
+++ b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EpAffine-pasta_curves::curves::EqAffine-8-66-9.csv
@@ -0,0 +1,2 @@
+max_deg,advice_columns,lookups,permutations,column_queries,point_sets,proof_size
+12,9,2,11,52,4,4448
diff --git a/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-64-8.csv b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-64-8.csv
new file mode 100644
index 000000000..890a39b33
--- /dev/null
+++ b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-64-8.csv
@@ -0,0 +1,2 @@
+max_deg,advice_columns,lookups,permutations,column_queries,point_sets,proof_size
+12,9,2,11,52,4,4448
diff --git a/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-66-9.csv b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-66-9.csv
new file mode 100644
index 000000000..890a39b33
--- /dev/null
+++ b/halo2_gadgets/benches/eccops_goldenfiles/endoscale-scalar-pasta_curves::curves::EqAffine-pasta_curves::curves::EpAffine-8-66-9.csv
@@ -0,0 +1,2 @@
+max_deg,advice_columns,lookups,permutations,column_queries,point_sets,proof_size
+12,9,2,11,52,4,4448
diff --git a/halo2_gadgets/benches/endoscale.rs b/halo2_gadgets/benches/endoscale.rs
new file mode 100644
index 000000000..c0ede1bd3
--- /dev/null
+++ b/halo2_gadgets/benches/endoscale.rs
@@ -0,0 +1,317 @@
+mod utilities;
+
+use halo2_gadgets::ecc::chip::NonIdentityEccPoint;
+use halo2_gadgets::endoscale::{
+    chip::{CurveEndoscale, EndoscaleConfig},
+    util::compute_endoscalar_with_acc,
+    EndoscaleInstructions,
+};
+
+use ff::{Field, FromUniformBytes, PrimeFieldBits};
+use halo2_proofs::{
+    circuit::{Layouter, SimpleFloorPlanner, Value},
+    plonk::{
+        Advice, Circuit, Column, ConstraintSystem, Error,
+    },
+    poly::{
+        commitment::ParamsProver,
+        ipa::commitment::ParamsIPA,
+    },
+};
+use halo2curves::pasta::{pallas, vesta};
+use halo2curves::CurveAffine;
+
+use std::{any::type_name, convert::TryInto, marker::PhantomData};
+
+use criterion::{criterion_group, criterion_main, Criterion};
+
+#[derive(Clone)]
+struct BaseCircuit<
+    C: CurveAffine,
+    const K: usize,
+    const NUM_BITS: usize,
+    const NUM_BITS_DIV_K_CEIL: usize,
+> where
+    C::Base: PrimeFieldBits,
+{
+    bitstring: Value<[bool; NUM_BITS]>,
+    pub_input_rows: [usize; NUM_BITS_DIV_K_CEIL],
+    _marker: PhantomData<C>,
+}
+
+impl<
+        C: CurveAffine + CurveEndoscale,
+        const K: usize,
+        const NUM_BITS: usize,
+        const NUM_BITS_DIV_K_CEIL: usize,
+    > Circuit<C::Base> for BaseCircuit<C, K, NUM_BITS, NUM_BITS_DIV_K_CEIL>
+where
+    C::Base: PrimeFieldBits,
+{
+    type Config = (EndoscaleConfig<C, K, 248>, Column<Advice>);
+    type FloorPlanner = SimpleFloorPlanner;
+
+    fn without_witnesses(&self) -> Self {
+        Self {
+            bitstring: Value::unknown(),
+            pub_input_rows: self.pub_input_rows,
+            _marker: PhantomData,
+        }
+    }
+
+    fn configure(meta: &mut ConstraintSystem<C::Base>) -> Self::Config {
+        let constants = meta.fixed_column();
+        meta.enable_constant(constants);
+
+        let advices = (0..8)
+            .map(|_| meta.advice_column())
+            .collect::<Vec<_>>()
+            .try_into()
+            .unwrap();
+        let running_sum = meta.advice_column();
+        let endoscalars = meta.instance_column();
+
+        (
+            EndoscaleConfig::configure(meta, advices, running_sum, endoscalars),
+            running_sum,
+        )
+    }
+
+    fn synthesize(
+        &self,
+        config: Self::Config,
+        mut layouter: impl Layouter<C::Base>,
+    ) -> Result<(), Error> {
+        config.0.alg_2.table.load(&mut layouter)?;
+
+        let bitstring =
+            config
+                .0
+                .witness_bitstring(&mut layouter, &self.bitstring.transpose_array(), true)?;
+
+        // Alg 1 (fixed base)
+        let g_lagrange = ParamsIPA::<C>::new(11).g_lagrange()[0];
+        config
+            .0
+            .endoscale_fixed_base(&mut layouter, bitstring.clone(), vec![g_lagrange])?;
+
+        // Alg 1 (variable base)
+        let g_lagrange = layouter.assign_region(
+            || "g_lagrange",
+            |mut region| {
+                let x = region.assign_advice(
+                    || "x",
+                    config.1,
+                    0,
+                    || Value::known(*g_lagrange.coordinates().unwrap().x()),
+                )?;
+                let y = region.assign_advice(
+                    || "y",
+                    config.1,
+                    1,
+                    || Value::known(*g_lagrange.coordinates().unwrap().y()),
+                )?;
+
+                Ok(NonIdentityEccPoint::<C>::from_coordinates_unchecked(
+                    x.into(),
+                    y.into(),
+                ))
+            },
+        )?;
+        config
+            .0
+            .endoscale_var_base(&mut layouter, bitstring, vec![g_lagrange])?;
+
+        Ok(())
+    }
+}
+
+#[derive(Clone)]
+struct ScalarCircuit<
+    C: CurveAffine,
+    const K: usize,
+    const NUM_BITS: usize,
+    const NUM_BITS_DIV_K_CEIL: usize,
+> where
+    C::Base: PrimeFieldBits,
+{
+    bitstring: Value<[bool; NUM_BITS]>,
+    pub_input_rows: [usize; NUM_BITS_DIV_K_CEIL],
+    _marker: PhantomData<C>,
+}
+
+impl<
+        C: CurveAffine + CurveEndoscale,
+        const K: usize,
+        const NUM_BITS: usize,
+        const NUM_BITS_DIV_K_CEIL: usize,
+    > Circuit<C::Base> for ScalarCircuit<C, K, NUM_BITS, NUM_BITS_DIV_K_CEIL>
+where
+    C::Base: PrimeFieldBits,
+{
+    type Config = EndoscaleConfig<C, K, 248>;
+    type FloorPlanner = SimpleFloorPlanner;
+
+    fn without_witnesses(&self) -> Self {
+        Self {
+            bitstring: Value::unknown(),
+            pub_input_rows: self.pub_input_rows,
+            _marker: PhantomData,
+        }
+    }
+
+    fn configure(meta: &mut ConstraintSystem<C::Base>) -> Self::Config {
+        let constants = meta.fixed_column();
+        meta.enable_constant(constants);
+
+        let advices = (0..8)
+            .map(|_| meta.advice_column())
+            .collect::<Vec<_>>()
+            .try_into()
+            .unwrap();
+        let running_sum = meta.advice_column();
+        let endoscalars = meta.instance_column();
+
+        EndoscaleConfig::configure(meta, advices, running_sum, endoscalars)
+    }
+
+    fn synthesize(
+        &self,
+        config: Self::Config,
+        mut layouter: impl Layouter<C::Base>,
+    ) -> Result<(), Error> {
+        config.alg_2.table.load(&mut layouter)?;
+
+        let bitstring =
+            config.witness_bitstring(&mut layouter, &self.bitstring.transpose_array(), false)?;
+
+        // Alg 2 with lookup
+        config.compute_endoscalar(&mut layouter, &bitstring[0])?;
+
+        // Constrain bitstring
+        config.constrain_bitstring(&mut layouter, &bitstring[0], self.pub_input_rows.to_vec())?;
+
+        Ok(())
+    }
+}
+
+fn bench_endoscale_base<
+    C1,
+    C2,
+    const K: usize,
+    const NUM_BITS: usize,
+    const NUM_BITS_DIV_K_CEIL: usize,
+>(
+    c: &mut Criterion,
+) where
+    C1::Scalar: FromUniformBytes<64>,
+    C1::Base: PrimeFieldBits + FromUniformBytes<64>,
+    C2::Scalar: FromUniformBytes<64>,
+    C2::Base: PrimeFieldBits + FromUniformBytes<64>,
+    C1: CurveAffine<Base = C2::Scalar> + CurveEndoscale,
+    C2: CurveAffine + CurveEndoscale,
+{
+    let bitstring: [bool; NUM_BITS] = (0..NUM_BITS)
+        .map(|_| rand::random::<bool>())
+        .collect::<Vec<_>>()
+        .try_into()
+        .unwrap();
+
+    // Public input of endoscalars in the base field corresponding to the bits.
+    let pub_inputs_base: Vec<C2::Scalar> = {
+        // Pad bitstring to multiple of K.
+        let bitstring = bitstring
+            .iter()
+            .copied()
+            .chain(std::iter::repeat(false))
+            .take(K * NUM_BITS_DIV_K_CEIL)
+            .collect::<Vec<_>>();
+        bitstring
+            .chunks(K)
+            .map(|chunk| compute_endoscalar_with_acc(Some(C1::Base::ZERO), chunk))
+            .collect()
+    };
+
+    let base_circuit = BaseCircuit::<C1, K, NUM_BITS, NUM_BITS_DIV_K_CEIL> {
+        bitstring: Value::known(bitstring),
+        pub_input_rows: (0..NUM_BITS_DIV_K_CEIL)
+            .collect::<Vec<_>>()
+            .try_into()
+            .unwrap(),
+        _marker: PhantomData,
+    };
+
+    // There needs to be an extra row, compared to testing, since it needs extra room for binding factors and stuff for ZK.
+    let name = format!("endoscale-base-{}-{}-{}-{}-{}", type_name::<C1>(), type_name::<C2>(), K, NUM_BITS, NUM_BITS_DIV_K_CEIL);
+    let k = (K + 1) as u32;
+    utilities::circuit_to_csv::<C2>(k, &name, &[&pub_inputs_base[..]], base_circuit.clone());
+    utilities::bench_circuit::<C2>(c, k, &name, &[&pub_inputs_base[..]], base_circuit);
+}
+
+fn bench_endoscale_scalar<
+    C1,
+    C2,
+    const K: usize,
+    const NUM_BITS: usize,
+    const NUM_BITS_DIV_K_CEIL: usize,
+>(
+    c: &mut Criterion,
+) where
+    C1::Scalar: FromUniformBytes<64>,
+    C1::Base: PrimeFieldBits + FromUniformBytes<64>,
+    C2::Scalar: FromUniformBytes<64>,
+    C2::Base: PrimeFieldBits + FromUniformBytes<64>,
+    C1: CurveAffine + CurveEndoscale,
+    C2: CurveAffine<Base = C1::Scalar> + CurveEndoscale,
+{
+    let bitstring: [bool; NUM_BITS] = (0..NUM_BITS)
+        .map(|_| rand::random::<bool>())
+        .collect::<Vec<_>>()
+        .try_into()
+        .unwrap();
+
+    // Public input of endoscalars in the scalar field corresponding to the bits.
+    let pub_inputs_scalar: Vec<C1::Scalar> = {
+        // Pad bitstring to multiple of K.
+        let bitstring = bitstring
+            .iter()
+            .copied()
+            .chain(std::iter::repeat(false))
+            .take(K * NUM_BITS_DIV_K_CEIL)
+            .collect::<Vec<_>>();
+        bitstring
+            .chunks(K)
+            .map(|chunk| compute_endoscalar_with_acc(Some(C2::Base::ZERO), chunk))
+            .collect()
+    };
+
+    let scalar_circuit = ScalarCircuit::<C2, K, NUM_BITS, NUM_BITS_DIV_K_CEIL> {
+        bitstring: Value::known(bitstring),
+        pub_input_rows: (0..NUM_BITS_DIV_K_CEIL)
+            .collect::<Vec<_>>()
+            .try_into()
+            .unwrap(),
+        _marker: PhantomData,
+    };
+
+    // There needs to be an extra row, compared to testing, since it needs extra room for binding factors and stuff for ZK.
+    let name = format!("endoscale-scalar-{}-{}-{}-{}-{}", type_name::<C1>(), type_name::<C2>(), K, NUM_BITS, NUM_BITS_DIV_K_CEIL);
+    let k = (K + 1) as u32;
+    utilities::circuit_to_csv::<C1>(k, &name, &[&pub_inputs_scalar[..]], scalar_circuit.clone());
+    utilities::bench_circuit::<C1>(c, k, &name, &[&pub_inputs_scalar[..]], scalar_circuit);
+}
+
+fn bench_endoscale(c: &mut Criterion) {
+    bench_endoscale_base::<pallas::Affine, vesta::Affine, 8, 64, 8>(c);
+    bench_endoscale_scalar::<pallas::Affine, vesta::Affine, 8, 64, 8>(c);
+    bench_endoscale_base::<vesta::Affine, pallas::Affine, 8, 64, 8>(c);
+    bench_endoscale_scalar::<vesta::Affine, pallas::Affine, 8, 64, 8>(c);
+
+    bench_endoscale_base::<pallas::Affine, vesta::Affine, 8, 66, 9>(c);
+    bench_endoscale_scalar::<pallas::Affine, vesta::Affine, 8, 66, 9>(c);
+    bench_endoscale_base::<vesta::Affine, pallas::Affine, 8, 66, 9>(c);
+    bench_endoscale_scalar::<vesta::Affine, pallas::Affine, 8, 66, 9>(c);
+}
+
+criterion_group!(benches, bench_endoscale);
+criterion_main!(benches);
diff --git a/halo2_gadgets/src/ecc/chip.rs b/halo2_gadgets/src/ecc/chip.rs
index 7bdad421f..d19bf0bfb 100644
--- a/halo2_gadgets/src/ecc/chip.rs
+++ b/halo2_gadgets/src/ecc/chip.rs
@@ -17,10 +17,10 @@ use halo2curves::{pasta::pallas, CurveAffine};
 use std::convert::TryInto;
 
 pub(super) mod add;
-pub(super) mod add_incomplete;
+pub(crate) mod add_incomplete;
 pub mod constants;
 
-pub(super) mod double;
+pub(crate) mod double;
 pub(crate) mod mul;
 pub(super) mod mul_fixed;
 pub(super) mod witness_point;
@@ -153,7 +153,7 @@ pub struct EccConfig<FixedPoints: super::FixedPoints<pallas::Affine>> {
     pub advices: [Column<Advice>; 10],
 
     /// Incomplete addition
-    add_incomplete: add_incomplete::Config,
+    add_incomplete: add_incomplete::Config<pallas::Affine>,
 
     /// Complete addition
     add: add::Config,
diff --git a/halo2_gadgets/src/ecc/chip/add_incomplete.rs b/halo2_gadgets/src/ecc/chip/add_incomplete.rs
index 815d17237..38dba51a2 100644
--- a/halo2_gadgets/src/ecc/chip/add_incomplete.rs
+++ b/halo2_gadgets/src/ecc/chip/add_incomplete.rs
@@ -1,4 +1,4 @@
-use std::collections::HashSet;
+use std::{collections::HashSet, marker::PhantomData};
 
 use super::NonIdentityEccPoint;
 use halo2_proofs::{
@@ -6,10 +6,10 @@ use halo2_proofs::{
     plonk::{Advice, Column, ConstraintSystem, Constraints, Error, Selector},
     poly::Rotation,
 };
-use halo2curves::pasta::pallas;
+use halo2curves::CurveAffine;
 
 #[derive(Clone, Copy, Debug, Eq, PartialEq)]
-pub struct Config {
+pub struct Config<C: CurveAffine> {
     q_add_incomplete: Selector,
     // x-coordinate of P in P + Q = R
     pub x_p: Column<Advice>,
@@ -19,11 +19,13 @@ pub struct Config {
     pub x_qr: Column<Advice>,
     // y-coordinate of Q or R in P + Q = R
     pub y_qr: Column<Advice>,
+
+    _marker: PhantomData<C>,
 }
 
-impl Config {
-    pub(super) fn configure(
-        meta: &mut ConstraintSystem<pallas::Base>,
+impl<C: CurveAffine> Config<C> {
+    pub(crate) fn configure(
+        meta: &mut ConstraintSystem<C::Base>,
         x_p: Column<Advice>,
         y_p: Column<Advice>,
         x_qr: Column<Advice>,
@@ -40,6 +42,7 @@ impl Config {
             y_p,
             x_qr,
             y_qr,
+            _marker: PhantomData,
         };
 
         config.create_gate(meta);
@@ -53,7 +56,7 @@ impl Config {
             .collect()
     }
 
-    fn create_gate(&self, meta: &mut ConstraintSystem<pallas::Base>) {
+    fn create_gate(&self, meta: &mut ConstraintSystem<C::Base>) {
         // https://p.z.cash/halo2-0.1:ecc-incomplete-addition
         meta.create_gate("incomplete addition", |meta| {
             let q_add_incomplete = meta.query_selector(self.q_add_incomplete);
@@ -79,13 +82,13 @@ impl Config {
         });
     }
 
-    pub(super) fn assign_region(
+    pub(crate) fn assign_region(
         &self,
-        p: &NonIdentityEccPoint<pallas::Affine>,
-        q: &NonIdentityEccPoint<pallas::Affine>,
+        p: &NonIdentityEccPoint<C>,
+        q: &NonIdentityEccPoint<C>,
         offset: usize,
-        region: &mut Region<'_, pallas::Base>,
-    ) -> Result<NonIdentityEccPoint<pallas::Affine>, Error> {
+        region: &mut Region<C::Base>,
+    ) -> Result<NonIdentityEccPoint<C>, Error> {
         // Enable `q_add_incomplete` selector
         self.q_add_incomplete.enable(region, offset)?;
 
diff --git a/halo2_gadgets/src/ecc/chip/mul_fixed.rs b/halo2_gadgets/src/ecc/chip/mul_fixed.rs
index 78a244200..349e3f69d 100644
--- a/halo2_gadgets/src/ecc/chip/mul_fixed.rs
+++ b/halo2_gadgets/src/ecc/chip/mul_fixed.rs
@@ -47,7 +47,7 @@ pub struct Config<FixedPoints: super::FixedPoints<pallas::Affine>> {
     // Configuration for `add`
     add_config: add::Config,
     // Configuration for `add_incomplete`
-    add_incomplete_config: add_incomplete::Config,
+    add_incomplete_config: add_incomplete::Config<pallas::Affine>,
     _marker: PhantomData<FixedPoints>,
 }
 
@@ -59,7 +59,7 @@ impl<FixedPoints: super::FixedPoints<pallas::Affine>> Config<FixedPoints> {
         window: Column<Advice>,
         u: Column<Advice>,
         add_config: add::Config,
-        add_incomplete_config: add_incomplete::Config,
+        add_incomplete_config: add_incomplete::Config<pallas::Affine>,
     ) -> Self {
         meta.enable_equality(window);
         meta.enable_equality(u);
@@ -70,7 +70,7 @@ impl<FixedPoints: super::FixedPoints<pallas::Affine>> Config<FixedPoints> {
         // Range-check each window in the running sum decomposition.
         meta.create_gate("range check", |meta| {
             let q_range_check = meta.query_selector(running_sum_config.q_range_check());
-            let word = running_sum_config.window_expr(meta);
+            let word = running_sum_config.window_expr_le(meta);
 
             Constraints::with_selector(
                 q_range_check,
diff --git a/halo2_gadgets/src/endoscale.rs b/halo2_gadgets/src/endoscale.rs
new file mode 100644
index 000000000..e25d6af6c
--- /dev/null
+++ b/halo2_gadgets/src/endoscale.rs
@@ -0,0 +1,82 @@
+//! Gadget for endoscaling.
+use ff::PrimeFieldBits;
+use halo2_proofs::{
+    circuit::{AssignedCell, Layouter, Value},
+    plonk::{Assigned, Error},
+};
+use halo2curves::CurveAffine;
+use std::fmt::Debug;
+
+/// TODO: docs
+pub mod chip;
+pub mod util;
+
+/// Instructions to map bitstrings to and from endoscalars.
+pub trait EndoscaleInstructions<C: CurveAffine>
+where
+    C::Base: PrimeFieldBits,
+{
+    /// A non-identity point.
+    type NonIdentityPoint: Clone + Debug;
+    /// A bitstring up to `MAX_BITSTRING_LENGTH` bits.
+    type Bitstring: Clone + Debug;
+    /// Enumeration of fixed bases used in endoscaling.
+    type FixedBases;
+    /// The maximum number of bits that can be represented by [`Self::Bitstring`].
+    /// When endoscaling with a base, each unique base can only support up to
+    /// `MAX_BITSTRING_LENGTH` bits.
+    const MAX_BITSTRING_LENGTH: usize;
+    /// The number of fixed bases available.
+    const NUM_FIXED_BASES: usize;
+
+    /// Witnesses a slice of bools as a vector of [`Self::Bitstring`]s.
+    fn witness_bitstring(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bits: &[Value<bool>],
+        for_base: bool,
+    ) -> Result<Vec<Self::Bitstring>, Error>;
+
+    /// Computes commitment (Alg 1) to a variable-length bitstring using the endoscaling
+    /// algorithm. Uses the fixed bases defined in [`Self::FixedBases`].
+    ///
+    /// # Panics
+    /// Panics if bitstring.len() exceeds NUM_FIXED_BASES.
+    #[allow(clippy::type_complexity)]
+    fn endoscale_fixed_base(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: Vec<Self::Bitstring>,
+        bases: Vec<Self::FixedBases>,
+    ) -> Result<Vec<Self::NonIdentityPoint>, Error>;
+
+    /// Computes commitment (Alg 1) to a variable-length bitstring using the endoscaling
+    /// algorithm. Uses variable bases witnessed elsewhere in the circuit.
+    ///
+    /// # Panics
+    /// Panics if bitstring.len() exceeds bases.len().
+    #[allow(clippy::type_complexity)]
+    fn endoscale_var_base(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: Vec<Self::Bitstring>,
+        bases: Vec<Self::NonIdentityPoint>,
+    ) -> Result<Vec<Self::NonIdentityPoint>, Error>;
+
+    /// Computes endoscalar (Alg 2) mapping to a variable-length bitstring using
+    /// the endoscaling algorithm.
+    fn compute_endoscalar(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: &Self::Bitstring,
+    ) -> Result<AssignedCell<Assigned<C::Base>, C::Base>, Error>;
+
+    /// Check that a witnessed bitstring corresponds to a range of endoscalars
+    /// provided as public inputs.
+    fn constrain_bitstring(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: &Self::Bitstring,
+        pub_input_rows: Vec<usize>,
+    ) -> Result<(), Error>;
+}
diff --git a/halo2_gadgets/src/endoscale/chip.rs b/halo2_gadgets/src/endoscale/chip.rs
new file mode 100644
index 000000000..939a49da6
--- /dev/null
+++ b/halo2_gadgets/src/endoscale/chip.rs
@@ -0,0 +1,485 @@
+/// TODO: docs
+use crate::{ecc::chip::NonIdentityEccPoint, utilities::decompose_running_sum::RunningSumConfig};
+
+use super::EndoscaleInstructions;
+use ff::PrimeFieldBits;
+use halo2_proofs::{
+    arithmetic::CurveAffine,
+    circuit::{AssignedCell, Layouter, Value},
+    plonk::{Advice, Assigned, Column, ConstraintSystem, Error, Instance},
+};
+use halo2curves::{pasta, pluto_eris};
+
+mod alg_1;
+mod alg_2;
+
+use alg_1::Alg1Config;
+use alg_2::Alg2Config;
+
+/// Bitstring used in endoscaling.
+#[derive(Clone, Debug)]
+#[allow(clippy::type_complexity)]
+pub enum Bitstring<F: PrimeFieldBits, const K: usize> {
+    /// TODO: docs
+    Pair(alg_1::Bitstring<F>),
+    /// TODO: docs
+    KBit(alg_2::Bitstring<F, K>),
+}
+
+/// Config used in processing endoscalars.
+#[derive(Clone, Debug)]
+pub struct EndoscaleConfig<C: CurveAffine, const K: usize, const MAX_BITSTRING_LENGTH: usize>
+where
+    C::Base: PrimeFieldBits,
+{
+    /// TODO: docs
+    pub alg_1: Alg1Config<C>,
+    /// TODO: docs
+    pub alg_2: Alg2Config<C, K, MAX_BITSTRING_LENGTH>,
+}
+
+impl<C: CurveAffine, const K: usize, const MAX_BITSTRING_LENGTH: usize>
+    EndoscaleConfig<C, K, MAX_BITSTRING_LENGTH>
+where
+    C::Base: PrimeFieldBits,
+{
+    /// TODO: docs
+    #[allow(dead_code)]
+    #[allow(clippy::too_many_arguments)]
+    pub fn configure(
+        meta: &mut ConstraintSystem<C::Base>,
+        // Advice columns not shared across alg_1 and alg_2
+        advices: [Column<Advice>; 8],
+        // Running sum column shared across alg_1 and alg_2
+        running_sum: Column<Advice>,
+        endoscalars: Column<Instance>,
+    ) -> Self {
+        let running_sum_pairs = {
+            let q_pairs = meta.selector();
+            RunningSumConfig::configure(meta, q_pairs, running_sum)
+        };
+
+        let alg_1 = Alg1Config::configure(
+            meta,
+            (advices[0], advices[1]),
+            (advices[2], advices[3]),
+            (advices[4], advices[5], advices[6], advices[7]),
+            running_sum_pairs,
+        );
+
+        let running_sum_chunks = {
+            let q_chunks = meta.complex_selector();
+            RunningSumConfig::configure(meta, q_chunks, running_sum)
+        };
+
+        let alg_2 = Alg2Config::configure(
+            meta,
+            endoscalars,
+            advices[0],
+            advices[1],
+            running_sum_chunks,
+        );
+
+        Self { alg_1, alg_2 }
+    }
+}
+
+/// TODO: docs
+pub trait CurveEndoscale {
+    /// TODO: docs
+    const MAX_BITSTRING_LENGTH: usize;
+}
+
+impl CurveEndoscale for pasta::EpAffine {
+    const MAX_BITSTRING_LENGTH: usize = 248;
+}
+
+impl CurveEndoscale for pasta::EqAffine {
+    const MAX_BITSTRING_LENGTH: usize = 248;
+}
+
+impl CurveEndoscale for pluto_eris::PlutoAffine {
+    const MAX_BITSTRING_LENGTH: usize = 442;
+}
+
+impl CurveEndoscale for pluto_eris::ErisAffine {
+    const MAX_BITSTRING_LENGTH: usize = 442;
+}
+
+impl<C: CurveAffine + CurveEndoscale, const K: usize, const N: usize> EndoscaleInstructions<C>
+    for EndoscaleConfig<C, K, N>
+where
+    C::Base: PrimeFieldBits,
+{
+    type NonIdentityPoint = NonIdentityEccPoint<C>;
+    type Bitstring = Bitstring<C::Base, K>;
+    type FixedBases = C;
+    const MAX_BITSTRING_LENGTH: usize = C::MAX_BITSTRING_LENGTH;
+    const NUM_FIXED_BASES: usize = N;
+
+    fn witness_bitstring(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bits: &[Value<bool>],
+        for_base: bool,
+    ) -> Result<Vec<Self::Bitstring>, Error> {
+        assert_eq!(bits.len() % 2, 0);
+
+        bits.chunks(Self::MAX_BITSTRING_LENGTH)
+            .map(|bits| {
+                if for_base {
+                    self.alg_1
+                        .witness_bitstring(layouter.namespace(|| "alg 1"), bits)
+                        .map(Bitstring::Pair)
+                } else {
+                    self.alg_2
+                        .witness_bitstring(layouter.namespace(|| "alg 2"), bits)
+                        .map(Bitstring::KBit)
+                }
+            })
+            .collect()
+    }
+
+    #[allow(clippy::type_complexity)]
+    fn endoscale_fixed_base(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: Vec<Self::Bitstring>,
+        bases: Vec<Self::FixedBases>,
+    ) -> Result<Vec<Self::NonIdentityPoint>, Error> {
+        let mut points = Vec::new();
+        for (bitstring, base) in bitstring.iter().zip(bases.iter()) {
+            match bitstring {
+                Bitstring::Pair(bitstring) => {
+                    points.push(self.alg_1.endoscale_fixed_base(layouter, bitstring, base)?)
+                }
+                _ => unreachable!(),
+            }
+        }
+        Ok(points)
+    }
+
+    fn endoscale_var_base(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: Vec<Self::Bitstring>,
+        bases: Vec<Self::NonIdentityPoint>,
+    ) -> Result<Vec<Self::NonIdentityPoint>, Error> {
+        let mut points = Vec::new();
+        for (bitstring, base) in bitstring.iter().zip(bases.iter()) {
+            match bitstring {
+                Bitstring::Pair(bitstring) => {
+                    points.push(self.alg_1.endoscale_var_base(layouter, bitstring, base)?)
+                }
+                _ => unreachable!(),
+            }
+        }
+        Ok(points)
+    }
+
+    fn compute_endoscalar(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: &Self::Bitstring,
+    ) -> Result<AssignedCell<Assigned<C::Base>, C::Base>, Error> {
+        match bitstring {
+            Bitstring::KBit(bitstring) => self.alg_2.compute_endoscalar(layouter, bitstring),
+            _ => unreachable!(),
+        }
+    }
+
+    fn constrain_bitstring(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: &Self::Bitstring,
+        pub_input_rows: Vec<usize>,
+    ) -> Result<(), Error> {
+        match bitstring {
+            Bitstring::KBit(bitstring) => {
+                self.alg_2
+                    .constrain_bitstring(layouter, bitstring, pub_input_rows)
+            }
+            _ => unreachable!(),
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::super::util::compute_endoscalar_with_acc;
+    use super::{EndoscaleConfig, EndoscaleInstructions};
+    use crate::ecc::chip::NonIdentityEccPoint;
+    use crate::endoscale::chip::CurveEndoscale;
+
+    use ff::{Field, FromUniformBytes, PrimeFieldBits};
+    use halo2_proofs::poly::commitment::ParamsProver;
+    use halo2_proofs::poly::ipa::commitment::ParamsIPA;
+    use halo2_proofs::{
+        arithmetic::CurveAffine,
+        circuit::{Layouter, SimpleFloorPlanner, Value},
+        plonk::{Advice, Circuit, Column, ConstraintSystem, Error},
+    };
+    use halo2curves::pasta::{pallas, vesta};
+
+    use std::{convert::TryInto, marker::PhantomData};
+
+    struct BaseCircuit<
+        C: CurveAffine,
+        const K: usize,
+        const NUM_BITS: usize,
+        const NUM_BITS_DIV_K_CEIL: usize,
+    >
+    where
+        C::Base: PrimeFieldBits,
+    {
+        bitstring: Value<[bool; NUM_BITS]>,
+        pub_input_rows: [usize; NUM_BITS_DIV_K_CEIL],
+        _marker: PhantomData<C>,
+    }
+
+    impl<
+            C: CurveAffine + CurveEndoscale,
+            const K: usize,
+            const NUM_BITS: usize,
+            const NUM_BITS_DIV_K_CEIL: usize,
+        > Circuit<C::Base> for BaseCircuit<C, K, NUM_BITS, NUM_BITS_DIV_K_CEIL>
+    where
+        C::Base: PrimeFieldBits,
+    {
+        type Config = (EndoscaleConfig<C, K, 248>, Column<Advice>);
+        type FloorPlanner = SimpleFloorPlanner;
+
+        fn without_witnesses(&self) -> Self {
+            Self {
+                bitstring: Value::unknown(),
+                pub_input_rows: self.pub_input_rows,
+                _marker: PhantomData,
+            }
+        }
+
+        fn configure(meta: &mut ConstraintSystem<C::Base>) -> Self::Config {
+            let constants = meta.fixed_column();
+            meta.enable_constant(constants);
+
+            let advices = (0..8)
+                .map(|_| meta.advice_column())
+                .collect::<Vec<_>>()
+                .try_into()
+                .unwrap();
+            let running_sum = meta.advice_column();
+            let endoscalars = meta.instance_column();
+
+            (
+                EndoscaleConfig::configure(meta, advices, running_sum, endoscalars),
+                running_sum,
+            )
+        }
+
+        fn synthesize(
+            &self,
+            config: Self::Config,
+            mut layouter: impl Layouter<C::Base>,
+        ) -> Result<(), Error> {
+            config.0.alg_2.table.load(&mut layouter)?;
+
+            let bitstring = config.0.witness_bitstring(
+                &mut layouter,
+                &self.bitstring.transpose_array(),
+                true,
+            )?;
+
+            // Alg 1 (fixed base)
+            let g_lagrange = ParamsIPA::<C>::new(11).g_lagrange()[0];
+            config
+                .0
+                .endoscale_fixed_base(&mut layouter, bitstring.clone(), vec![g_lagrange])?;
+
+            // Alg 1 (variable base)
+            let g_lagrange = layouter.assign_region(
+                || "g_lagrange",
+                |mut region| {
+                    let x = region.assign_advice(
+                        || "x",
+                        config.1,
+                        0,
+                        || Value::known(*g_lagrange.coordinates().unwrap().x()),
+                    )?;
+                    let y = region.assign_advice(
+                        || "y",
+                        config.1,
+                        1,
+                        || Value::known(*g_lagrange.coordinates().unwrap().y()),
+                    )?;
+
+                    Ok(NonIdentityEccPoint::<C>::from_coordinates_unchecked(
+                        x.into(),
+                        y.into(),
+                    ))
+                },
+            )?;
+            config
+                .0
+                .endoscale_var_base(&mut layouter, bitstring, vec![g_lagrange])?;
+
+            Ok(())
+        }
+    }
+
+    struct ScalarCircuit<
+        C: CurveAffine,
+        const K: usize,
+        const NUM_BITS: usize,
+        const NUM_BITS_DIV_K_CEIL: usize,
+    >
+    where
+        C::Base: PrimeFieldBits,
+    {
+        bitstring: Value<[bool; NUM_BITS]>,
+        pub_input_rows: [usize; NUM_BITS_DIV_K_CEIL],
+        _marker: PhantomData<C>,
+    }
+
+    impl<
+            C: CurveAffine + CurveEndoscale,
+            const K: usize,
+            const NUM_BITS: usize,
+            const NUM_BITS_DIV_K_CEIL: usize,
+        > Circuit<C::Base> for ScalarCircuit<C, K, NUM_BITS, NUM_BITS_DIV_K_CEIL>
+    where
+        C::Base: PrimeFieldBits,
+    {
+        type Config = EndoscaleConfig<C, K, 248>;
+        type FloorPlanner = SimpleFloorPlanner;
+
+        fn without_witnesses(&self) -> Self {
+            Self {
+                bitstring: Value::unknown(),
+                pub_input_rows: self.pub_input_rows,
+                _marker: PhantomData,
+            }
+        }
+
+        fn configure(meta: &mut ConstraintSystem<C::Base>) -> Self::Config {
+            let constants = meta.fixed_column();
+            meta.enable_constant(constants);
+
+            let advices = (0..8)
+                .map(|_| meta.advice_column())
+                .collect::<Vec<_>>()
+                .try_into()
+                .unwrap();
+            let running_sum = meta.advice_column();
+            let endoscalars = meta.instance_column();
+
+            EndoscaleConfig::configure(meta, advices, running_sum, endoscalars)
+        }
+
+        fn synthesize(
+            &self,
+            config: Self::Config,
+            mut layouter: impl Layouter<C::Base>,
+        ) -> Result<(), Error> {
+            config.alg_2.table.load(&mut layouter)?;
+
+            let bitstring = config.witness_bitstring(
+                &mut layouter,
+                &self.bitstring.transpose_array(),
+                false,
+            )?;
+
+            // Alg 2 with lookup
+            config.compute_endoscalar(&mut layouter, &bitstring[0])?;
+
+            // Constrain bitstring
+            config.constrain_bitstring(
+                &mut layouter,
+                &bitstring[0],
+                self.pub_input_rows.to_vec(),
+            )?;
+
+            Ok(())
+        }
+    }
+
+    fn test_endoscale_cycle<
+        BaseCurve: CurveAffine + CurveEndoscale,
+        ScalarCurve: CurveAffine + CurveEndoscale,
+        const K: usize,
+        const NUM_BITS: usize,
+        const NUM_BITS_DIV_K_CEIL: usize,
+    >()
+    where
+        BaseCurve::Base: PrimeFieldBits + FromUniformBytes<64>,
+        ScalarCurve::Base: PrimeFieldBits + FromUniformBytes<64>,
+    {
+        use halo2_proofs::dev::MockProver;
+
+        let bitstring: [bool; NUM_BITS] = (0..NUM_BITS)
+            .map(|_| rand::random::<bool>())
+            .collect::<Vec<_>>()
+            .try_into()
+            .unwrap();
+
+        // Public input of endoscalars in the base field corresponding to the bits.
+        let pub_inputs_base: Vec<BaseCurve::Base> = {
+            // Pad bitstring to multiple of K.
+            let bitstring = bitstring
+                .iter()
+                .copied()
+                .chain(std::iter::repeat(false))
+                .take(K * NUM_BITS_DIV_K_CEIL)
+                .collect::<Vec<_>>();
+            bitstring
+                .chunks(K)
+                .map(|chunk| compute_endoscalar_with_acc(Some(BaseCurve::Base::ZERO), chunk))
+                .collect()
+        };
+
+        // Public input of endoscalars in the scalar field corresponding to the bits.
+        let pub_inputs_scalar: Vec<ScalarCurve::Base> = {
+            // Pad bitstring to multiple of K.
+            let bitstring = bitstring
+                .iter()
+                .copied()
+                .chain(std::iter::repeat(false))
+                .take(K * NUM_BITS_DIV_K_CEIL)
+                .collect::<Vec<_>>();
+            bitstring
+                .chunks(K)
+                .map(|chunk| compute_endoscalar_with_acc(Some(ScalarCurve::Base::ZERO), chunk))
+                .collect()
+        };
+
+        let base_circuit = BaseCircuit::<BaseCurve, K, NUM_BITS, NUM_BITS_DIV_K_CEIL> {
+            bitstring: Value::known(bitstring),
+            pub_input_rows: (0..NUM_BITS_DIV_K_CEIL)
+                .collect::<Vec<_>>()
+                .try_into()
+                .unwrap(),
+            _marker: PhantomData,
+        };
+        let scalar_circuit = ScalarCircuit::<ScalarCurve, K, NUM_BITS, NUM_BITS_DIV_K_CEIL> {
+            bitstring: Value::known(bitstring),
+            pub_input_rows: (0..NUM_BITS_DIV_K_CEIL)
+                .collect::<Vec<_>>()
+                .try_into()
+                .unwrap(),
+            _marker: PhantomData,
+        };
+
+        let base_prover = MockProver::run(11, &base_circuit, vec![pub_inputs_base]).unwrap();
+        base_prover.assert_satisfied();
+
+        let scalar_prover = MockProver::run(11, &scalar_circuit, vec![pub_inputs_scalar]).unwrap();
+        scalar_prover.assert_satisfied();
+    }
+
+    #[test]
+    fn test_endoscale() {
+        test_endoscale_cycle::<pallas::Affine, vesta::Affine, 8, 64, 8>();
+        test_endoscale_cycle::<vesta::Affine, pallas::Affine, 8, 64, 8>();
+
+        test_endoscale_cycle::<pallas::Affine, vesta::Affine, 8, 66, 9>();
+        test_endoscale_cycle::<vesta::Affine, pallas::Affine, 8, 66, 9>();
+    }
+}
diff --git a/halo2_gadgets/src/endoscale/chip/alg_1.rs b/halo2_gadgets/src/endoscale/chip/alg_1.rs
new file mode 100644
index 000000000..955a0e831
--- /dev/null
+++ b/halo2_gadgets/src/endoscale/chip/alg_1.rs
@@ -0,0 +1,547 @@
+use std::iter;
+
+use ff::{Field, PrimeFieldBits, WithSmallOrderMulGroup};
+use halo2_proofs::{
+    arithmetic::CurveAffine,
+    circuit::{Layouter, Region, Value},
+    plonk::{
+        Advice, Assigned, Column, ConstraintSystem, Constraints, Error, Expression, Selector,
+        VirtualCells,
+    },
+    poly::Rotation,
+};
+
+use super::super::util::endoscale_point_pair;
+use crate::{
+    ecc::chip::{add_incomplete, double, NonIdentityEccPoint},
+    utilities::{
+        bool_check,
+        decompose_running_sum::{RunningSum, RunningSumConfig},
+        double_and_add::{DoubleAndAdd, X, Y},
+        le_bits_to_field_elem,
+    },
+};
+
+pub(super) type Bitstring<F> = RunningSum<F, 2>;
+
+/// Config used in Algorithm 1 (endoscaling with a base).
+#[derive(Clone, Debug)]
+pub struct Alg1Config<C: CurveAffine>
+where
+    C::Base: PrimeFieldBits,
+{
+    // Selector for endoscaling checks.
+    q_endoscale_base: Selector,
+    // Selector for the initial check in double-and-add.
+    q_double_and_add_init: Selector,
+    // Selector for stead-state double-and-add.
+    q_double_and_add: Selector,
+    // Selector for the final check in double-and-add.
+    q_double_and_add_final: Selector,
+    // Configuration used for steady-state double-and-add.
+    double_and_add: DoubleAndAdd<C>,
+    // Incomplete point doubling config
+    double: double::Config<C>,
+    // Incomplete point addition config
+    add_incomplete: add_incomplete::Config<C>,
+    // Bases used in endoscaling.
+    base: (Column<Advice>, Column<Advice>),
+    // Bits used in endoscaling. These are in (b_0, b_1) pairs.
+    pair: (Column<Advice>, Column<Advice>),
+    // Configuration for running sum decomposition into pairs of bits.
+    pub(super) running_sum_pairs: RunningSumConfig<C::Base, 2>,
+}
+
+impl<C: CurveAffine> Alg1Config<C>
+where
+    C::Base: PrimeFieldBits,
+{
+    pub(super) fn configure(
+        meta: &mut ConstraintSystem<C::Base>,
+        pair: (Column<Advice>, Column<Advice>),
+        base: (Column<Advice>, Column<Advice>),
+        (x_a, x_p, lambda_1, lambda_2): (
+            Column<Advice>,
+            Column<Advice>,
+            Column<Advice>,
+            Column<Advice>,
+        ),
+        running_sum_pairs: RunningSumConfig<C::Base, 2>,
+    ) -> Self {
+        meta.enable_equality(base.0);
+        meta.enable_equality(base.1);
+
+        let q_endoscale_base = meta.selector();
+
+        // Initial double-and-add gate
+        let q_double_and_add_init = meta.selector();
+        // Steady-state double-and-add gate
+        let q_double_and_add = meta.complex_selector();
+        // Final double-and-add gate
+        let q_double_and_add_final = meta.complex_selector();
+
+        let double_and_add = DoubleAndAdd::configure(
+            meta,
+            x_a,
+            x_p,
+            lambda_1,
+            lambda_2,
+            &|meta: &mut VirtualCells<<C as CurveAffine>::Base>| {
+                let q_double_and_add = meta.query_selector(q_double_and_add);
+                let q_double_and_add_final = meta.query_selector(q_double_and_add_final);
+                q_double_and_add + q_double_and_add_final
+            },
+            &|meta: &mut VirtualCells<<C as CurveAffine>::Base>| {
+                meta.query_selector(q_double_and_add)
+            },
+        );
+
+        // This madness is to save on copy constraints:
+        //
+        // # Patterns
+        // Patterns of one function flowing into another.
+        // The goal is to line up regions so there are no copy constraints required.
+        //
+        // ## Patterns in alg_1
+        //
+        // sub endoscale_base_init(base):
+        //     sum = add_incomplete(base, phi_p) -- output in (x_qr, y_qr)
+        //     return double(base)               -- input  in (x_p, y_p)
+        //
+        // We want x_a:      double_and_add.x_a      = add_incomplete.x_qr = double.x_p
+        // and     x_p:      double_and_add.x_p      = add_incomplete.y_qr = double.y_p
+        // and     lambda_1: double_and_add.lambda_1 = add_incomplete.x_p  = double.x_r
+        // and     lambda_2: double_and_add.lambda_2 = add_incomplete.y_p  = double.y_r
+
+        let add_incomplete = add_incomplete::Config::configure(meta, lambda_1, lambda_2, x_a, x_p);
+        let double = double::Config::configure(meta, x_a, x_p, lambda_1, lambda_2);
+
+        meta.enable_equality(add_incomplete.x_p);
+        meta.enable_equality(add_incomplete.y_p);
+
+        meta.create_gate("init double-and-add", |meta| {
+            let selector = meta.query_selector(q_double_and_add_init);
+            // The accumulator is initialised to [2](φ(P) + P).
+
+            // Check that the x-coordinate of the inputs to the incomplete addition
+            // are related as x, ζx.
+            // The y-coordinate is copy-constrained.
+            let incomplete_add_x_check = {
+                let x_p = meta.query_advice(add_incomplete.x_p, Rotation::prev());
+                let phi_x_p = meta.query_advice(add_incomplete.x_qr, Rotation::prev());
+
+                x_p * C::Base::ZETA - phi_x_p
+            };
+
+            // Check that the initial accumulator's y-coordinate `y_a` is consistent
+            // with the one derived internally by `double_and_add`.
+            let init_y_a_check = {
+                let y_a = meta.query_advice(double.y_r(), Rotation::cur());
+                let derived_y_a = double_and_add.y_a(meta, Rotation::next());
+
+                y_a - derived_y_a
+            };
+
+            Constraints::with_selector(
+                selector,
+                [
+                    ("incomplete_add_x_check", incomplete_add_x_check),
+                    ("init_y_a_check", init_y_a_check),
+                ],
+            )
+        });
+
+        meta.create_gate("final double-and-add", |meta| {
+            // Check that the final witnessed y_a is consistent with the y_a
+            // derived internally by `double_and_add`.
+            let selector = meta.query_selector(q_double_and_add_final);
+
+            // x_{A,i}
+            let x_a_prev = meta.query_advice(double_and_add.x_a(), Rotation::cur());
+            // x_{A,i-1}
+            let x_a_cur = meta.query_advice(double_and_add.x_a(), Rotation::next());
+            // λ_{2,i}
+            let lambda2_prev = meta.query_advice(double_and_add.lambda_2(), Rotation::cur());
+            let y_a_prev = double_and_add.y_a(meta, Rotation::cur());
+
+            let lhs = lambda2_prev * (x_a_prev - x_a_cur);
+            let rhs = {
+                let y_a_final = meta.query_advice(lambda_1, Rotation::next());
+                y_a_prev + y_a_final
+            };
+
+            Constraints::with_selector(selector, [lhs - rhs])
+        });
+
+        /*
+            The accumulator is initialised to [2](φ(P) + P) = (init_x, init_y).
+
+            | pair.0 | pair.1 |    base.0     |    base.1     |  double_and_add.x_a   | double_and_add.lambda_1|   <- column names
+            ---------------------------------------------------------------------------------------------------|
+            |  b_0   |   b_1  |  init endo_x  |  init endo_y  |      init acc_x       |       init acc_y       |
+            |  ...   |   ...  |       ...     |      ...      |          ...          | (acc_y not witnessed)  |
+            | b_{n-2}| b_{n-1}|  final endo_x | final endo_y  |      final acc_x      |       final acc_y      |
+
+            (0, 0) -> (P_x, -P_y)
+            (0, 1) -> (ζ * P_x, -P_y)
+            (1, 0) -> (P_x, P_y)
+            (1, 1) -> (ζ * P_x, P_y)
+        */
+        meta.create_gate("Endoscale base", |meta| {
+            let q_endoscale_base = meta.query_selector(q_endoscale_base);
+
+            // Pair of bits from the decomposition.
+            let b_0 = meta.query_advice(pair.0, Rotation::cur());
+            let b_1 = meta.query_advice(pair.1, Rotation::cur());
+
+            // Boolean-constrain b_0, b_1
+            let b_0_check = bool_check(b_0.clone());
+            let b_1_check = bool_check(b_1.clone());
+
+            // Check that `b_0, b_1` are consistent with the running sum decomposition.
+            let decomposition_check = {
+                let word = b_0.clone() + Expression::Constant(C::Base::from(2)) * b_1.clone();
+                let expected_word = running_sum_pairs.window_expr_be(meta);
+
+                word - expected_word
+            };
+
+            let y_check = {
+                let endo_y = double_and_add.y_p(meta, Rotation::cur());
+                let p_y = meta.query_advice(base.1, Rotation::cur());
+                // If the first bit is set, check that endo_y = P_y
+                let b0_set = b_0.clone() * (endo_y.clone() - p_y.clone());
+
+                // If the first bit is not set, check that endo_y = -P_y
+                let not_b0 = Expression::Constant(C::Base::ONE) - b_0;
+                let b0_not_set = not_b0 * (endo_y + p_y);
+
+                b0_set + b0_not_set
+            };
+            let x_check = {
+                let endo_x = meta.query_advice(double_and_add.x_p(), Rotation::cur());
+                let p_x = meta.query_advice(base.0, Rotation::cur());
+                // If the second bit is set, check that endo_x = ζ * P_x
+                let zeta = Expression::Constant(C::Base::ZETA);
+                let b1_set = b_1.clone() * (endo_x.clone() - zeta * p_x.clone());
+
+                // If the second bit is not set, check that endo_x = P_x
+                let not_b1 = Expression::Constant(C::Base::ONE) - b_1;
+                let b1_not_set = not_b1 * (endo_x - p_x);
+
+                b1_set + b1_not_set
+            };
+
+            Constraints::with_selector(
+                q_endoscale_base,
+                iter::empty()
+                    .chain(iter::once(("b_0_check", b_0_check)))
+                    .chain(iter::once(("b_1_check", b_1_check)))
+                    .chain(iter::once(("decomposition_check", decomposition_check)))
+                    .chain(iter::once(("x_check", x_check)))
+                    .chain(iter::once(("y_check", y_check))),
+            )
+        });
+
+        Self {
+            q_endoscale_base,
+            q_double_and_add_init,
+            q_double_and_add,
+            q_double_and_add_final,
+            double_and_add,
+            double,
+            add_incomplete,
+            base,
+            pair,
+            running_sum_pairs,
+        }
+    }
+
+    pub(super) fn witness_bitstring(
+        &self,
+        mut layouter: impl Layouter<C::Base>,
+        bits: &[Value<bool>],
+    ) -> Result<Bitstring<C::Base>, Error> {
+        let alpha = {
+            let bits = Value::<Vec<_>>::from_iter(bits.to_vec());
+            bits.map(|b| le_bits_to_field_elem(&b))
+        };
+        let word_num_bits = bits.len();
+        let num_windows = word_num_bits / 2;
+
+        layouter.assign_region(
+            || "witness bitstring",
+            |mut region| {
+                let offset = 0;
+
+                self.running_sum_pairs.witness_decompose(
+                    &mut region,
+                    offset,
+                    alpha,
+                    true,
+                    word_num_bits,
+                    num_windows,
+                )
+            },
+        )
+    }
+
+    pub(super) fn endoscale_fixed_base(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: &Bitstring<C::Base>,
+        base: &C,
+    ) -> Result<NonIdentityEccPoint<C>, Error> {
+        layouter.assign_region(
+            || "endoscale with fixed base",
+            |mut region| {
+                let offset = 0;
+
+                let base = {
+                    // Assign base_x
+                    let x = region.assign_advice_from_constant(
+                        || "base_x",
+                        self.add_incomplete.x_p,
+                        offset,
+                        Assigned::from(*base.coordinates().unwrap().x()),
+                    )?;
+
+                    // Assign base_y
+                    let y = region.assign_advice_from_constant(
+                        || "base_y",
+                        self.add_incomplete.y_p,
+                        offset,
+                        Assigned::from(*base.coordinates().unwrap().y()),
+                    )?;
+                    NonIdentityEccPoint::from_coordinates_unchecked(x, y)
+                };
+
+                self.endoscale_base_inner(&mut region, offset, &base, bitstring)
+            },
+        )
+    }
+
+    pub(super) fn endoscale_var_base(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: &Bitstring<C::Base>,
+        base: &NonIdentityEccPoint<C>,
+    ) -> Result<NonIdentityEccPoint<C>, Error> {
+        layouter.assign_region(
+            || "endoscale with variable base",
+            |mut region| {
+                let offset = 0;
+
+                let base = {
+                    let x = base.x().copy_advice(
+                        || "base_x",
+                        &mut region,
+                        self.add_incomplete.x_p,
+                        offset,
+                    )?;
+                    let y = base.y().copy_advice(
+                        || "base_y",
+                        &mut region,
+                        self.add_incomplete.y_p,
+                        offset,
+                    )?;
+                    NonIdentityEccPoint::from_coordinates_unchecked(x.into(), y.into())
+                };
+
+                self.endoscale_base_inner(&mut region, offset, &base, bitstring)
+            },
+        )
+    }
+}
+
+impl<C: CurveAffine> Alg1Config<C>
+where
+    C::Base: WithSmallOrderMulGroup<3> + PrimeFieldBits,
+{
+    #[allow(clippy::type_complexity)]
+    fn endoscale_base_init(
+        &self,
+        region: &mut Region<'_, C::Base>,
+        mut offset: usize,
+        base: &NonIdentityEccPoint<C>,
+    ) -> Result<(usize, (X<C::Base>, Y<C::Base>)), Error> {
+        // The accumulator is initialised to [2](φ(P) + P)
+        self.q_double_and_add_init.enable(region, offset + 1)?;
+
+        // Incomplete addition of (φ(P) + P), where φ(P) = φ((x, y)) = (ζx, y)
+        let sum = {
+            let zeta_x = base.x().value().map(|p| Assigned::from(*p * C::Base::ZETA));
+            // double_and_add.x_a()
+            let zeta_x =
+                region.assign_advice(|| "ζ * x", self.add_incomplete.x_qr, offset, || zeta_x)?;
+            let phi_p = NonIdentityEccPoint::from_coordinates_unchecked(zeta_x, base.y().into());
+
+            self.add_incomplete
+                .assign_region(base, &phi_p, offset, region)?
+        };
+        offset += 1;
+
+        let acc = self
+            .double
+            .assign_region(&sum, offset, region)
+            .map(|acc| (X(acc.x().into()), Y(acc.y().value().copied().into())))?;
+        offset += 1;
+
+        Ok((offset, acc))
+    }
+
+    #[allow(clippy::type_complexity)]
+    fn endoscale_base_main(
+        &self,
+        region: &mut Region<'_, C::Base>,
+        mut offset: usize,
+        mut acc: (X<C::Base>, Y<C::Base>),
+        base: &NonIdentityEccPoint<C>,
+        // Bitstring decomposed into 2-bit windows using a running sum.
+        // This internally enables the `q_range_check` selector, which is
+        // used in the "Endoscale base" gate.
+        bitstring: &Bitstring<C::Base>,
+    ) -> Result<(usize, (X<C::Base>, Y<C::Base>)), Error> {
+        let running_sum_len = bitstring.zs().len();
+        // Copy in running sum
+        for (idx, z) in bitstring.zs().iter().rev().enumerate() {
+            z.copy_advice(
+                || format!("z[{:?}]", running_sum_len - idx),
+                region,
+                self.running_sum_pairs.z(),
+                offset + idx,
+            )?;
+        }
+
+        // Enable selector for steady-state double-and-add on all but last row
+        let num_pairs = bitstring.pairs().len();
+        for idx in 0..(num_pairs - 1) {
+            self.q_double_and_add.enable(region, offset + idx)?;
+        }
+
+        for (pair_idx, pair) in bitstring.pairs().iter().rev().enumerate() {
+            self.q_endoscale_base.enable(region, offset)?;
+
+            // Assign base
+            base.x()
+                .copy_advice(|| "base_x", region, self.base.0, offset)?;
+            base.y()
+                .copy_advice(|| "base_y", region, self.base.1, offset)?;
+
+            // Assign b_0
+            let b_0 = pair.map(|pair| pair[0]);
+            region.assign_advice(
+                || format!("pair_idx: {}, b_0", num_pairs - pair_idx),
+                self.pair.0,
+                offset,
+                || b_0.map(|b| C::Base::from(b as u64)),
+            )?;
+
+            // Assign b_1
+            let b_1 = pair.map(|pair| pair[1]);
+            region.assign_advice(
+                || format!("pair_idx: {}, b_1", num_pairs - pair_idx),
+                self.pair.1,
+                offset,
+                || b_1.map(|b| C::Base::from(b as u64)),
+            )?;
+
+            let endo = {
+                let base = base.point();
+                let endo = pair
+                    .zip(base)
+                    .map(|(pair, base)| endoscale_point_pair::<C>(pair, base).unwrap());
+
+                let endo_x = endo.map(|endo| *endo.coordinates().unwrap().x());
+                let endo_y = endo.map(|endo| *endo.coordinates().unwrap().y());
+
+                (endo_x, endo_y)
+            };
+
+            // Add endo to acc.
+            acc = self.double_and_add.assign_region(
+                region,
+                offset,
+                (endo.0.map(|v| v.into()), endo.1.map(|v| v.into())),
+                acc,
+            )?;
+
+            offset += 1;
+        }
+
+        Ok((offset, acc))
+    }
+
+    #[allow(clippy::type_complexity)]
+    fn endoscale_base_final(
+        &self,
+        region: &mut Region<'_, C::Base>,
+        offset: usize,
+        (x, y): (X<C::Base>, Y<C::Base>),
+    ) -> Result<NonIdentityEccPoint<C>, Error> {
+        self.q_double_and_add_final.enable(region, offset - 1)?;
+        let y = region.assign_advice(
+            || "final y_a",
+            self.double_and_add.lambda_1(),
+            offset,
+            || *y,
+        )?;
+        Ok(NonIdentityEccPoint::from_coordinates_unchecked(x.0, y))
+    }
+
+    fn endoscale_base_inner(
+        &self,
+        region: &mut Region<'_, C::Base>,
+        offset: usize,
+        base: &NonIdentityEccPoint<C>,
+        bitstring: &Bitstring<C::Base>,
+    ) -> Result<NonIdentityEccPoint<C>, Error> {
+        let (offset, acc) = self.endoscale_base_init(region, offset, base)?;
+
+        let (offset, (x, y)) = self.endoscale_base_main(region, offset, acc, base, bitstring)?;
+
+        let res = self.endoscale_base_final(region, offset, (x, y))?;
+
+        #[cfg(test)]
+        {
+            use crate::endoscale::util::endoscale_point;
+            let point = base.point();
+            let expected_res = bitstring
+                .bitstring()
+                .zip(point)
+                .map(|(bits, point)| endoscale_point(&bits, point));
+            res.point()
+                .zip(expected_res)
+                .map(|(res, expected_res)| assert_eq!(res, expected_res));
+        }
+
+        Ok(res)
+    }
+}
+
+impl<F: PrimeFieldBits> Bitstring<F> {
+    fn pairs(&self) -> Vec<Value<[bool; 2]>> {
+        self.windows()
+            .iter()
+            .map(|window| {
+                window.map(|window| {
+                    window
+                        .to_le_bits()
+                        .into_iter()
+                        .take(2)
+                        .collect::<Vec<_>>()
+                        .try_into()
+                        .unwrap()
+                })
+            })
+            .collect()
+    }
+
+    #[cfg(test)]
+    fn bitstring(&self) -> Value<Vec<bool>> {
+        let num_bits = self.num_bits();
+        self.zs()[0]
+            .value()
+            .map(|v| v.to_le_bits().iter().by_vals().take(num_bits).collect())
+    }
+}
diff --git a/halo2_gadgets/src/endoscale/chip/alg_2.rs b/halo2_gadgets/src/endoscale/chip/alg_2.rs
new file mode 100644
index 000000000..6487283c0
--- /dev/null
+++ b/halo2_gadgets/src/endoscale/chip/alg_2.rs
@@ -0,0 +1,597 @@
+use ff::{Field, PrimeFieldBits, WithSmallOrderMulGroup};
+use halo2_proofs::{
+    arithmetic::CurveAffine,
+    circuit::{AssignedCell, Layouter, Value},
+    plonk::{
+        Advice, Assigned, Column, ConstraintSystem, Constraints, Error, Expression, Instance,
+        Selector, TableColumn,
+    },
+    poly::Rotation,
+};
+
+use crate::{
+    endoscale::util::compute_endoscalar_with_acc,
+    utilities::{
+        decompose_running_sum::{RunningSum, RunningSumConfig},
+        i2lebsp, le_bits_to_field_elem,
+        lookup_range_check::LookupRangeCheckConfig,
+    },
+};
+use std::marker::PhantomData;
+
+#[derive(Clone, Debug)]
+pub struct Bitstring<F: PrimeFieldBits, const K: usize> {
+    running_sum: RunningSum<F, K>,
+    pad_len: usize,
+}
+
+impl<F: PrimeFieldBits, const K: usize> Bitstring<F, K> {
+    #[cfg(test)]
+    fn bitstring(&self) -> Value<Vec<bool>> {
+        let num_bits = self.running_sum.num_bits() - self.pad_len;
+        self.running_sum.zs()[0]
+            .value()
+            .map(|v| v.to_le_bits().iter().by_vals().take(num_bits).collect())
+    }
+}
+
+/// Configuration for endoscalar table.
+#[derive(Copy, Clone, Debug)]
+pub struct TableConfig<F: WithSmallOrderMulGroup<3>, const K: usize> {
+    pub(crate) bits: TableColumn,
+    pub(crate) endoscalar: TableColumn,
+    _marker: PhantomData<F>,
+}
+
+impl<F: WithSmallOrderMulGroup<3>, const K: usize> TableConfig<F, K> {
+    #[allow(dead_code)]
+    pub fn configure(meta: &mut ConstraintSystem<F>) -> Self {
+        TableConfig {
+            bits: meta.lookup_table_column(),
+            endoscalar: meta.lookup_table_column(),
+            _marker: PhantomData,
+        }
+    }
+
+    #[allow(dead_code)]
+    pub fn load(&self, layouter: &mut impl Layouter<F>) -> Result<(), Error> {
+        layouter.assign_table(
+            || "endoscalar_map",
+            |mut table| {
+                for index in 0..(1 << K) {
+                    table.assign_cell(
+                        || "bits",
+                        self.bits,
+                        index,
+                        || Value::known(F::from(index as u64)),
+                    )?;
+                    table.assign_cell(
+                        || "endoscalar",
+                        self.endoscalar,
+                        index,
+                        || {
+                            Value::known(compute_endoscalar_with_acc(
+                                Some(F::ZERO),
+                                &i2lebsp::<K>(index as u64),
+                            ))
+                        },
+                    )?;
+                }
+                Ok(())
+            },
+        )
+    }
+}
+
+/// Config used in Algorithm 2 (endoscaling in the field).
+#[derive(Clone, Debug)]
+pub struct Alg2Config<C: CurveAffine, const K: usize, const MAX_BITSTRING_LENGTH: usize>
+where
+    C::Base: PrimeFieldBits,
+{
+    // Selector enabling a lookup in the (bitstring, endoscalar) table.
+    q_lookup: Selector,
+    // Selector to initialise Alg 2 endoscaling.
+    q_init: Selector,
+    // Selector for Alg 2 endoscaling.
+    q_endoscale_scalar: Selector,
+    // Selector checking that partial chunks are correctly shifted.
+    q_partial_chunk: Selector,
+    // Public inputs are provided as endoscalars. Each endoscalar corresponds
+    // to a K-bit chunk.
+    endoscalars: Column<Instance>,
+    // An additional advice column where endoscalar values are copied and used
+    // in the lookup argument.
+    endoscalars_copy: Column<Advice>,
+    // Advice column where accumulator is witnessed.
+    acc: Column<Advice>,
+    // Configuration for running sum decomposition into K-bit chunks.
+    pub(super) running_sum_chunks: RunningSumConfig<C::Base, K>,
+    // Table mapping words to their corresponding endoscalars.
+    pub table: TableConfig<C::Base, K>,
+    // Configuration for lookup range check of partial chunks.
+    lookup_range_check: LookupRangeCheckConfig<C::Base, K>,
+}
+
+impl<C: CurveAffine, const K: usize, const MAX_BITSTRING_LENGTH: usize>
+    Alg2Config<C, K, MAX_BITSTRING_LENGTH>
+where
+    C::Base: PrimeFieldBits,
+{
+    pub(super) fn configure(
+        meta: &mut ConstraintSystem<C::Base>,
+        endoscalars: Column<Instance>,
+        endoscalars_copy: Column<Advice>,
+        acc: Column<Advice>,
+        running_sum_chunks: RunningSumConfig<C::Base, K>,
+    ) -> Self {
+        meta.enable_equality(endoscalars);
+        meta.enable_equality(endoscalars_copy);
+        meta.enable_equality(acc);
+
+        let table = TableConfig::configure(meta);
+        let lookup_range_check = LookupRangeCheckConfig::configure(meta, acc, table.bits);
+
+        let config = Self {
+            q_lookup: meta.complex_selector(),
+            q_init: meta.selector(),
+            q_endoscale_scalar: meta.selector(),
+            q_partial_chunk: meta.selector(),
+            endoscalars,
+            endoscalars_copy,
+            acc,
+            running_sum_chunks,
+            table,
+            lookup_range_check,
+        };
+
+        let two_pow_k_div2 = Expression::Constant(C::Base::from(1u64 << (K / 2)));
+
+        meta.create_gate("Endoscale scalar with lookup", |meta| {
+            let q_endoscale_scalar = meta.query_selector(config.q_endoscale_scalar);
+            let endo = meta.query_advice(config.endoscalars_copy, Rotation::cur());
+            let acc = meta.query_advice(config.acc, Rotation::cur());
+            let next_acc = meta.query_advice(config.acc, Rotation::next());
+
+            // Check that next_acc = acc * 2^{K/2} + endo
+            let expected_next_acc = acc * two_pow_k_div2.clone() + endo;
+
+            Constraints::with_selector(q_endoscale_scalar, [next_acc - expected_next_acc])
+        });
+
+        meta.lookup("Endoscaling lookup", |meta| {
+            let q_lookup = meta.query_selector(config.q_lookup);
+            let word = config.running_sum_chunks.window_expr_be(meta);
+            let endo = meta.query_advice(config.endoscalars_copy, Rotation::cur());
+
+            let neg_q_lookup = Expression::Constant(C::Base::ONE) - q_lookup.clone();
+            let default_endo = {
+                let val = compute_endoscalar_with_acc(Some(C::Base::ZERO), &[false; K]);
+                Expression::Constant(val)
+            };
+
+            let endo_expr = q_lookup.clone() * endo + neg_q_lookup * default_endo;
+
+            vec![(q_lookup * word, table.bits), (endo_expr, table.endoscalar)]
+        });
+
+        meta.create_gate("Partial chunk", |meta| {
+            let q_partial_chunk = meta.query_selector(config.q_partial_chunk);
+
+            // z_pen - z_last * 2^K
+            let expected_chunk = config.running_sum_chunks.window_expr_be(meta);
+            let padded_chunk = meta.query_advice(config.acc, Rotation::cur());
+            let chunk_check = expected_chunk - padded_chunk;
+
+            let padded_endoscalar = meta.query_advice(config.endoscalars_copy, Rotation::cur());
+            // 2^{K' / 2}
+            let two_pow_k_prime_div2 = meta.query_advice(config.endoscalars_copy, Rotation::next());
+            // shift = 2^{K'/2} - 2^{K/2}
+            let shift = two_pow_k_prime_div2.clone() - two_pow_k_div2.clone();
+            let shifted_endoscalar = padded_endoscalar - shift;
+
+            // Initialise the accumulator to 2 * (ζ + 1).
+            let init_acc = Expression::Constant((C::Base::ZETA + C::Base::ONE).double());
+            let acc_1 = meta.query_advice(config.acc, Rotation::next());
+            // Check that acc_1 = init_acc * 2^{K' / 2} + shifted_endoscalar
+            let expected_acc_1 = init_acc * two_pow_k_prime_div2 + shifted_endoscalar;
+            let acc_check = acc_1 - expected_acc_1;
+
+            Constraints::with_selector(
+                q_partial_chunk,
+                [("acc_check", acc_check), ("chunk_check", chunk_check)],
+            )
+        });
+
+        meta.create_gate("Init chunk", |meta| {
+            let q_init = meta.query_selector(config.q_init);
+
+            let endoscalar = meta.query_advice(config.endoscalars_copy, Rotation::cur());
+
+            let init_acc = meta.query_advice(config.endoscalars_copy, Rotation::next());
+            let acc_1 = meta.query_advice(config.acc, Rotation::next());
+            // Check that acc_1 = init_acc * 2^{K / 2} + endoscalar
+            let expected_acc_1 = init_acc * two_pow_k_div2 + endoscalar;
+
+            Constraints::with_selector(q_init, [("acc_check", acc_1 - expected_acc_1)])
+        });
+
+        config
+    }
+
+    pub(super) fn witness_bitstring(
+        &self,
+        mut layouter: impl Layouter<C::Base>,
+        bits: &[Value<bool>],
+    ) -> Result<Bitstring<C::Base, K>, Error> {
+        let word_num_bits = bits.len();
+        let pad_len = (K - (word_num_bits % K)) % K;
+
+        // Right-pad bitstring to a multiple of K if needed
+        let mut bits: Value<Vec<bool>> = bits.iter().copied().collect();
+        if pad_len > 0 {
+            bits = bits.map(|bits| {
+                let padding = std::iter::repeat(false).take(pad_len);
+                bits.iter().copied().chain(padding).collect()
+            });
+        }
+
+        let alpha = bits.map(|b| le_bits_to_field_elem(&b));
+
+        let running_sum = layouter.assign_region(
+            || "witness bitstring",
+            |mut region| {
+                let offset = 0;
+
+                let num_windows = (word_num_bits + pad_len) / K;
+                self.running_sum_chunks.witness_decompose(
+                    &mut region,
+                    offset,
+                    alpha,
+                    true,
+                    word_num_bits + pad_len,
+                    num_windows,
+                )
+            },
+        )?;
+
+        Ok(Bitstring {
+            running_sum,
+            pad_len,
+        })
+    }
+
+    pub(super) fn compute_endoscalar(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: &Bitstring<C::Base, K>,
+    ) -> Result<AssignedCell<Assigned<C::Base>, C::Base>, Error> {
+        let pad_len = bitstring.pad_len;
+        let num_bits = bitstring.running_sum.num_bits() - pad_len;
+        // num_bits must be an even number not greater than MAX_BITSTRING_LENGTH.
+        assert!(num_bits <= MAX_BITSTRING_LENGTH);
+
+        // The bitstring will be broken into K-bit chunks with the first chunk
+        // being a padded k_prime-bit partial chunk
+        let k_prime = K - pad_len;
+
+        // Interstitial running sum values
+        let zs = bitstring.running_sum.zs();
+
+        let init_acc = if pad_len > 0 {
+            self.init_partial_chunk(
+                layouter.namespace(|| "init partial chunk"),
+                &(zs[zs.len() - 1]).clone().into(),
+                &(zs[zs.len() - 2]).clone().into(),
+                k_prime,
+            )?
+        } else {
+            self.init_full_chunk(
+                layouter.namespace(|| "init full chunk"),
+                &(zs[zs.len() - 1]).clone().into(),
+                &(zs[zs.len() - 2]).clone().into(),
+            )?
+        };
+
+        layouter.assign_region(
+            || "Endoscale scalar using bitstring (lookup optimisation)",
+            |mut region| {
+                let offset = 0;
+
+                // Copy the running sum
+                let running_sum_len = zs.len();
+                for (idx, z) in zs.iter().rev().skip(1).enumerate() {
+                    z.copy_advice(
+                        || format!("z[{:?}]", running_sum_len - idx + 1),
+                        &mut region,
+                        self.running_sum_chunks.z(),
+                        offset + idx,
+                    )?;
+                }
+
+                // Copy in the accumulator
+                init_acc.copy_advice(|| "copy acc", &mut region, self.acc, offset)?;
+
+                let mut acc = init_acc.clone();
+                // For each chunk, lookup the (chunk, endoscalar) pair and add
+                // it to the accumulator.
+                for (idx, chunk) in bitstring
+                    .running_sum
+                    .windows()
+                    .iter()
+                    .rev()
+                    .skip(1)
+                    .enumerate()
+                {
+                    self.q_lookup.enable(&mut region, offset + idx)?;
+                    self.q_endoscale_scalar.enable(&mut region, offset + idx)?;
+
+                    let endoscalar = chunk.map(|c| {
+                        compute_endoscalar_with_acc(
+                            Some(C::Base::ZERO),
+                            &c.to_le_bits().iter().by_vals().take(K).collect::<Vec<_>>(),
+                        )
+                    });
+                    // Witness endoscalar.
+                    region.assign_advice(
+                        || format!("Endoscalar for chunk {}", running_sum_len - idx),
+                        self.endoscalars_copy,
+                        offset + idx,
+                        || endoscalar,
+                    )?;
+
+                    // Bitshift the accumulator by {K / 2} and add to endoscalar.
+                    let acc_val = acc.value().zip(endoscalar).map(|(&acc, endo)| {
+                        let two_pow_k_div2 = C::Base::from(1 << (K / 2));
+                        acc * two_pow_k_div2 + endo
+                    });
+                    acc = region.assign_advice(
+                        || format!("Endoscalar for chunk {}", running_sum_len - idx),
+                        self.acc,
+                        offset + idx + 1,
+                        || acc_val,
+                    )?;
+                }
+
+                #[cfg(test)]
+                {
+                    use crate::endoscale::util::compute_endoscalar;
+                    let bitstring = bitstring.bitstring();
+                    let expected_acc: Value<C::Base> = bitstring.map(|b| compute_endoscalar(&b));
+                    acc.value()
+                        .zip(expected_acc)
+                        .map(|(&acc, expected_acc)| assert_eq!(acc.evaluate(), expected_acc));
+                }
+
+                Ok(acc)
+            },
+        )
+    }
+
+    /// The first chunk is handled differently if it is padded:
+    ///
+    ///    |      z       |      acc     | endoscalars_copy  | q_partial | q_lookup|
+    ///    -------------------------------------------------------------------------
+    ///    |     z_last   | padded_chunk | padded_endoscalar |     1     |    1    |
+    ///    |     z_pen    |    acc_1     |      2^{K'/2}     |     0     |    0    |
+    ///
+    fn init_partial_chunk(
+        &self,
+        mut layouter: impl Layouter<C::Base>,
+        z_last: &AssignedCell<Assigned<C::Base>, C::Base>,
+        z_pen: &AssignedCell<Assigned<C::Base>, C::Base>,
+        k_prime: usize,
+    ) -> Result<AssignedCell<Assigned<C::Base>, C::Base>, Error> {
+        // Derive the padded chunk c_last = z_pen - z_last * 2^K
+        let padded_chunk = z_pen
+            .value()
+            .zip(z_last.value())
+            .map(|(z_pen, z_last)| *z_pen - *z_last * C::Base::from(1u64 << K));
+
+        // Range-constrain the padded chunk to `k_prime` bits.
+        let padded_chunk = self.lookup_range_check.witness_short_check(
+            layouter.namespace(|| format!("Check that padded_chunk is {} bits", k_prime)),
+            padded_chunk.evaluate(),
+            k_prime,
+        )?;
+
+        layouter.assign_region(
+            || "Init partial chunk",
+            |mut region| {
+                let offset = 0;
+
+                // Enable q_partial_chunk on offset 0
+                self.q_partial_chunk.enable(&mut region, offset)?;
+                // Enable q_lookup on offset 0
+                self.q_lookup.enable(&mut region, offset)?;
+
+                // Copy z_last
+                z_last.copy_advice(
+                    || "copy z_last",
+                    &mut region,
+                    self.running_sum_chunks.z(),
+                    offset,
+                )?;
+                // Copy z_pen
+                z_pen.copy_advice(
+                    || "copy z_pen",
+                    &mut region,
+                    self.running_sum_chunks.z(),
+                    offset + 1,
+                )?;
+                // Copy padded_chunk
+                padded_chunk.copy_advice(|| "copy padded chunk", &mut region, self.acc, offset)?;
+
+                // Witness the endoscalar corresponding to the padded chunk.
+                let padded_endoscalar = padded_chunk.value().map(|v| {
+                    let bitstring = v.to_le_bits().iter().by_vals().take(K).collect::<Vec<_>>();
+                    compute_endoscalar_with_acc(Some(C::Base::ZERO), &bitstring)
+                });
+                region.assign_advice(
+                    || "padded endoscalar",
+                    self.endoscalars_copy,
+                    offset,
+                    || padded_endoscalar,
+                )?;
+
+                // Load the value 2^{K'/2} from constant.
+                let two_pow_k_prime_div2: Assigned<C::Base> =
+                    C::Base::from(1u64 << (k_prime / 2)).into();
+                region.assign_advice_from_constant(
+                    || "2^{K'/2}",
+                    self.endoscalars_copy,
+                    offset + 1,
+                    two_pow_k_prime_div2,
+                )?;
+
+                // Bitshift the accumulator by {K' / 2} bits and add to adjusted endoscalar.
+                let acc: Value<Assigned<C::Base>> = padded_endoscalar.map(|padded_endoscalar| {
+                    let two_pow_k_div2: Assigned<C::Base> = C::Base::from(1 << (K / 2)).into();
+                    let padded_endoscalar: Assigned<C::Base> = padded_endoscalar.into();
+
+                    // shift = 2^{K'/2} - 2^{K/2}
+                    let shift = two_pow_k_prime_div2 - two_pow_k_div2;
+                    let actual_endoscalar = padded_endoscalar - shift;
+                    let init_acc: Assigned<C::Base> =
+                        (C::Base::ZETA + C::Base::ONE).double().into();
+                    init_acc * two_pow_k_prime_div2 + actual_endoscalar
+                });
+                region.assign_advice(
+                    || "acc = init_acc * 2^{K'/2} + actual_endoscalar",
+                    self.acc,
+                    offset + 1,
+                    || acc,
+                )
+            },
+        )
+    }
+
+    /// If it is not padded, we lookup the endoscalar directly using the derived chunk:
+    ///
+    ///    |      z       |      acc     | endoscalars_copy  |  q_lookup |  q_init |
+    ///    -------------------------------------------------------------------------
+    ///    |     z_last   |              |    endoscalar     |     1     |    1    |
+    ///    |     z_pen    |     acc_1    |    init_acc       |     0     |    0    |
+    ///
+    fn init_full_chunk(
+        &self,
+        mut layouter: impl Layouter<C::Base>,
+        z_last: &AssignedCell<Assigned<C::Base>, C::Base>,
+        z_pen: &AssignedCell<Assigned<C::Base>, C::Base>,
+    ) -> Result<AssignedCell<Assigned<C::Base>, C::Base>, Error> {
+        layouter.assign_region(
+            || "Init full chunk",
+            |mut region| {
+                let offset = 0;
+
+                // Enable q_lookup on offset 0
+                self.q_lookup.enable(&mut region, offset)?;
+                // Enable q_init on offset 0
+                self.q_init.enable(&mut region, offset)?;
+
+                // Copy z_last
+                z_last.copy_advice(
+                    || "copy z_last",
+                    &mut region,
+                    self.running_sum_chunks.z(),
+                    offset,
+                )?;
+                // Copy z_pen
+                z_pen.copy_advice(
+                    || "copy z_pen",
+                    &mut region,
+                    self.running_sum_chunks.z(),
+                    offset + 1,
+                )?;
+
+                // Initialise the accumulator to 2 * (ζ + 1).
+                let init_acc = Assigned::from((C::Base::ZETA + C::Base::ONE).double());
+                region.assign_advice_from_constant(
+                    || "initialise acc",
+                    self.endoscalars_copy,
+                    offset + 1,
+                    init_acc,
+                )?;
+
+                // Derive chunk c_last = z_pen - z_last * 2^K
+                let chunk = z_pen
+                    .value()
+                    .zip(z_last.value())
+                    .map(|(z_pen, z_last)| *z_pen - *z_last * C::Base::from(1u64 << K));
+
+                // Witness the endoscalar corresponding to the chunk.
+                let endoscalar: Value<Assigned<C::Base>> = chunk
+                    .map(|v| {
+                        let bitstring = v
+                            .evaluate()
+                            .to_le_bits()
+                            .iter()
+                            .by_vals()
+                            .take(K)
+                            .collect::<Vec<_>>();
+                        compute_endoscalar_with_acc(Some(C::Base::ZERO), &bitstring)
+                    })
+                    .into();
+                region.assign_advice(
+                    || "actual endoscalar",
+                    self.endoscalars_copy,
+                    offset,
+                    || endoscalar,
+                )?;
+
+                // Bitshift the accumulator by {K / 2} and add to endoscalar.
+                let acc: Value<Assigned<C::Base>> = endoscalar.map(|endoscalar| {
+                    let two_pow_k_div2 = C::Base::from(1 << (K / 2));
+                    init_acc * two_pow_k_div2 + endoscalar
+                });
+                region.assign_advice(
+                    || "acc = init_acc * 2^{K/2} + endoscalar",
+                    self.acc,
+                    offset + 1,
+                    || acc,
+                )
+            },
+        )
+    }
+
+    pub(super) fn constrain_bitstring(
+        &self,
+        layouter: &mut impl Layouter<C::Base>,
+        bitstring: &Bitstring<C::Base, K>,
+        pub_input_rows: Vec<usize>,
+    ) -> Result<(), Error> {
+        layouter.assign_region(
+            || "Recover bitstring from endoscalars",
+            |mut region| {
+                let offset = 0;
+
+                // Copy the running sum.
+                let running_sum_len = bitstring.running_sum.zs().len();
+                for (idx, z) in bitstring.running_sum.zs().iter().rev().enumerate() {
+                    z.copy_advice(
+                        || format!("z[{:?}]", running_sum_len - idx),
+                        &mut region,
+                        self.running_sum_chunks.z(),
+                        offset + idx,
+                    )?;
+                }
+
+                // For each chunk, lookup the (chunk, endoscalar) pair.
+                for (idx, pub_input_row) in pub_input_rows.iter().rev().enumerate() {
+                    self.q_lookup.enable(&mut region, offset + idx)?;
+
+                    // Copy endoscalar from given row on instance column
+                    region.assign_advice_from_instance(
+                        || format!("Endoscalar at row {:?}", pub_input_row),
+                        self.endoscalars,
+                        *pub_input_row,
+                        self.endoscalars_copy,
+                        offset + idx,
+                    )?;
+                }
+
+                Ok(())
+            },
+        )
+    }
+}
diff --git a/halo2_gadgets/src/endoscale/util.rs b/halo2_gadgets/src/endoscale/util.rs
new file mode 100644
index 000000000..5c4437fd4
--- /dev/null
+++ b/halo2_gadgets/src/endoscale/util.rs
@@ -0,0 +1,210 @@
+//! Primitives used in endoscaling.
+
+use ff::WithSmallOrderMulGroup;
+use group::Group;
+use halo2curves::CurveAffine;
+
+use subtle::CtOption;
+
+/// Maps a pair of bits to a multiple of a scalar using endoscaling.
+pub(crate) fn compute_endoscalar_pair<F: WithSmallOrderMulGroup<3>>(bits: [bool; 2]) -> F {
+    // [2 * bits.0 - 1]
+    let mut scalar = F::from(bits[0].into()).double() - F::ONE;
+
+    if bits[1] {
+        scalar *= F::ZETA;
+    }
+
+    scalar
+}
+
+/// Maps a K-bit bitstring to a scalar.
+///
+/// This corresponds to Algorithm 1 from [BGH2019], where `F` corresponds to $F_q$, the
+/// scalar field of $P$. Where Algorithm 1 computes $Acc = [scalar] P$, this function
+/// computes `scalar`.
+///
+/// [BGH2019]: https://eprint.iacr.org/2019/1021.pdf
+#[allow(dead_code)]
+pub(crate) fn compute_endoscalar<F: WithSmallOrderMulGroup<3>>(bits: &[bool]) -> F {
+    compute_endoscalar_with_acc(None, bits)
+}
+
+/// Maps a K-bit bitstring to a scalar.
+///
+/// This function takes an optional accumulator which can be initialised to some value.
+/// This is convenient when chunking the bitstring being endoscaled is a partial chunk
+/// in some larger bitstring.
+///
+/// # Panics
+/// Panics if there is an odd number of bits.
+pub fn compute_endoscalar_with_acc<F: WithSmallOrderMulGroup<3>>(
+    acc: Option<F>,
+    bits: &[bool],
+) -> F {
+    assert_eq!(bits.len() % 2, 0);
+
+    let mut acc = acc.unwrap_or_else(|| (F::ZETA + F::ONE).double());
+
+    for j in (0..(bits.len() / 2)).rev() {
+        let pair = [bits[2 * j], bits[2 * j + 1]];
+        let endo = compute_endoscalar_pair::<F>(pair);
+        acc = acc.double();
+        acc += endo;
+    }
+    acc
+}
+
+/// Maps a pair of bits to a multiple of a base using endoscaling.
+///
+/// # Panics
+/// Panics if the base is the identity.
+pub(crate) fn endoscale_point_pair<C: CurveAffine>(bits: [bool; 2], base: C) -> CtOption<C> {
+    assert!(!bool::from(base.to_curve().is_identity()));
+
+    let mut base = {
+        let base = base.coordinates();
+        (*base.unwrap().x(), *base.unwrap().y())
+    };
+
+    if !bits[0] {
+        base.1 = -base.1;
+    }
+
+    if bits[1] {
+        base.0 *= C::Base::ZETA;
+    }
+
+    C::from_xy(base.0, base.1)
+}
+
+/// Maps a K-bit bitstring to a multiple of a given base.
+///
+/// This is Algorithm 1 from [BGH2019](https://eprint.iacr.org/2019/1021.pdf).
+///
+/// # Panics
+/// Panics if the base is the identity.
+/// Panics if there is an odd number of bits.
+#[cfg(test)]
+pub(crate) fn endoscale_point<C: CurveAffine>(bits: &[bool], base: C) -> C {
+    use group::Curve;
+
+    assert_eq!(bits.len() % 2, 0);
+    assert!(!bool::from(base.to_curve().is_identity()));
+
+    // Initialise accumulator to [2](φ(P) + P)
+    let mut acc = (base.to_curve() + base * C::Scalar::ZETA).double();
+
+    for j in (0..(bits.len() / 2)).rev() {
+        let pair = [bits[2 * j], bits[2 * j + 1]];
+        let endo = endoscale_point_pair::<C>(pair, base);
+        acc = acc.double();
+        acc += endo.unwrap();
+    }
+
+    acc.to_affine()
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use group::{prime::PrimeCurveAffine, Curve};
+    use halo2curves::pasta::pallas;
+    use rand::{random, rngs::OsRng};
+
+    #[test]
+    fn test_alg1_alg2() {
+        let base = pallas::Point::random(OsRng);
+        let num_bits = 128;
+        let bits: Vec<_> = std::iter::repeat(random::<bool>()).take(num_bits).collect();
+
+        let endoscalar: pallas::Scalar = compute_endoscalar(&bits);
+        let endoscaled_base = endoscale_point(&bits, base.to_affine());
+
+        assert_eq!(base * endoscalar, endoscaled_base.to_curve());
+    }
+
+    fn shift_padded_endo<F: WithSmallOrderMulGroup<3>, const K: usize>(
+        padded_endo: F,
+        k_prime: usize,
+    ) -> F {
+        //   (1 - 2^{(K - K')/2}) * 2^{K'/2}
+        // = 2^{K'/2} - 2^{K/2}
+        let shift = F::from(1 << (k_prime / 2)) - F::from(1 << (K / 2));
+        padded_endo - shift
+    }
+
+    /// Test that shifting the endoscalar of the padded chunk recovers the
+    /// same result as directly endoscaling the original chunk.
+    fn endo_partial_chunk<F: WithSmallOrderMulGroup<3>, const K: usize>(k_prime: usize) {
+        assert!(k_prime > 0);
+        assert!(k_prime < K);
+        let bits: Vec<_> = std::iter::repeat(random::<bool>()).take(k_prime).collect();
+
+        let padding = std::iter::repeat(false).take(K - k_prime);
+        let padded_bits: Vec<_> = bits.iter().copied().chain(padding).collect();
+        let padded_endo = compute_endoscalar_with_acc(Some(F::ZERO), &padded_bits);
+
+        let endo = shift_padded_endo::<_, K>(padded_endo, k_prime);
+
+        assert_eq!(endo, compute_endoscalar_with_acc(Some(F::ZERO), &bits));
+    }
+
+    #[test]
+    /// Test that shifting the endoscalar of the padded chunk recovers the
+    /// same result as directly endoscaling the original chunk.
+    fn test_endo_partial_chunk() {
+        endo_partial_chunk::<pallas::Base, 10>(2);
+        endo_partial_chunk::<pallas::Base, 10>(4);
+        endo_partial_chunk::<pallas::Base, 10>(6);
+        endo_partial_chunk::<pallas::Base, 10>(8);
+    }
+
+    fn endo_chunk<F: WithSmallOrderMulGroup<3>, const K: usize>(num_bits: usize) {
+        let bits: Vec<_> = std::iter::repeat(random::<bool>()).take(num_bits).collect();
+        let endoscalar_by_pair = compute_endoscalar(&bits);
+
+        let pad_len = (K - (num_bits % K)) % K;
+        let two_pow_k_div_two = F::from(1u64 << (K / 2));
+
+        // Pad bits from the right with `pad_len` zeros
+        let bits: Vec<_> = bits
+            .iter()
+            .copied()
+            .chain(std::iter::repeat(false).take(pad_len))
+            .collect();
+
+        // Initialise accumulator
+        let mut acc = (F::ZETA + F::ONE).double();
+
+        let mut chunks = bits.chunks(K).rev();
+        let last_chunk = chunks.next().unwrap();
+        let last_endo = compute_endoscalar_with_acc(Some(F::ZERO), last_chunk);
+
+        // If the last chunk was padded, adjust it for a shift.
+        if pad_len > 0 {
+            let k_prime = K - pad_len;
+            let two_pow_k_prime_div_two = F::from(1u64 << (k_prime / 2));
+            let shifted_endo = shift_padded_endo::<_, K>(last_endo, k_prime);
+            acc = acc * two_pow_k_prime_div_two + shifted_endo;
+        } else {
+            acc = acc * two_pow_k_div_two + last_endo;
+        };
+
+        for chunk in chunks.rev() {
+            let endo = compute_endoscalar_with_acc(Some(F::ZERO), chunk);
+            acc = acc * two_pow_k_div_two + endo;
+        }
+
+        assert_eq!(acc, endoscalar_by_pair);
+    }
+
+    #[test]
+    fn test_endo_chunk() {
+        endo_chunk::<pallas::Base, 2>(8);
+        endo_chunk::<pallas::Base, 4>(8);
+        endo_chunk::<pallas::Base, 6>(8);
+        endo_chunk::<pallas::Base, 8>(8);
+        endo_chunk::<pallas::Base, 10>(8);
+    }
+}
diff --git a/halo2_gadgets/src/lib.rs b/halo2_gadgets/src/lib.rs
index bde37ec8c..4080d8410 100644
--- a/halo2_gadgets/src/lib.rs
+++ b/halo2_gadgets/src/lib.rs
@@ -24,6 +24,7 @@
 #![deny(unsafe_code)]
 
 pub mod ecc;
+pub mod endoscale;
 pub mod poseidon;
 #[cfg(feature = "unstable")]
 #[cfg_attr(docsrs, doc(cfg(feature = "unstable")))]
diff --git a/halo2_gadgets/src/utilities.rs b/halo2_gadgets/src/utilities.rs
index c2d1db00d..2f0fdcc63 100644
--- a/halo2_gadgets/src/utilities.rs
+++ b/halo2_gadgets/src/utilities.rs
@@ -238,6 +238,37 @@ pub fn i2lebsp<const NUM_BITS: usize>(int: u64) -> [bool; NUM_BITS] {
     gen_const_array(|mask: usize| (int & (1 << mask)) != 0)
 }
 
+/// The field element represented by an L-bit little-endian bitstring.
+///
+/// # Panics
+///
+/// Panics if the bitstring is longer than `F::NUM_BITS`.
+pub fn le_bits_to_field_elem<F: PrimeFieldBits>(bits: &[bool]) -> F {
+    assert!(bits.len() as u32 <= F::NUM_BITS);
+    let bits: Vec<_> = bits
+        .iter()
+        .chain(std::iter::repeat(&false).take(bits.len() % 8))
+        .collect();
+    let pad_len = 32 - bits.len() / 8;
+
+    let mut repr = F::Repr::default();
+    let view = repr.as_mut();
+
+    let bytes = bits
+        .chunks(8)
+        .map(|bits| {
+            bits.iter()
+                .enumerate()
+                .fold(0u8, |acc, (i, b)| acc + if **b { 1 << i } else { 0 })
+        })
+        .chain(std::iter::repeat(0u8).take(pad_len));
+    for (byte, repr) in bytes.zip(view.iter_mut()) {
+        *repr = byte
+    }
+
+    F::from_repr(repr).unwrap()
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
diff --git a/halo2_gadgets/src/utilities/decompose_running_sum.rs b/halo2_gadgets/src/utilities/decompose_running_sum.rs
index 69271066c..a530fc075 100644
--- a/halo2_gadgets/src/utilities/decompose_running_sum.rs
+++ b/halo2_gadgets/src/utilities/decompose_running_sum.rs
@@ -31,7 +31,7 @@ use halo2_proofs::{
 use std::marker::PhantomData;
 
 /// The running sum $[z_0, ..., z_W]$. If created in strict mode, $z_W = 0$.
-#[derive(Debug)]
+#[derive(Clone, Debug)]
 pub struct RunningSum<F: PrimeFieldBits, const WINDOW_NUM_BITS: usize> {
     zs: Vec<AssignedCell<F, F>>,
     num_bits: usize,
@@ -80,6 +80,11 @@ impl<F: PrimeFieldBits, const WINDOW_NUM_BITS: usize> RunningSumConfig<F, WINDOW
         self.q_range_check
     }
 
+    /// Returns the `z` advice column of this [`RunningSumConfig`].
+    pub(crate) fn z(&self) -> Column<Advice> {
+        self.z
+    }
+
     /// `perm` MUST include the advice column `z`.
     ///
     /// # Side-effects
@@ -106,15 +111,28 @@ impl<F: PrimeFieldBits, const WINDOW_NUM_BITS: usize> RunningSumConfig<F, WINDOW
         config
     }
 
-    /// Expression for a window
+    /// Expression for a window when the running sum is arranged in increasing order
+    /// z_0, z_1, ..., z_{W}, where z_0 is the original field element that was decomposed.
+    ///
     ///    z_i = 2^{K}⋅z_{i + 1} + k_i
     /// => k_i = z_i - 2^{K}⋅z_{i + 1}
-    pub(crate) fn window_expr(&self, meta: &mut VirtualCells<'_, F>) -> Expression<F> {
+    pub(crate) fn window_expr_le(&self, meta: &mut VirtualCells<'_, F>) -> Expression<F> {
         let z_cur = meta.query_advice(self.z, Rotation::cur());
         let z_next = meta.query_advice(self.z, Rotation::next());
         z_cur - z_next * F::from(1 << WINDOW_NUM_BITS)
     }
 
+    /// Expression for a window when the running sum is arranged in decreasing order
+    /// z_{W}, z_{W-1}, ..., z_{0}, where z_0 is the original field element that was decomposed.
+    ///
+    ///    z_i = 2^{K}⋅z_{i + 1} + k_i
+    /// => k_i = z_i - 2^{K}⋅z_{i + 1}
+    pub(crate) fn window_expr_be(&self, meta: &mut VirtualCells<'_, F>) -> Expression<F> {
+        let z_cur = meta.query_advice(self.z, Rotation::cur());
+        let z_next = meta.query_advice(self.z, Rotation::next());
+        z_next - z_cur * F::from(1 << WINDOW_NUM_BITS)
+    }
+
     /// Decompose a field element alpha that is witnessed in this helper.
     ///
     /// `strict` = true constrains the final running sum to be zero, i.e.
diff --git a/halo2_gadgets/src/utilities/lookup_range_check.rs b/halo2_gadgets/src/utilities/lookup_range_check.rs
index 5ee032498..19c13ac92 100644
--- a/halo2_gadgets/src/utilities/lookup_range_check.rs
+++ b/halo2_gadgets/src/utilities/lookup_range_check.rs
@@ -298,7 +298,7 @@ impl<F: PrimeFieldBits, const K: usize> LookupRangeCheckConfig<F, K> {
     pub fn copy_short_check(
         &self,
         mut layouter: impl Layouter<F>,
-        element: AssignedCell<F, F>,
+        element: &AssignedCell<F, F>,
         num_bits: usize,
     ) -> Result<(), Error> {
         assert!(num_bits < K);
@@ -309,7 +309,7 @@ impl<F: PrimeFieldBits, const K: usize> LookupRangeCheckConfig<F, K> {
                 let element =
                     element.copy_advice(|| "element", &mut region, self.running_sum, 0)?;
 
-                self.short_range_check(&mut region, element, num_bits)
+                self.short_range_check(&mut region, &element, num_bits)
             },
         )
     }
@@ -333,7 +333,7 @@ impl<F: PrimeFieldBits, const K: usize> LookupRangeCheckConfig<F, K> {
                 let element =
                     region.assign_advice(|| "Witness element", self.running_sum, 0, || element)?;
 
-                self.short_range_check(&mut region, element.clone(), num_bits)?;
+                self.short_range_check(&mut region, &element, num_bits)?;
 
                 Ok(element)
             },
@@ -346,7 +346,7 @@ impl<F: PrimeFieldBits, const K: usize> LookupRangeCheckConfig<F, K> {
     fn short_range_check(
         &self,
         region: &mut Region<'_, F>,
-        element: AssignedCell<F, F>,
+        element: &AssignedCell<F, F>,
         num_bits: usize,
     ) -> Result<(), Error> {
         // Enable lookup for `element`, to constrain it to 10 bits.
diff --git a/halo2_proofs/src/circuit.rs b/halo2_proofs/src/circuit.rs
index f15205e43..cc8e72b13 100644
--- a/halo2_proofs/src/circuit.rs
+++ b/halo2_proofs/src/circuit.rs
@@ -138,6 +138,16 @@ impl<F: Field> AssignedCell<Assigned<F>, F> {
     }
 }
 
+impl<F: Field> From<AssignedCell<F, F>> for AssignedCell<Assigned<F>, F> {
+    fn from(assigned_cell: AssignedCell<F, F>) -> Self {
+        AssignedCell {
+            value: assigned_cell.value.map(|val| val.into()),
+            cell: assigned_cell.cell,
+            _marker: Default::default(),
+        }
+    }
+}
+
 impl<V: Clone, F: Field> AssignedCell<V, F>
 where
     for<'v> Assigned<F>: From<&'v V>,
diff --git a/halo2_proofs/src/poly/ipa/commitment.rs b/halo2_proofs/src/poly/ipa/commitment.rs
index b36fc9e82..8d3b17d07 100644
--- a/halo2_proofs/src/poly/ipa/commitment.rs
+++ b/halo2_proofs/src/poly/ipa/commitment.rs
@@ -34,6 +34,13 @@ pub struct ParamsIPA<C: CurveAffine> {
     pub(crate) u: C,
 }
 
+impl<C: CurveAffine> ParamsIPA<C> {
+    /// Needed for endoscaling testing in `halo2_gadgets`
+    pub fn g_lagrange(&self) -> &[C] {
+        &self.g_lagrange
+    }
+}
+
 /// Concrete IPA commitment scheme
 #[derive(Debug)]
 pub struct IPACommitmentScheme<C: CurveAffine> {