Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-Asset Migration Algorithm #2618

Merged
merged 31 commits into from
Apr 22, 2021
Merged

Conversation

jonathanknowles
Copy link
Member

@jonathanknowles jonathanknowles commented Apr 19, 2021

Issue Number

ADP-839

Overview

This PR adds an algorithm for performing migrations of multi-asset UTxO sets from one wallet to another.

It consists of four main parts:

  1. Module Migration, which provides a top-level public API, defined in terms of concrete wallet types.
  2. Module Migration.Planning, which contains an algorithm for planning migrations at a high
    level. It determines how to partition a UTxO set into entries of different types, and in which order to add UTxO entries to selections.
  3. Module Migration.Selection, which provides functions for incrementally constructing a
    selection to be included in a migration plan. (A selection is the basis for an individual transaction.)
  4. The TxConstraints type, which provides an abstract cost model for transactions, allowing parts of a transaction can be costed individually. This allows the incremental cost of extending a selection to be calculated without having to recalculate the cost of the entire selection.

Performance

For pure ada wallets, we can process around 30,000 UTxO entries per second.

We can process around 3,000 UTxO entries per second with wallets of the following composition:

Percentage UTxO Entry Type
10% Pure ada entries below the minimum ada quantity
40% Pure ada entries with at least the minimum ada quantity
40% Multi-asset token bundles with precisely the minimum ada quantity
10% Multi-asset token bundles with more than the minimum ada quantity

Design

UTxO partitioning

The algorithm starts by partitioning the UTxO set into three types of UTxO entry:

Type Definition
supporter an entry that is capable of paying for itself
freerider an entry that is not capable of paying for itself
ignorable a dust ada entry that should not be included because its value is less than the marginal cost of an input.

Adding entries to transactions

The algorithm uses a supporter entry to initialize each transaction, but then gives priority to adding freerider entries until the transaction is full. It adds a supporter entry only when there's not enough ada available to pay for an additional freerider entry.

extending a selection

Creating outputs within transactions

The algorithm minimizes the number of outputs for each transaction (and thus minimizes the fee required) by merging together all value provided by the inputs. Due to the nature of our transaction constraints (which limit the size of an output), the algorithm usually produces transactions with between 1 and 4 outputs.

Requirements

The algorithm in this PR is designed to satisfy the following requirements:

  • REQ-1
    The algorithm must succeed at selecting all pure ada entries greater than txBaseFee + minAdaQuantity + marginalFeeForInputAndOutput.
  • REQ-2
    The algorithm must succeed at selecting all multi-asset entries whose ada quantities are greater than or equal to txBaseFee + minAdaQuantity + marginalFeeForInputAndOutput.
  • REQ-3
    The algorithm should make a best effort to select entries whose ada quantities are less than txBaseFee + minAdaQuantity + marginalFeeForInputAndOutput, but is permitted to omit such entries if it cannot find a viable strategy to select them.
  • REQ-4
    The algorithm must produce T: a set of transactions for which fees are balanced, and R: the remainder of the UTxO set that could not be selected, such that the inputs of T and the entries in R are disjoint, and such that the union of inputs in T and entries in R is equal to the original UTxO set.
  • REQ-5
    The algorithm must preserve the balance of every non-ada asset, so that for any given asset a, the total quantity of asset a in the selected inputs is exactly equal to the total quantity of asset a in the generated outputs.
  • REQ-6
    The algorithm must not produce transactions with dependencies between them; it should be possible to broadcast the generated transactions in any order.
  • REQ-7
    The algorithm must not produce outputs with token quantities that exceed the maximum allowed in an output (2^64 − 1).
  • REQ-8
    The algorithm must not produce outputs whose serialized length exceeds the maximum allowed length (4000 bytes).
  • REQ-9
    The algorithm must not produce transactions that exceed the transaction size limit (16384 bytes).
  • REQ-10
    The algorithm must move the reward balance in its entirety.
  • REQ-11
    The algorithm must be deterministic: if run twice on the same UTxO set, it must generate exactly the same migration plan. (This is necessary to support an API where a non-hardware wallet user can generate a plan, inspect the plan, and then initiate a migration without having to resubmit the plan.)

Non-Requirements

  • NON-REQ-1
    Preserving the UTxO distribution.

The `TxConstraints` type provides an abstract cost model for
transactions, allowing parts of a transaction can be costed
individually.
@jonathanknowles
Copy link
Member Author

jonathanknowles commented Apr 19, 2021

Had a review call with @Anviking, where we identified a number of things to add.

To add to this PR:

  • Write a comment for Selection.balance, to explain its purpose, what it expects and what it guarantees.
    (Fixed in c6eb01a)
  • Write a comment for Selection.minimizeFee to explain the motivation for coalescing most of the ada into one of the outputs (rather than spreading it evenly across all outputs).
    (Fixed in c6eb01a)
  • Write a comment for Selection.addValueToOutputs to explain its purpose, what it expects, what it guarantees.
    (Fixed in c6eb01a)
  • In the property test for Planning.createPlan, use QuickCheck.tabulate instead of QuickCheck.label for whenever there is a range of values, but a shared title.
    (Fixed by b2186bd)
  • In the property test for Planning.createPlan, instead of reporting how many of each category were not selected, report how many were selected, thus making the results easier to interpret (avoiding a double negative).
    (Fixed by b2186bd)
  • Add a property test to show that uncategorizeUTxOEntries . categorizeUTxOEntries == id.
    (Fixed by cd78eae)
  • Add a property test for Migration.createPlan which checks that utxoProvided == union utxoNotSelected utxoSelected.
    (Fixed by 152d957)
  • Add a property test for Migration.createPlan which checks that Set.empty == intersect utxoNotSelected utxoSelected.
    (Fixed by eee3f2e)
  • Add a property test for Migration.createPlan to verify that it gives us the same result as calling the inner Planning.createPlan.
    (Fixed by eee3f2e)
  • In the property test for Selection.create, check that the fee has been minimized.
    (Fixed in adf7dd0)
  • In the property test for Selection.extend, check that the fee has been minimized.
    (Fixed in adf7dd0)

To add to the subsequent PR (integration with the Shelley transaction layer):

  • Test all the properties again, but with real Shelley transaction constraints rather than mock constraints. This will require us to make the properties reusable from the cardano-wallet-shelley test suite. (We can use the same approach that we have for the Gen modules.)
  • Add an API integration test to check that calling getMigrationInfo gives the same result when run twice.
  • In the Shelley transaction layer tests, we should check that the fee of each selection is not significantly different from calling "computeFee . estimateSize" (a pre-existing function in the Shelley.Transaction module).
  • Think of a way to build a fail-safe sanity check for fee minimization.

@jonathanknowles jonathanknowles self-assigned this Apr 19, 2021
@jonathanknowles jonathanknowles marked this pull request as ready for review April 19, 2021 12:06
This module contains functions for incrementally constructing a
selection to be included in a migration plan.

A selection is the basis for an individual transaction.
This module contains an algorithm for planning migrations at a high
level. It determines how to partition the UTxO set into entries of
different types, and in which order to add UTxO entries to selections.
This module provides a public API for planning migrations.
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/multi-asset-migration branch from 87b53d6 to 87c58c4 Compare April 19, 2021 12:30
jonathanknowles added a commit that referenced this pull request Apr 20, 2021
We demonstrate that `uncategorize . categorize == id`.

In response to review feedback:

#2618 (comment)
This change makes it possible to pretty-print any value that is an
instance of `Show` in property test failures.
jonathanknowles added a commit that referenced this pull request Apr 20, 2021
We demonstrate that `uncategorize . categorize == id`.

In response to review feedback:

#2618 (comment)
jonathanknowles added a commit that referenced this pull request Apr 20, 2021
This produces a clearer statistical summary.

In addition, this commit replaces "percentage not selected" with
"percentage selected", which is arguably easier to interpret.

In response to review feedback:

#2618 (comment)
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/multi-asset-migration branch from 7bbf602 to 47377b0 Compare April 20, 2021 06:12
We explicitly specify the full range of a `Word8` by using `choose`,
which guarantees to generate a uniform distribution.

Collisions were seen in test runs. After this change is applied,
collisions disappear.
We demonstrate that `uncategorize . categorize == id`.

In response to review feedback:

#2618 (comment)
This makes the overall test suite run time more reasonable.
This produces a clearer statistical summary.

In addition, this commit replaces "percentage not selected" with
"percentage selected", which is arguably easier to interpret.

In response to review feedback:

#2618 (comment)
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/multi-asset-migration branch from 47377b0 to e9a3c32 Compare April 20, 2021 06:26
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/multi-asset-migration branch from e9a3c32 to 152d957 Compare April 20, 2021 06:28
jonathanknowles added a commit that referenced this pull request Apr 20, 2021
These functions manipulate concrete wallet types, so it makes more
sense for them to be located in the top-level `Migration` module, which
deals with concrete wallet types.

Conversely, both `Migration.Planning` and `Migration.Selection` use
abstract types for transaction input identifiers.

In response to review feedback:

#2618 (comment)
jonathanknowles added a commit that referenced this pull request Apr 20, 2021
This also fixes a similar typo in the ordinary coin selection algorithm.

In response to review feedback:

#2618 (comment)
This also fixes a similar typo in the ordinary coin selection algorithm.

In response to review feedback:

#2618 (comment)
@jonathanknowles
Copy link
Member Author

jonathanknowles commented Apr 21, 2021

Just a through: You said you didn't just want to use the real constraints, I believe for the sake of coverage, shrinking etc. Couldn't you simply have generators which 50% use real constraints and 50% use mock constraints?
Although I guess then you would never have coverage reports using the real parameters alone, which might be useful.

I agree with you that we really need to get coverage with the real parameters alone.

I think it's clear that we need the following:

  1. A way to get fast iteration and feedback when adjusting something in Primitive.Migration.
  2. A way to test with the real Shelley parameters.

But the separation between cardano-wallet-core and cardano-wallet (Shelley) means that our primary test suite for Primitive.Migration cannot depend on the real parameters, unless we make our mock parameters identical. But making them identical might not be desirable: we sometimes want to use parameters that are smaller (for example), to enable tests to run faster, or to test edge cases.

I think there's a good compromise solution here:

  1. Define reusable generators and properties in terms of TxConstraints.
  2. Call these properties with genMockTxConstraints in the cardano-wallet-core test suite (enabling rapid feedback.)
  3. Call these properties with shelleyTxConstraints in the cardano-wallet test suite (giving confidence that everything works with the real constraints.)

From the above, we already have parts 1 and 2. The next PR can add part 3 (though it might need us to move some of the generators and properties to a shared Gen module).

@jonathanknowles jonathanknowles force-pushed the jonathanknowles/multi-asset-migration branch from c6eb01a to 7a3912d Compare April 21, 2021 05:44
Copy link
Member

@Anviking Anviking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some stylistic comments; will review more

This commit adds two utility functions, and uses them to improve the
output of test failures:

  * report:

    Adds a named variable to the counterexample output of a property.
    In the event of a test failure, a pretty-printed representation of
    the variable is shown as part of the counterexample output.

  * verify:

    Adds a named test condition to a property. In the event of a test
    failure, the names of all the conditions that failed are added to
    the counterexample output.

Sample output:

```
    test/unit/Cardano/Wallet/Primitive/Migration/SelectionSpec.hs:146:9:
    1) Minimizing fees, prop_minimizeFeeStep
         Falsified (after 1 test):
           (*)

           "mockConstraints":
               MockTxConstraints
                   { mockTxCostFunction = MockTxCostFunction
                   { baseCost = Coin 643
                   , sizeCost = Coin 2
                   }
                   , mockTxBaseSize = 835
                   , mockTxInputSize = 3
                   , mockTxOutputMaximumSize = 400
                   , mockTxOutputMaximumTokenQuantity = 1077
                   , mockTxOutputMinimumAdaQuantity =
                   MockTxOutputMinimumAdaQuantity
                      { perOutput = Coin 6
                      , perOutputAsset = Coin 1
                      }
                   , mockTxMaximumSize = 2447
                   }

           "feeExcessBefore":
               Coin 5335

           "feeExcessAfter":
               Coin 5536

           "feeExcessDifference":
               Coin 1

           Condition violated: "feeExcessAfter <= feeExcessBefore"
```
The `verify` function is part of the public interface and documentation,
but ordinary users of this module are not generally expected to call it
directly (unless they're writing tests).

In response to review feedback:

#2618 (comment)
This makes the intended behaviour (of repeatedly extending a selection
until its full) a bit clearer.

In response to review feedback:

#2618 (comment)
These functions both assume that one witness is required per input.

In response to review feedback:

#2618 (comment)
This commit makes the order consistent between:

- The order in which functions are listed in the export list
- The order in which functions are actually defined.
It's really important that both of these functions minimize the fee.

So we check that:

1. Calling `balance` on a resultant selection leaves it unchanged.
2. Calling `minimizeFee` on a resultant fee excess leaves it unchanged.

In response to review feedback:

#2618 (comment)
Copy link
Member

@Anviking Anviking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Before this commit, to check the correctness of `totalFee`, we simply
summed the `fee` field values of the individual selections, and checked
this value matched the total.

Now, we manually compute the expected total fee by finding the
difference between:

    1. the total amount of ada provided by the inputs and reward balance;
    2. the total amount of ada spent by the outputs.

This is a stronger test, as we avoid relying on the `Selection` module's
guarantee that the fee of each selection is correct.
@jonathanknowles
Copy link
Member Author

bors r+

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Apr 22, 2021

Build succeeded:

@iohk-bors iohk-bors bot merged commit 99a6ff9 into master Apr 22, 2021
@iohk-bors iohk-bors bot deleted the jonathanknowles/multi-asset-migration branch April 22, 2021 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants